Before we apply PCA, we often normalize our data so that each feature has mean 0 and variance 1. We do this by computing the mean $\mu_j$ and standard deviation $\sigma_j$ of each feature $j$, and then, for each value $x_{ij}$ in our data, we subtract the mean and divide by the standard deviation: $x_{ij} \leftarrow (x_{ij} - \mu_j) / \sigma_j$.
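As a minimal sketch of this standardization step in NumPy (the function and variable names here are illustrative, not from any particular library):

```python
import numpy as np

def standardize(X):
    """Scale each feature to mean 0 and variance 1.

    X is an (n_samples, n_features) array; statistics are
    computed column-wise, one per feature.
    """
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma

# Example: three samples whose two features live on very different scales.
X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0]])
X_std = standardize(X)
print(X_std.mean(axis=0))  # ~[0, 0]
print(X_std.std(axis=0))   # ~[1, 1]
```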
To select the first principal component of our data matrix $X$, we need to find the unit vector $\mathbf{u}$ that maximizes the variance of the data when projected onto $\mathbf{u}$. The more spread out the projections $\mathbf{u}^\top \mathbf{x}_i$ are, the higher the variance, meaning more information is captured in that direction.
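Concretely, assuming the data has already been centered and writing $\Sigma = \frac{1}{n} X^\top X$ for the sample covariance matrix (notation assumed here), the projected variance is

$$
\mathrm{Var}(\mathbf{u}^\top \mathbf{x})
= \frac{1}{n} \sum_{i=1}^{n} (\mathbf{u}^\top \mathbf{x}_i)^2
= \mathbf{u}^\top \left( \frac{1}{n} \sum_{i=1}^{n} \mathbf{x}_i \mathbf{x}_i^\top \right) \mathbf{u}
= \mathbf{u}^\top \Sigma \mathbf{u}.
$$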
We can now use Lagrange optimization to find the unit vector $\mathbf{u}$ that maximizes the variance, and it turns out the variance is maximized when $\mathbf{u}$ is an eigenvector of our symmetric covariance matrix $\Sigma$.
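In sketch form, maximizing $\mathbf{u}^\top \Sigma \mathbf{u}$ subject to the unit-norm constraint $\mathbf{u}^\top \mathbf{u} = 1$ gives the Lagrangian

$$
\mathcal{L}(\mathbf{u}, \lambda) = \mathbf{u}^\top \Sigma \mathbf{u} - \lambda \left( \mathbf{u}^\top \mathbf{u} - 1 \right),
\qquad
\nabla_{\mathbf{u}} \mathcal{L} = 2\Sigma \mathbf{u} - 2\lambda \mathbf{u} = 0
\;\Longrightarrow\;
\Sigma \mathbf{u} = \lambda \mathbf{u},
$$

which is exactly the eigenvector equation for $\Sigma$.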
However, we don't know which $\mathbf{u}$ to choose if there are multiple eigenvectors that satisfy this equation. But we can show we get the maximum variance when we choose the eigenvector with the largest eigenvalue $\lambda$.
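To see why, substitute the eigenvector equation back into the variance expression and use the unit-norm constraint $\mathbf{u}^\top \mathbf{u} = 1$:

$$
\mathbf{u}^\top \Sigma \mathbf{u} = \mathbf{u}^\top (\lambda \mathbf{u}) = \lambda \, \mathbf{u}^\top \mathbf{u} = \lambda.
$$

So the variance captured along a direction is exactly its eigenvalue.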
Therefore, to maximize the variance, we need to choose the eigenvector with the largest eigenvalue.
In practice, we decompose $\Sigma$ into its eigenvalues and eigenvectors using singular value decomposition (SVD) and then choose the top $k$ eigenvectors with the largest eigenvalues.
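Here is a minimal NumPy sketch of that pipeline, assuming standardized input as above (function and variable names are illustrative):

```python
import numpy as np

def pca(X, k):
    """Project X onto its top-k principal components via SVD.

    X is an (n_samples, n_features) array. Returns the (n_samples, k)
    projection, the k principal directions, and their eigenvalues.
    """
    # Standardize each feature to mean 0 and variance 1.
    X = (X - X.mean(axis=0)) / X.std(axis=0)

    # SVD of the standardized data: X = U S Vt. The rows of Vt are the
    # eigenvectors of the covariance matrix, and S**2 / n_samples are
    # the corresponding eigenvalues, already sorted largest first.
    U, S, Vt = np.linalg.svd(X, full_matrices=False)

    components = Vt[:k]                     # (k, n_features)
    eigenvalues = S[:k] ** 2 / X.shape[0]   # variance along each direction
    return X @ components.T, components, eigenvalues

# Example: 200 correlated 2-D points reduced to one dimension.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ np.array([[2.0, 1.0], [0.0, 0.5]])
Z, W, lam = pca(X, k=1)
print(Z.shape, W.shape, lam)  # (200, 1) (1, 2) [largest eigenvalue]
```

Working with the SVD of the data matrix rather than explicitly forming $\Sigma$ and eigendecomposing it is numerically more stable, which is why most implementations take this route.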