Before we apply PCA, we often normalize our data so that each feature has mean 0 and variance 1. To do this, we compute the mean and standard deviation of each feature and then, for each point $x$ in our data, subtract the mean and divide by the standard deviation.
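As a minimal sketch of this normalization step, assuming the data sits in a NumPy array with one row per point and one column per feature (the function name `standardize` and the toy values are illustrative, not from the original text):

```python
import numpy as np

def standardize(X):
    """Rescale each column (feature) of X to mean 0 and variance 1."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    # Assumption: a constant feature is left at 0 rather than divided by 0.
    std = np.where(std == 0, 1.0, std)
    return (X - mean) / std

# Two features on very different scales (made-up numbers).
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])
Z = standardize(X)
```

After this step both columns of `Z` are on the same scale, so no feature dominates the variance purely because of its units.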
To select the first principal component of the data, we need to find the unit vector $u$ that maximizes the variance of the data when projected onto $u$. The larger the projections of the points $x$ onto $u$ are in magnitude, the higher the variance, meaning more information is captured in that direction.
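To make "variance of the data when projected onto a direction" concrete, here is a small sketch that measures it for two candidate directions. The helper `projected_variance` and the toy data are assumptions for illustration; the data is already centered, so the mean of the squared projection lengths is the variance.

```python
import numpy as np

def projected_variance(X, u):
    """Variance of centered data X (rows = points) projected onto direction u."""
    u = u / np.linalg.norm(u)      # make sure u is a unit vector
    scores = X @ u                 # signed length of each point's projection
    return np.mean(scores ** 2)    # X is centered, so this is the variance

# Toy centered data stretched along the diagonal (illustrative values only).
X = np.array([[ 2.0,  2.0],
              [-2.0, -2.0],
              [ 1.0, -1.0],
              [-1.0,  1.0]])

var_diag = projected_variance(X, np.array([1.0, 1.0]))    # along the spread
var_anti = projected_variance(X, np.array([1.0, -1.0]))   # across the spread
```

Here the diagonal direction gives a variance of 4 and the anti-diagonal gives 1, so the diagonal is the better candidate for the first principal component.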

PCA seeks the direction whose projected points (red) are most spread out — high variance (left) rather than low (right).

The projection of $x$ onto $u$, written $\operatorname{proj}_u(x) = (x^\top u)\,u$ for a unit vector $u$, is the foot of the perpendicular dropped from $x$ onto the line spanned by $u$.
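The geometric description above can be checked numerically. The sketch below assumes the standard formula for projecting onto a line, namely the dot product with the unit direction times that direction; the function name `project_onto` is illustrative.

```python
import numpy as np

def project_onto(x, u):
    """Foot of the perpendicular dropped from x onto the line spanned by u."""
    u = u / np.linalg.norm(u)   # normalize so the formula below applies
    return (x @ u) * u

x = np.array([3.0, 4.0])
u = np.array([1.0, 0.0])
p = project_onto(x, u)

# The residual x - p should be perpendicular to u.
residual_dot_u = (x - p) @ u
```

With these values `p` is `[3.0, 0.0]` and the residual has zero dot product with `u`, which is exactly the "perpendicular" property.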
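We can now use Lagrange multipliers to find the unit vector $u$ that maximizes the variance $u^\top \Sigma u$ subject to the constraint $u^\top u = 1$. It turns out that the stationary points are exactly the eigenvectors of our symmetric matrix $\Sigma$, that is, the vectors satisfying $\Sigma u = \lambda u$.
However, this equation alone does not tell us which $\lambda$ to choose when several values of $\lambda$ satisfy it.
Substituting back gives $u^\top \Sigma u = \lambda\, u^\top u = \lambda$, so the projected variance equals the eigenvalue and is largest when we choose the largest eigenvalue $\lambda$.
Therefore, to maximize the variance, we need to choose the eigenvector with the largest eigenvalue.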
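A quick numerical sanity check of this result, as a sketch rather than a proof: on synthetic data (randomly generated here, with a fixed seed), the variance along the top eigenvector should equal the largest eigenvalue, and no random unit vector should exceed it. The symmetric matrix is assumed to be the covariance matrix of the centered data, called `Sigma` in the code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic correlated 2-D data (an assumption for illustration), then centered.
A = np.array([[3.0, 0.0],
              [1.0, 0.5]])
X = rng.normal(size=(500, 2)) @ A
X = X - X.mean(axis=0)

Sigma = X.T @ X / len(X)                    # covariance of the centered data
eigvals, eigvecs = np.linalg.eigh(Sigma)    # eigh returns eigenvalues in ascending order
u_top, lam_top = eigvecs[:, -1], eigvals[-1]

# Projected variance along the top eigenvector.
var_top = u_top @ Sigma @ u_top

# Projected variance along 1000 random unit vectors.
U = rng.normal(size=(1000, 2))
U /= np.linalg.norm(U, axis=1, keepdims=True)
var_random = np.einsum("ij,jk,ik->i", U, Sigma, U)
```

`var_top` matches `lam_top` up to floating-point error, and every entry of `var_random` is at most `lam_top`.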

The principal components are the eigenvectors of $\Sigma$: $u_1$ lies along the direction of greatest variance (largest eigenvalue), with $u_2$ orthogonal to it.
In practice, we decompose $\Sigma$ into its eigenvalues and eigenvectors using singular value decomposition (for a symmetric positive semidefinite matrix the two decompositions coincide) and then choose the $k$ eigenvectors with the largest eigenvalues.
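The procedure can be sketched in a few lines of NumPy. This is one possible implementation under stated assumptions, not a reference one: the function name `pca_top_k`, the choice to divide by the number of points, and the random test data are all illustrative. It relies on `np.linalg.svd` returning singular values in descending order, which for a symmetric positive semidefinite matrix are its eigenvalues.

```python
import numpy as np

def pca_top_k(X, k):
    """Return the top-k principal directions, their eigenvalues, and the projected data."""
    X = X - X.mean(axis=0)                 # center each feature
    Sigma = X.T @ X / X.shape[0]           # covariance matrix (symmetric, PSD)
    U, S, _ = np.linalg.svd(Sigma)         # columns of U: eigenvectors; S: eigenvalues, descending
    components = U[:, :k]                  # k directions of greatest variance
    return components, S[:k], X @ components

rng = np.random.default_rng(1)
# Made-up 3-D data with very different spread per axis.
X = rng.normal(size=(200, 3)) * np.array([5.0, 2.0, 0.5])

components, eigvals, Z = pca_top_k(X, k=2)
```

Each column of `Z` holds the coordinates of the data along one principal direction, and the variance of that column equals the corresponding entry of `eigvals`.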