According to https://scikit-learn.org/stable/auto_examples/preprocessing/plot_scaling_importance.html,
"Feature scaling through standardization (or Z-score normalization) can be an important preprocessing step for many machine learning algorithms. Standardization involves rescaling the features such that they have the properties of a standard normal distribution with a mean of zero and a standard deviation of one."
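If I understand that correctly, each feature $x$ is transformed as $z = (x - \mu)/\sigma$, where $\mu$ and $\sigma$ are that feature's sample mean and standard deviation (my notation, not the page's).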
However, I thought the point of PCA was to capture the principal components with the most variance. If we rescale the features so that they all have standard deviation 1 (and hence variance 1), wouldn't there be no point in running PCA, since every feature has variance 1?
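To make the puzzle concrete, here is a minimal sketch of what I mean (toy data I made up, using scikit-learn's StandardScaler and PCA; the correlation structure is just for illustration):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=n)  # strongly correlated with x1
x3 = rng.normal(size=n)                   # independent of the others
X = np.column_stack([x1, x2, x3])

X_std = StandardScaler().fit_transform(X)
print(X_std.std(axis=0))  # every column now has standard deviation ~1

pca = PCA().fit(X_std)
print(pca.explained_variance_ratio_)  # ...yet the ratios are far from uniform
```

When I run something like this, the explained variance ratios come out very unequal even though every input column has variance 1, which is exactly the behavior I'd like explained.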
In particular, I'm asking not why we need or want standardization, but why PCA doesn't become pointless after standardization (so this is not a duplicate of the linked question).