-2

Can anyone explain the proportion of variance explained in PCA and why it is important in the analysis?

srimaster
  • 313

1 Answer

1

For multivariate observations of potentially correlated variables in, say, $n$ dimensions, PCA produces up to $n$ principal components, which are mutually orthogonal (uncorrelated) variables.

The first principal component lies in the direction of the largest spread, or variance. The sum of the variances of the $n$ components is the total variance. The first $r$ principal components capture more variance than any other set of $r$ orthogonal linear combinations of the original variables. The proportion of variance explained by the first $r$ principal components is just the total variance in the first $r$ principal components divided by the total variance in all $n$ principal components. This is important because a small number of principal components may explain a large portion of the total variance (say 80%), which allows for dimensionality reduction to these $r$ principal components, each of which is a linear combination of the original variables.
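To make the ratio concrete, here is a minimal sketch in Python with NumPy. It assumes PCA is done by eigendecomposition of the sample covariance matrix, so the variance of each principal component is the corresponding eigenvalue. The function names `explained_variance_ratio` and `components_needed`, the 80% threshold, and the synthetic data are illustrative choices of mine, not part of the original answer.

```python
import numpy as np

def explained_variance_ratio(X):
    """Proportion of total variance carried by each principal component.

    X has one observation per row and one variable per column.
    """
    X = np.asarray(X, dtype=float)
    cov = np.cov(X, rowvar=False)            # sample covariance of the variables
    eigvals = np.linalg.eigvalsh(cov)[::-1]  # component variances, largest first
    eigvals = np.clip(eigvals, 0.0, None)    # guard against tiny negative round-off
    return eigvals / eigvals.sum()

def components_needed(ratios, threshold=0.80):
    """Smallest r whose first r components explain at least `threshold`."""
    cumulative = np.cumsum(ratios)
    r = int(np.searchsorted(cumulative, threshold)) + 1
    return min(r, len(ratios))

# Hypothetical data: 5 observed variables driven mostly by 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.1 * rng.normal(size=(200, 5))

ratios = explained_variance_ratio(X)
print("per-component proportion:", np.round(ratios, 3))
print("cumulative proportion:   ", np.round(np.cumsum(ratios), 3))
print("components for 80%:      ", components_needed(ratios, 0.80))
```

The cumulative proportions are what one would typically inspect (or plot as a scree plot) when deciding how many components to keep. Whether 80% is the right cutoff depends on the application.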

This is also explained in a number of questions posed on this site including the one linked by David Kozak.

utobi
  • 11,726