Questions tagged [pca]

Principal component analysis (PCA) is a linear dimensionality reduction technique. It reduces a multivariate dataset to a smaller set of constructed variables preserving as much information (as much variance) as possible. These variables, called principal components, are linear combinations of the input variables.

Principal component analysis is a technique to decompose an array of numerical data into a set of orthogonal vectors (uncorrelated linear combinations of the variables) called principal components. The first few principal components often suffice to capture most of the multivariate variability in the data, which makes PCA one of the standard data reduction / dimensionality reduction methods.
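The decomposition described above can be sketched in a few lines of numpy (an illustrative sketch on random data, not tied to any particular question below): center the data, eigendecompose the covariance matrix, and project onto the eigenvectors.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))            # 100 samples, 3 variables

Xc = X - X.mean(axis=0)                  # center each variable
cov = np.cov(Xc, rowvar=False)           # 3x3 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues

order = np.argsort(eigvals)[::-1]        # sort descending by variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

scores = Xc @ eigvecs                    # principal component scores
explained = eigvals / eigvals.sum()      # fraction of variance per PC
```

The columns of `scores` are uncorrelated, and the variance of each column equals the corresponding eigenvalue.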

3419 questions
108
votes
5 answers

Loadings vs eigenvectors in PCA: when to use one or another?

In principal component analysis (PCA), we get eigenvectors (unit vectors) and eigenvalues. Now, let us define loadings as $$\text{Loadings} = \text{Eigenvectors} \cdot \sqrt{\text{Eigenvalues}}.$$ I know that eigenvectors are just directions and…
user2696565
  • 1,389
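The definition in this question (loadings = eigenvectors scaled by the square roots of the eigenvalues) is easy to check numerically. In the sketch below (random data, standardized so the PCA is on the correlation matrix), each loading then equals the correlation between a variable and a component score:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
X[:, 1] += 0.8 * X[:, 0]                   # induce some correlation

Z = (X - X.mean(0)) / X.std(0, ddof=1)     # standardize: correlation-matrix PCA
R = np.corrcoef(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

loadings = eigvecs * np.sqrt(eigvals)      # scale each unit eigenvector
scores = Z @ eigvecs
```

Here `loadings[j, k]` reproduces `corr(variable j, component k)`, which is why loadings (not raw eigenvectors) are the natural quantity to interpret.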
64
votes
3 answers

What is the objective function of PCA?

Principal component analysis can use matrix decomposition, but that is just a tool to get there. How would you find the principal components without the use of matrix algebra? What is the objective function (goal), and what are the constraints?
40
votes
5 answers

Examples of PCA where PCs with low variance are "useful"

Normally in principal component analysis (PCA) the first few PCs are used and the low variance PCs are dropped, as they do not explain much of the variation in the data. However, are there examples where the low variation PCs are useful (i.e. have…
Michael
  • 403
27
votes
2 answers

Reversing PCA back to the original variables

I have a set of data that has $n$ samples described by $m$ variables. I do a PCA to reduce it to just 2 dimensions so I can make a nice 2D plot of the data. I understand that the $x,y$ coordinates (i.e., the PCA scores) for the plot are calculated…
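The mapping back from scores to the original variables is just the transpose of the projection, plus the mean that was subtracted. A minimal sketch (random data standing in for the $n \times m$ set in the question):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 5))               # n = 50 samples, m = 5 variables
mu = X.mean(axis=0)
Xc = X - mu

# SVD-based PCA: rows of Vt are the principal directions
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
V2 = Vt[:2].T                              # keep the first 2 components
scores = Xc @ V2                           # the 2-D plot coordinates

X_hat = scores @ V2.T + mu                 # approximate reconstruction in m dims
```

With only 2 of the 5 components, `X_hat` is a rank-2 approximation; keeping all components reverses the PCA exactly.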
27
votes
1 answer

Not normalizing data before PCA gives better explained variance ratio

I normalized my dataset then ran 3 component PCA to get small explained variance ratios ([0.50, 0.1, 0.05]). When I didn't normalize but whitened my dataset then ran 3 component PCA, I got high explained variance ratios ([0.86, 0.06,0.01]). Since I…
user46925
21
votes
2 answers

Low variance components in PCA, are they really just noise? Is there any way to test for it?

I'm trying to decide whether a component of a PCA should be retained or not. There are a gazillion criteria based on the magnitude of the eigenvalue, described and compared e.g. here or here. However, in my application I know that the small(est)…
Daniel
  • 211
20
votes
1 answer

What is the difference between regular PCA and probabilistic PCA?

I know regular PCA does not follow a probabilistic model for the observed data. So what is the basic difference between PCA and PPCA? In PPCA, the latent variable model contains, for example, observed variables $y$, latent (unobserved) variables $x$, and a…
Vendetta
  • 595
20
votes
2 answers

Why are principal components in PCA (eigenvectors of the covariance matrix) mutually orthogonal?

Why are principal components in PCA mutually orthogonal? I know that PCA can be calculated by eig(cov(X)), where X is centered. But I do not see why the eigenvectors should be orthogonal.
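The short answer is the spectral theorem: a covariance matrix is symmetric, and a symmetric matrix admits an orthonormal eigenbasis (eigenvectors for distinct eigenvalues are automatically orthogonal; within a repeated eigenvalue an orthogonal basis can always be chosen). A quick numerical check on random data:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 4))
C = np.cov(X, rowvar=False)        # covariance matrices are symmetric

# For a symmetric matrix, eigh returns an orthonormal set of eigenvectors
eigvals, V = np.linalg.eigh(C)

gram = V.T @ V                     # should be the 4x4 identity matrix
```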
19
votes
2 answers

How to interpret PCA loadings?

While reading about PCA, I came across the following explanation: Suppose we have a data set where each data point represents a single student's scores on a math test, a physics test, a reading comprehension test, and a vocabulary test. We find…
priyanka
  • 325
16
votes
3 answers

Interpreting PCA scores

Can anyone help me in interpreting PCA scores? My data come from a questionnaire on attitudes toward bears. According to the loadings, I have interpreted one of my principal components as "fear of bears". Would the scores of that principal component…
Agnese Marino
16
votes
3 answers

Are PCA solutions unique?

When I run PCA on a certain data set, is the solution given to me unique? I.e., I obtain a set of 2d coordinates, based on interpoint distances. Is it possible to find at least one more arrangement of the points that would meet these…
raygozag
  • 603
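One concrete sense in which the solution is not unique: negating any principal direction gives an equally valid PCA, and the interpoint distances among the scores are unchanged. A sketch on random data:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(60, 3))
Xc = X - X.mean(axis=0)

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T                         # one valid set of PC scores

# Negating the second principal direction is an equally valid solution
flip = np.diag([1.0, -1.0, 1.0])
scores_flipped = Xc @ (Vt.T @ flip)

def pairwise(S):
    """Matrix of Euclidean distances between all pairs of rows."""
    return np.linalg.norm(S[:, None] - S[None], axis=-1)
```

Beyond sign flips, components with (numerically) equal eigenvalues can be rotated within their subspace, so ties are another source of non-uniqueness.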
15
votes
1 answer

Is PCA still done via the eigendecomposition of the covariance matrix when dimensionality is larger than the number of observations?

I have a $20\times100$ matrix $X$, containing my $N=20$ samples in the $D=100$-dimensional space. I now wish to code up my own principal component analysis (PCA) in Matlab. I demean $X$ to $X_0$ first. I read from someone's code that in such…
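The trick such code usually exploits: when $D > N$, the $N \times N$ Gram matrix $X_0 X_0^\top$ has the same nonzero eigenvalues as the $D \times D$ covariance matrix $X_0^\top X_0$ (both scaled by $N-1$), so the small decomposition suffices. A sketch with the dimensions from the question:

```python
import numpy as np

rng = np.random.default_rng(5)
N, D = 20, 100
X = rng.normal(size=(N, D))
X0 = X - X.mean(axis=0)                    # demean; rank is at most N - 1

# D x D covariance route (expensive when D is large)
C = X0.T @ X0 / (N - 1)
evals_big = np.linalg.eigvalsh(C)[::-1][:N - 1]

# N x N Gram-matrix route: same nonzero eigenvalues
G = X0 @ X0.T / (N - 1)
evals_small = np.linalg.eigvalsh(G)[::-1][:N - 1]
```

Equivalently, `np.linalg.svd(X0)` gives both the directions and the singular values at once, which is the numerically preferred route.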
14
votes
3 answers

How can I interpret what I get out of PCA?

As part of a University assignment, I have to conduct data pre-processing on a fairly huge, multivariate (>10) raw data set. I'm not a statistician in any sense of the word, so I'm a little confused as to what's going on. Apologies in advance for…
nitsua
  • 243
13
votes
3 answers

Does PCA preserve linear separability for every linearly separable set?

After using PCA to reduce dimensionality, does PCA preserve linear separability for any linearly separable set? Will the data still be linearly separable after the transformation? I am thinking that it does preserve linear separability because the…
LVST
  • 133
11
votes
4 answers

Is it acceptable to reverse a sign of a principal component score?

I have two datasets from similar psycholinguistic experiments. In both of them, information about the participant's reading and spelling ability was collected, then converted into standardized scores zRead and zSpell. The aim is to use these as…
Marius
  • 506