I am new to principal component analysis (PCA). I performed PCA on a dataset with 54 samples. When I project the samples onto a 3D scatterplot, I can see that samples with similar characteristics group together in separate clusters. The X, Y and Z axes of the 3D scatterplot represent PC#1, PC#2 and PC#3 respectively. Both positive and negative values appear along the axes.

What do these values convey, especially the negative values?

If a sample lies at a negative value along an axis, what does that imply?

Also, the total variance explained by all 3 PCs is 40% (PC#1: 20%, PC#2: 13% and PC#3: 7%). What does that imply? Why is it not 80-90%? Is my data of good quality?

Nick Cox
Dinesh
  • You may find that the insight afforded by the answers to Making sense of PCA... will help you. Some of the descriptions and images, especially in the highest-voted answers, really do answer your first two questions. As far as the third goes, there is no connection at all between data quality and variance "explained" by principal components, but there are things that can be said about your data based on the statistics you report. – whuber Aug 13 '13 at 22:05
  • @whuber So does 40% of variance explained mean that there is only 40% of variance in my data, and that the remaining 60% is similar? Please guide me. – Dinesh Aug 13 '13 at 22:13
  • Oh, no, nothing like that. It means that the reduction of dimensionality (going from a matrix of 54 samples described by p variables to a matrix describing the same 54 samples by only three axes, with 3 < p) has been made by "ignoring" 60% of the original variability in the matrix (you are only keeping 40% of the multivariate description, let's say). It has nothing to do with data quality. It just says that the original variables were not correlated that much, so they can't be summarized in a few dimensions without throwing away 60% of the baby with the bathwater. – FairMiles Aug 21 '13 at 22:41
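
The point made in the comments can be checked on synthetic data. The sketch below is only an illustration under stated assumptions: the data are random numbers, not the asker's, the 54 samples match the question but the 20 variables are an arbitrary choice, and `pca` is a hypothetical helper written for this example using NumPy's SVD. It compares weakly correlated variables with strongly correlated ones, and also shows that PC scores are centered on zero, so a negative value simply places a sample on one side of the average along that component. The sign of a component is arbitrary, so flipping an axis changes nothing of substance.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, for reproducibility only
n_samples, n_vars = 54, 20       # 54 as in the question; 20 variables is an assumption


def pca(X):
    """Hypothetical helper: PCA via SVD of the column-centered data matrix."""
    Xc = X - X.mean(axis=0)                      # center each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U * s                               # sample coordinates on each PC
    explained = s**2 / np.sum(s**2)              # fraction of total variance per PC
    return scores, explained


# Case 1: weakly correlated variables (independent noise).
X_weak = rng.normal(size=(n_samples, n_vars))

# Case 2: strongly correlated variables (two hidden factors plus small noise).
factors = rng.normal(size=(n_samples, 2))
loadings = rng.normal(size=(2, n_vars))
X_strong = factors @ loadings + 0.1 * rng.normal(size=(n_samples, n_vars))

scores_weak, ev_weak = pca(X_weak)
scores_strong, ev_strong = pca(X_strong)

print("variance explained by first 3 PCs (weak):  ", ev_weak[:3].sum())
print("variance explained by first 3 PCs (strong):", ev_strong[:3].sum())

# Scores are centered: each PC has mean zero, so roughly half of the
# samples fall on the negative side regardless of data quality.
print("mean score on PC1..PC3:", scores_weak[:, :3].mean(axis=0))
```

In this sketch both datasets are equally "clean"; only the correlation structure differs, and that alone decides how much variance the first three components can capture.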

0 Answers