Pre-PCA pooling in R

Question

I was just wondering whether it is possible to pool the data into groups before running prcomp() in R. I have, e.g., 100 individuals (rows) and 50 measurements (columns), with the individuals grouped into separate populations (20 in the first, 30 in the second, 15 in the third, etc.). PCA gives scores of individuals or measurements on the PC axes, but I need to see which PC score is associated with each population. On the graph I can only plot individual scores, but I need just, say, 5 or 6 populations and their corresponding scores. Any ideas?

Biplot? See the example on this site, Visualizing a million, PCA edition. — Andy W, Nov 15 '11 at 19:18
Wow, this ggbiplot is awesome... Thanks, Andy, but unfortunately that is not what I was looking for. I need population scores based on the individuals' variation along the PC axes, a sort of centroid score for different subgroups of individuals. — Fedja Blagojevic, Nov 16 '11 at 07:25

score 3 · Accepted Answer · answered Jan 20 '12 at 09:37

It sounds like you really need to put them into groups after the principal component analysis, i.e. just for the plot? Something like:

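set.seed(1)                                  # reproducible random data
thedata <- matrix(rnorm(5000), 100, 50)      # 100 individuals x 50 measurements
pop.l <- c("A", "B", "C", "D", "E")          # population labels
pops <- rep(pop.l, c(20, 30, 15, 15, 20))    # population of each row (sums to 100)
x <- prcomp(thedata)
x2 <- predict(x)[, 1:2]                      # scores on PC1 and PC2 (same as x$x[, 1:2])

# empty plot, then add each population's points in its own colour
plot(x2, type = "n", bty = "n")
for (i in seq_along(pop.l)) {
    points(x2[pops == pop.l[i], ], col = i)
}
legend("topleft", legend = pop.l, pch = 1, col = seq_along(pop.l), bty = "n")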

This is a bit manual, but it should work.

Or (after the same prcomp call):

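library(lattice)
# one panel per population, with PC1 on the x-axis and PC2 on the y-axis
xyplot(x2[, 2] ~ x2[, 1] | pops, xlab = "PC1", ylab = "PC2")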
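Both plots still show individuals rather than the single per-population "centroid" score asked for in the comments on the question. A minimal sketch, reusing the answer's simulated data: average each population's PC1/PC2 scores with aggregate(), then check that pooling first (averaging the raw measurements per population with rowsum() and projecting those means with predict()) gives the same centroids, since PC scores are a linear function of the centred data. The names centroids, pop.means and pooled are illustrative choices, not part of the original answer.

```r
# Simulated data as in the answer (assumption: random normal measurements)
set.seed(1)
thedata <- matrix(rnorm(5000), 100, 50)
pop.l <- c("A", "B", "C", "D", "E")
pops <- rep(pop.l, c(20, 30, 15, 15, 20))
x <- prcomp(thedata)
x2 <- predict(x)[, 1:2]

# Centroid score of each population: mean of its individuals' PC1/PC2 scores
centroids <- aggregate(as.data.frame(x2), by = list(pop = pops), FUN = mean)
print(centroids)

# Pooling first gives the same result: average the raw measurements per
# population (rows come out sorted A..E), then project those mean profiles
# onto the axes fitted to the individuals
pop.means <- rowsum(thedata, pops) / as.vector(table(pops))
pooled <- predict(x, newdata = pop.means)[, 1:2]
stopifnot(isTRUE(all.equal(unname(pooled), unname(as.matrix(centroids[, -1])))))

# Individuals in colour, population centroids as large filled, labelled points
plot(x2, col = match(pops, pop.l), bty = "n")
points(centroids$PC1, centroids$PC2, pch = 19, cex = 2, col = seq_along(pop.l))
text(centroids$PC1, centroids$PC2, labels = centroids$pop, pos = 3)
```

This equivalence only holds when the population means are projected onto the axes fitted to the individuals; running prcomp() on the five population means instead would fit new axes from just five points.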

(+1) I would suggest using groups=pops (i.e., a panel.superpose) to clearly highlight the overlap between subpopulations. Confidence ellipses could also be displayed using panel.ellipse from the latticeExtra package, with the ellipses computed as proposed in the FactoMineR package, for example. — chl, Jan 20 '12 at 09:57
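The groups= suggestion can be sketched with lattice alone: a single panel in which panel.xyplot hands the points to panel.superpose, so each population gets its own colour, and auto.key draws the legend. The confidence ellipses from latticeExtra are deliberately left out of this sketch, and the data are the answer's simulated example; the data frame name d is a hypothetical choice.

```r
library(lattice)

# Same simulated data as in the answer (random normal, five populations)
set.seed(1)
thedata <- matrix(rnorm(5000), 100, 50)
pops <- rep(c("A", "B", "C", "D", "E"), c(20, 30, 15, 15, 20))

# PC1/PC2 scores plus the population label in one data frame
d <- data.frame(prcomp(thedata)$x[, 1:2], pop = pops)

# One panel with the populations superposed; auto.key builds the legend
# from the levels of the grouping variable
p <- xyplot(PC2 ~ PC1, data = d, groups = pop,
            auto.key = list(columns = 5), xlab = "PC1", ylab = "PC2")
print(p)
```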
