I was wondering whether it is possible to pool the data into groups before running prcomp() in R. I have, e.g., 100 individuals (rows) and 50 measurements (columns), with the individuals grouped into separate populations (20 in the first, 30 in the second, 15 in the third, etc.). PCA can give the scores of individuals or measurements on the PC axes, but I need to see which PC scores are associated with each population. On the graph I can only plot individual scores, but I need just, let's say, 5 or 6 populations and their corresponding scores. Any ideas?

Fedja Blagojevic

1 Answer

It sounds like you really need to put them into groups after the principal components analysis, i.e. just for the plot? Something like:

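set.seed(1)                                # arbitrary seed so the example is reproducible
thedata <- matrix(rnorm(5000), 100, 50)    # simulated data: 100 individuals x 50 measurements
pop.l <- c("A", "B", "C", "D", "E")
pops <- rep(pop.l, c(20, 30, 15, 15, 20))  # population label for each row
x <- prcomp(thedata)
x2 <- predict(x)[, 1:2]                    # individual scores on PC1 and PC2 (same as x$x[, 1:2])

# draw an empty plot first, then add each population in its own colour
plot(x2, type = "n", bty = "n")
for (i in seq_along(pop.l)) {
    points(x2[pops == pop.l[i], ], col = i)
}
legend("topleft", legend = pop.l, pch = 1, col = 1:5, bty = "n")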

This is a bit manual, but it should work.
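
If what you are after is one score per population rather than one per individual, a possible approach is to average the individual scores within each population and plot those centroids. This is only a sketch under that assumption (prcomp itself knows nothing about the populations), and the names scores and centroids are illustrative, not from the question:

```r
set.seed(1)                                   # arbitrary seed, for reproducibility only
thedata <- matrix(rnorm(5000), 100, 50)       # simulated: 100 individuals x 50 measurements
pop.l   <- c("A", "B", "C", "D", "E")
pops    <- rep(pop.l, c(20, 30, 15, 15, 20))  # population label of each row

scores <- prcomp(thedata)$x[, 1:2]            # individual scores on PC1 and PC2

# Mean PC1/PC2 score per population (one row per population)
centroids <- aggregate(scores, by = list(population = pops), FUN = mean)
print(centroids)

# Plot only the population centroids, labelled by population name
plot(centroids$PC1, centroids$PC2, type = "n", bty = "n",
     xlab = "PC1", ylab = "PC2")
text(centroids$PC1, centroids$PC2, labels = centroids$population)
```

Note that the PCA is still fitted on the individuals; averaging the data before calling prcomp() would be a different analysis with only as many rows as there are populations.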

Or (after the same prcomp call):

library(lattice)
xyplot(x2[, 2] ~ x2[, 1] | pops, xlab = "PC1", ylab = "PC2")  # one panel per population, PC1 on the x-axis
Peter Ellis
  • (+1) I would suggest using groups=pops (i.e., panel.superpose) to clearly highlight overlap between subpopulations. Confidence ellipses could also be displayed using panel.ellipse from the latticeExtra package, where the ellipses might be computed as proposed in the FactoMineR package, for example. – chl Jan 20 '12 at 09:57
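
The groups=pops suggestion from the comment could look roughly like the following. It is a minimal sketch on the same kind of simulated data, using only lattice (no ellipses); the data frame name scores and the legend layout are assumptions made for illustration:

```r
library(lattice)

set.seed(1)                                    # arbitrary seed, for reproducibility only
thedata <- matrix(rnorm(5000), 100, 50)        # simulated: 100 individuals x 50 measurements
pops    <- rep(c("A", "B", "C", "D", "E"), c(20, 30, 15, 15, 20))

# Individual scores on the first two PCs, with the population as a factor column
scores      <- as.data.frame(prcomp(thedata)$x[, 1:2])
scores$pops <- factor(pops)

# A single panel with all populations superposed and distinguished by colour
p <- xyplot(PC2 ~ PC1, data = scores, groups = pops,
            auto.key = list(columns = 5))      # legend laid out in one row of 5 entries
print(p)
```

Compared with the conditioned version (| pops), a single superposed panel makes overlap between populations easier to see, at the cost of more clutter when there are many groups.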