I am interested in building a linear discriminant function to discriminate between 2 groups using 60 variables. (I plan to select the most discriminative variables for a future diagnostic test.) I have calculated the area under the ROC curve (AUC) for each variable individually, and none has an AUC greater than 0.73. My sample is fairly small: 50 healthy and 50 diseased individuals (these are the two groups).

I have tried to reduce the number of variables using principal component analysis (PCA). Three components account for 83% of the variation. Unfortunately, all 60 variables have similar weightings (loadings) on these 3 components, so I can't pick just a few. I would ordinarily pick the highest-weighted variables and then incorporate them in a linear discriminant function, but 60 is too many, especially given the small sample.

Rather than using the 60 variables, is it possible to use the 3 principal components themselves in a linear discriminant analysis (LDA)?

amoeba
  • 104,745
Andrew
  • 51
  • 1) Welcome, Andy. 2) Please check whether I edited the question correctly. 3) Did you try to rotate the loadings (e.g. varimax)? That might help. 4) How do you hope to select... the variables for a future diagnostic test, since you've replaced the variables with the components? 5) Why do you think you need PCA at all? Discriminant analysis is itself a tool for reducing variables, and it is better and more reliable than PCA for the task of separating groups.
  • – ttnphns Jun 27 '13 at 17:08
  • PCA gives a coordinate system into which you can map your data. If you chose only one component, you could describe each of your samples with a single scalar value. If you used only 2 components, you would map your 60-dimensional space to a 2-dimensional space.

    In these revised spaces, the information sometimes comes out more clearly.

    – EngrStudent Jun 27 '13 at 18:04
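
The mapping described in the comment above, followed by the LDA step the question asks about, might be sketched as follows. This is only an illustration on synthetic data (not the asker's data), using NumPy alone. The helper names `pca_scores` and `fit_fisher_lda`, the simulated group shift, and the equal-prior midpoint threshold are all assumptions made for the example.

```python
import numpy as np

def pca_scores(X, n_components):
    """Center X and project its rows onto the leading principal axes."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)  # share of total variance per component
    return Xc @ components.T, components, explained[:n_components]

def fit_fisher_lda(Z, y):
    """Two-class Fisher LDA; classify as group 1 when Z @ w > c."""
    Z0, Z1 = Z[y == 0], Z[y == 1]
    m0, m1 = Z0.mean(axis=0), Z1.mean(axis=0)
    # Pooled within-group covariance matrix.
    Sw = ((Z0 - m0).T @ (Z0 - m0) + (Z1 - m1).T @ (Z1 - m1)) / (len(Z) - 2)
    w = np.linalg.solve(Sw, m1 - m0)
    c = w @ (m0 + m1) / 2  # midpoint threshold: assumes equal priors
    return w, c

# Synthetic stand-in: 50 + 50 subjects, 60 correlated variables driven by
# 3 latent factors (assumed structure, chosen only for illustration).
rng = np.random.default_rng(0)
n_per_group, n_vars, n_factors = 50, 60, 3
y = np.repeat([0, 1], n_per_group)
latent = rng.normal(size=(2 * n_per_group, n_factors))
latent[y == 1, 0] += 2.0  # assumed group difference on the first factor
loadings = rng.normal(size=(n_factors, n_vars))
X = latent @ loadings + rng.normal(size=(2 * n_per_group, n_vars))

# Step 1: map the 60-dimensional data to 3 component scores.
scores, components, explained = pca_scores(X, n_components=3)

# Step 2: run LDA on the 3 scores instead of the 60 variables.
w, c = fit_fisher_lda(scores, y)
predicted = (scores @ w > c).astype(int)
accuracy = np.mean(predicted == y)  # in-sample, therefore optimistic

print("variance explained by 3 components:", explained.sum())
print("in-sample accuracy:", accuracy)
```

Two caveats on this sketch: the accuracy is computed on the same data used for fitting, so it overstates real performance, and in any cross-validation the PCA would need to be refit inside each fold. PCA also ignores the group labels, so the leading components are not guaranteed to be the directions that separate the groups.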