6

I don't have much background in statistics. I am working on multivariate morphometrics of a sample of frogs. I have a data matrix of 19 variables (continuous characteristics) for around 250 samples. The samples fall into 4 different groups (morphotypes).

Is PCA the best way to see if the samples belonging to one group cluster together? How can one identify the variables that would contribute the most to separation of the clusters? What other methods could help in a study like this?

MånsT
anurag
  • PCA can definitely be useful for graphical examination of your data set. By drawing scatter plots of the first few principal components, you may be able to see whether samples from the same group cluster together. Finding a good separation of the clusters is the problem that discriminant analysis seeks to solve (see, for instance, this Wikipedia page). – MånsT Aug 08 '12 at 08:59
  • This has become a hot topic since we are not sure whether or not you know a priori the classification of your sample into the 4 groups. Looking forward to your explanation ;) – JDav Aug 08 '12 at 14:03
  • Yes, I know the classification of the samples into 4 groups. :) – anurag Aug 09 '12 at 12:55
  • See the similar post: https://stats.stackexchange.com/questions/190156/t-tests-manova-or-logistic-regression-how-to-compare-two-groups – kjetil b halvorsen Nov 17 '22 at 16:57
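The PCA suggestion in the comments can be sketched roughly as follows. This is only an illustration on synthetic data standing in for the 19 frog measurements; the group sizes, the random centers, and names such as `top_vars_pc1` are assumptions, not part of the original question. Note that PCA ignores the group labels, so it shows the directions of largest overall variance, which may or may not be the directions that separate the morphotypes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the frog data (assumption): 4 groups, 19 variables.
n_groups, n_vars, n_per_group = 4, 19, 62
centers = rng.normal(scale=3.0, size=(n_groups, n_vars))
X = np.vstack([c + rng.normal(size=(n_per_group, n_vars)) for c in centers])
labels = np.repeat(np.arange(n_groups), n_per_group)

# Standardize each variable, then obtain principal components via SVD.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
scores = U * s                     # sample coordinates on the components
explained = s**2 / np.sum(s**2)    # share of total variance per component

# Rows of Vt are the loadings; large absolute values mark the variables
# that contribute most to a component.
top_vars_pc1 = np.argsort(-np.abs(Vt[0]))[:5]

print("variance explained by PC1, PC2:", explained[:2].round(3))
print("variables with largest |loading| on PC1:", top_vars_pc1)

# Group means in the PC1-PC2 plane give a quick numeric view of separation;
# a scatter plot of scores[:, 0] against scores[:, 1], colored by label,
# would show the same thing graphically.
for g in range(n_groups):
    print("group", g, scores[labels == g, :2].mean(axis=0).round(2))
```

With real data, `X` would be the 250 x 19 measurement matrix and `labels` the known morphotypes.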

2 Answers

3

The first question is whether you already know which frog belongs to which morphotype. If you do, and your goal is to use these frogs to analyze how the morphotypes differ on these variables, then you want discriminant analysis. This might enable later investigators to accurately place frogs into morphotypes based on these variables.

If you do not know which frog belongs to which morphotype, then cluster analysis may be useful.

Both these methods have a lot of options and subtypes.
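For the case where the morphotypes are known, a minimal sketch of linear discriminant analysis might look like the following. It is written with NumPy only and run on synthetic data in which, by construction, only the first three variables differ between groups; the data, the group sizes, and all variable names are assumptions for illustration. It finds the linear combinations that maximize between-group relative to within-group scatter.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic labelled data (assumption): 4 groups, 19 variables,
# only variables 0, 1 and 2 differ between groups.
n_groups, n_vars, n_per_group = 4, 19, 60
centers = np.zeros((n_groups, n_vars))
centers[:, :3] = rng.normal(scale=4.0, size=(n_groups, 3))
X = np.vstack([c + rng.normal(size=(n_per_group, n_vars)) for c in centers])
y = np.repeat(np.arange(n_groups), n_per_group)

grand_mean = X.mean(axis=0)
Sw = np.zeros((n_vars, n_vars))    # within-group scatter
Sb = np.zeros((n_vars, n_vars))    # between-group scatter
for g in range(n_groups):
    Xg = X[y == g]
    mg = Xg.mean(axis=0)
    Sw += (Xg - mg).T @ (Xg - mg)
    d = (mg - grand_mean)[:, None]
    Sb += len(Xg) * (d @ d.T)

# Solve Sb w = lambda * Sw w. Using a Cholesky factor of Sw keeps the
# eigenproblem symmetric, so eigh can be used.
L = np.linalg.cholesky(Sw)
Linv = np.linalg.inv(L)
eigvals, V = np.linalg.eigh(Linv @ Sb @ Linv.T)
order = np.argsort(eigvals)[::-1]
eigvals, V = eigvals[order], V[:, order]

# There are at most (number of groups - 1) discriminant axes.
W = Linv.T @ V[:, : n_groups - 1]
scores = (X - grand_mean) @ W      # samples in discriminant space

# Variables with the largest absolute weight on the first axis.
top_vars = np.argsort(-np.abs(W[:, 0]))[:3]
print("largest weights on first discriminant axis:", top_vars)
```

The weights in `W` are only directly comparable when the variables are on similar scales, as they are in this synthetic example. For real measurements in different units, standardizing the variables first is a common choice.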

Peter Flom
  • Yes, I know which frog belongs to which morphotype. Will discriminant analysis help to find more characteristics to further discriminate between the samples? – anurag Aug 09 '12 at 18:17
  • Well, you can already perfectly discriminate, since you know the morphotypes. Discriminant analysis would answer the question "What linear combination of these variables best distinguishes the morphotypes?" – Peter Flom Aug 09 '12 at 20:11
  • I read up about Canonical Variate Analysis. How is it different from PCA? – anurag Aug 12 '12 at 12:59
1

I think that you know the group membership, so, as @PeterFlom said, discriminant analysis is a good alternative. A similar method would be to estimate a multinomial (logit or probit) model. In this model, you estimate the probability of classifying a frog into a given group $k$ depending on its characteristics $x$.

$P[G=k]=\Phi(\sum_j \beta_j^k x_j)$, where $\Phi$ is the probability distribution function you assume. The superscript on the beta parameters shows that each characteristic has a different impact on the probability of classification into each group.

The simplest version of this model is the multinomial logit, and there are several extensions to it. I guess that's an affordable start if you are relatively new to statistics.
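As a rough illustration of the multinomial logit, the group probabilities take the form $P[G=k \mid x] = \exp(\sum_j \beta_j^k x_j) / \sum_m \exp(\sum_j \beta_j^m x_j)$, so they sum to one across the groups. The sketch below fits this by plain gradient descent on synthetic data. The data, the learning rate, the number of iterations, and the variable names are all assumptions, and a statistics package would normally be used instead of a hand-written loop.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic labelled data (assumption): 4 groups, 19 variables.
n_groups, n_vars, n_per_group = 4, 19, 60
centers = rng.normal(scale=1.0, size=(n_groups, n_vars))
X = np.vstack([c + rng.normal(size=(n_per_group, n_vars)) for c in centers])
y = np.repeat(np.arange(n_groups), n_per_group)

# Standardize the variables and add an intercept column.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
Z1 = np.hstack([np.ones((len(Z), 1)), Z])
Y = np.eye(n_groups)[y]            # one-hot group indicators


def softmax(scores):
    """Row-wise softmax; subtracting the row maximum avoids overflow."""
    scores = scores - scores.max(axis=1, keepdims=True)
    e = np.exp(scores)
    return e / e.sum(axis=1, keepdims=True)


# B[j, k] plays the role of beta_j^k (row 0 holds the intercepts).
B = np.zeros((Z1.shape[1], n_groups))
for _ in range(2000):
    P = softmax(Z1 @ B)
    # Gradient of the average negative log-likelihood.
    B -= 0.1 * (Z1.T @ (P - Y)) / len(Z1)

P = softmax(Z1 @ B)                # fitted group probabilities
accuracy = (P.argmax(axis=1) == y).mean()
print("training accuracy:", round(float(accuracy), 3))
```

Each column of `B` holds the coefficients for one group, so comparing the entries of a row across columns shows how a given characteristic shifts the probabilities between groups. Only differences between columns are identified in this parametrization, which is why packages usually fix one group as the reference.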

JDav
  • The goal of discriminant analysis is classifying new unlabelled cases. While the OP knows the 4 types and may have the labels for a set of data, it seems that the goal is to see if they form 4 distinct clusters, which is perhaps the reason PCA was mentioned. Perhaps a distance measure, where you compare the distance between cluster centers with the distance of members within a cluster to the cluster center, would be the appropriate thing to look at. For multivariate normal data the Mahalanobis distance would be appropriate. – Michael R. Chernick Aug 08 '12 at 13:44
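The distance comparison suggested in this comment could be sketched as follows. It assumes that all groups share one covariance matrix, so a pooled within-group covariance is used; that assumption, the synthetic data, and the names `between` and `within` are illustrative choices and not something stated in the thread.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic labelled data (assumption): 4 groups, 19 variables.
n_groups, n_vars, n_per_group = 4, 19, 60
centers = rng.normal(scale=1.0, size=(n_groups, n_vars))
X = np.vstack([c + rng.normal(size=(n_per_group, n_vars)) for c in centers])
y = np.repeat(np.arange(n_groups), n_per_group)

# Group centers and pooled within-group covariance
# (assumes equal covariance across groups).
means = np.array([X[y == g].mean(axis=0) for g in range(n_groups)])
resid = X - means[y]
pooled_cov = resid.T @ resid / (len(X) - n_groups)
cov_inv = np.linalg.inv(pooled_cov)


def mahalanobis(a, b):
    """Mahalanobis distance between two points under the pooled covariance."""
    d = a - b
    return float(np.sqrt(d @ cov_inv @ d))


# Distances between every pair of group centers.
between = np.array(
    [[mahalanobis(means[i], means[j]) for j in range(n_groups)]
     for i in range(n_groups)]
)

# Distance of each sample to the center of its own group.
within = np.sqrt(np.einsum("ij,jk,ik->i", resid, cov_inv, resid))

print("between-center distances:\n", between.round(2))
for g in range(n_groups):
    print("group", g, "mean distance to own center:",
          round(float(within[y == g].mean()), 2))
```

Between-center distances that are large relative to the typical within-group distances would suggest well-separated groups.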