
I've just started reading about clustering and classification. It's a jungle, a fascinating one. Currently, however, I have a rather urgent task: to perform a sort of cluster analysis, in the sense that I'd like to cluster my patients according to their phenotypes (biomarkers, both continuous and categorical variables) and examine whether survival differs by cluster. I'm not interested in any specific predictor; the purpose is merely to examine whether there are specific clusters of patients and whether the phenotypes are associated with outcomes.

I'm looking for general advice on what type of method to use, as well as a recommended R package. I have 10 variables that are relevant for the phenotype. I could attach some data, but I doubt it would contribute to the question, which is of a more general character.

Thanks in advance.

Update: I'm looking for the pros and cons of various techniques as applied to this kind of data. And I humbly understand that clustering may not be that straightforward.

1 Answer


There is no universal clustering solution.

You need to try lots of different methods and spend a lot of time on preprocessing and visualizing your data. Sorry.

See also:

Estivill-Castro, V. (2002). Why so many clustering algorithms: a position paper. ACM SIGKDD Explorations Newsletter, 4(1), 65-75.

Clustering is an art, and cannot be automated in a meaningful way with decent results.
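To make the preprocessing point a bit more concrete for mixed continuous and categorical variables: one common starting point (not a recommendation for this particular data set) is a Gower-type dissimilarity, where each continuous variable contributes its range-scaled absolute difference and each categorical variable contributes 0 or 1. The sketch below is plain Python with made-up patient records; the function names and the example variables are hypothetical, and missing values and variable weights are ignored. In R, `daisy()` from the `cluster` package with `metric = "gower"` computes this kind of dissimilarity, and `pam()` from the same package can cluster on the result.

```python
def gower_distance(a, b, is_categorical, ranges):
    """Gower-type dissimilarity between two records with mixed variable types."""
    total = 0.0
    for x, y, cat, rng in zip(a, b, is_categorical, ranges):
        if cat:
            # Categorical: simple mismatch (0 if equal, 1 otherwise).
            total += 0.0 if x == y else 1.0
        else:
            # Continuous: absolute difference scaled by the observed range.
            total += abs(x - y) / rng if rng > 0 else 0.0
    return total / len(a)


def gower_matrix(rows, is_categorical):
    """Pairwise dissimilarity matrix (list of lists) for all records."""
    ranges = []
    for j, cat in enumerate(is_categorical):
        if cat:
            ranges.append(None)  # not used for categorical variables
        else:
            col = [r[j] for r in rows]
            ranges.append(max(col) - min(col))
    return [[gower_distance(a, b, is_categorical, ranges) for b in rows]
            for a in rows]


# Hypothetical records: (age, biomarker level, sex, smoker)
patients = [
    (50, 1.0, "F", "yes"),
    (70, 3.0, "M", "no"),
    (60, 2.0, "F", "no"),
]
is_categorical = [False, False, True, True]

D = gower_matrix(patients, is_categorical)
for row in D:
    print(row)
```

A matrix like `D` can be passed to any method that works from dissimilarities (hierarchical clustering, k-medoids), which is one reason to try several and compare. Whether the resulting clusters differ in survival is a separate analysis.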

  • Thanks @Anony-Mousse. I'm not looking for a definitive answer. I understand that these things are very complex. I'm looking for guidance on suitable methods, as there are many more methods than presumably necessary for my purpose. Thanks for the reference, I'll read it with great interest! – Adam Robinsson Jan 17 '16 at 22:13