6

In the analysis of single-cell RNA-seq data, there are various unsupervised approaches for identifying putative subpopulations (e.g. as implemented in the Seurat or SCDE packages).

Is there a good way to computationally validate the cluster solutions? Different methods may produce slightly different clustering results. How can one tell which solution is best, i.e. most representative of the biological subpopulations?

Kamil S Jaron
Deffiz

2 Answers

6

SC3 (single-cell consensus clustering) could be helpful here. It aims at achieving "high accuracy and robustness by combining multiple clustering solutions through a consensus approach": https://www.nature.com/nmeth/journal/v14/n5/full/nmeth.4236.html
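To illustrate the general idea of a consensus approach, here is a minimal Python sketch. It is not SC3 itself (SC3 is an R/Bioconductor package with its own pipeline); it only shows one common way of combining several clustering runs via a co-association matrix. The function name `consensus_clustering`, the use of k-means as the base clusterer, and the toy data from `make_blobs` are all assumptions made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs


def consensus_clustering(X, n_clusters, n_runs=20, seed=0):
    """Hypothetical helper: combine repeated k-means runs into one solution.

    Returns (labels, coassoc), where coassoc[i, j] is the fraction of runs
    in which samples i and j were placed in the same cluster.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    coassoc = np.zeros((n, n))
    for _ in range(n_runs):
        run_seed = int(rng.integers(1_000_000))
        labels = KMeans(
            n_clusters=n_clusters, n_init=1, random_state=run_seed
        ).fit_predict(X)
        # Add 1 for every pair of samples that share a cluster in this run.
        coassoc += labels[:, None] == labels[None, :]
    coassoc /= n_runs

    # Treat (1 - co-association) as a distance and cut an average-linkage
    # tree into at most n_clusters groups.
    condensed = squareform(1.0 - coassoc, checks=False)
    tree = linkage(condensed, method="average")
    consensus = fcluster(tree, t=n_clusters, criterion="maxclust")
    return consensus, coassoc


# Toy stand-in for a cells x features matrix (e.g. principal components).
X, _ = make_blobs(
    n_samples=150,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=1.0,
    random_state=0,
)
consensus, coassoc = consensus_clustering(X, n_clusters=3)
print(np.unique(consensus, return_counts=True))
```

Entries of `coassoc` close to 0 or 1 indicate pairs of cells that the individual runs agree on; intermediate values flag cells whose assignment is unstable.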

olga
3

Better ways to evaluate your clusters would be to use an external dataset or a dataset with known truth, but there is also a variety of internal validation metrics that can be used to compare clustering solutions without another dataset.

Here are a few metrics:

  • Davies-Bouldin Index
  • Calinski-Harabasz Index
  • Root-Mean-Square Standard Deviation

Many more can be found in this clustering review: http://stke.sciencemag.org/content/9/432/re6

These internal validation metrics grade a clustering solution on three properties: compactness, connectedness, and separation. When using them to compare clustering solutions, consider which metric is appropriate for your results, because some algorithms work by directly optimizing one of these measures and will tend to score well on it by construction.
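As a rough sketch of how such a comparison might look in Python, the snippet below scores two clustering solutions of the same data with the three metrics listed above. `davies_bouldin_score` and `calinski_harabasz_score` come from scikit-learn; `rmsstd` is a hypothetical helper written here using one common definition of the root-mean-square standard deviation (pooled within-cluster variance across all features). The toy data and the choice of k-means versus Ward hierarchical clustering are assumptions for illustration only.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score, davies_bouldin_score


def rmsstd(X, labels):
    """Hypothetical helper: pooled within-cluster standard deviation.

    Lower values mean more compact clusters. Assumes at least one
    cluster has more than one member.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    sum_sq, dof = 0.0, 0
    for k in np.unique(labels):
        members = X[labels == k]
        sum_sq += ((members - members.mean(axis=0)) ** 2).sum()
        dof += members.shape[0] - 1
    return float(np.sqrt(sum_sq / (X.shape[1] * dof)))


def internal_scores(X, labels):
    """Collect the three internal metrics for one clustering solution."""
    return {
        "davies_bouldin": davies_bouldin_score(X, labels),        # lower is better
        "calinski_harabasz": calinski_harabasz_score(X, labels),  # higher is better
        "rmsstd": rmsstd(X, labels),                              # lower is better
    }


# Toy stand-in for a cells x features matrix (e.g. principal components).
X, _ = make_blobs(n_samples=150, centers=3, cluster_std=1.0, random_state=0)

solutions = {
    "kmeans": KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X),
    "ward": AgglomerativeClustering(n_clusters=3).fit_predict(X),
}
for name, labels in solutions.items():
    print(name, internal_scores(X, labels))
```

Note that k-means minimizes within-cluster variance, so it will tend to look good on variance-based metrics such as Calinski-Harabasz and RMSSTD regardless of biological relevance.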

Alec