Evaluate clusters of individuals by using their sequence data

Question

For a dataset of several hundred individuals, I applied a hierarchical clustering to generate clusters based on a functional trait that sets them apart.

My task now is to evaluate whether these clusters are supported by the nucleotide sequence data of the corresponding gene, i.e. whether there is more genetic similarity within each cluster than between clusters.

As a first approach, I created a multiple sequence alignment for every cluster and calculated the % identical sites value. There are more identical sites within each cluster than in an alignment of all sequences.
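
A minimal sketch of this kind of comparison in plain Python, assuming all sequences are taken from one shared alignment (so they have equal length) rather than from separate per-cluster alignments. The function names and toy sequences are hypothetical. Besides % identical sites it also computes mean pairwise identity within and between clusters, which is a swapped-in alternative: in a shared alignment the number of identical columns can only drop as sequences are added, so the whole-dataset value is expected to be lower regardless of how good the clusters are.

```python
from itertools import combinations, product


def percent_identical_sites(aligned):
    """Percentage of alignment columns where every sequence has the same character.

    Gap characters are treated like any other character (an assumption).
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    identical = sum(1 for column in zip(*aligned) if len(set(column)) == 1)
    return 100.0 * identical / length


def pairwise_identity(seq_a, seq_b):
    """Percentage of positions at which two aligned sequences agree."""
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def mean_within_identity(cluster):
    """Mean pairwise identity over all pairs inside one cluster."""
    pairs = list(combinations(cluster, 2))
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)


def mean_between_identity(cluster_a, cluster_b):
    """Mean pairwise identity over all pairs spanning two clusters."""
    pairs = list(product(cluster_a, cluster_b))
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)


# Toy aligned sequences, invented for illustration only.
cluster_a = ["ACGTACGT", "ACGTACGA", "ACGTACGT"]
cluster_b = ["ACCTTCGT", "ACCTTCGT", "ACCTTCGA"]

print("identical sites, cluster A:", percent_identical_sites(cluster_a))
print("identical sites, all:", percent_identical_sites(cluster_a + cluster_b))
print("mean identity within A:", mean_within_identity(cluster_a))
print("mean identity A vs B:", mean_between_identity(cluster_a, cluster_b))
```

Comparing the within-cluster mean against the between-cluster mean is less sensitive to the number of sequences than % identical sites, since each value is an average over pairs.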

Do you know any good tools (preferably Python) that can perform a more sophisticated evaluation of existing clusters based on sequence data? Are there other scoring methods with which I can evaluate the sequence similarity within the clusters (again, preferably in Python)?

score 2 · Answer 1 · answered Jul 13 '18 at 19:46

You can calculate allele frequencies for each cluster to further verify whether the individuals in it belong to the same population. However, if your dataset is rather small, this may not work for you.

Here is an article about differentiating populations based on sequencing data: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3813878/
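
A rough sketch of the allele frequency idea in plain Python, assuming the sequences of all clusters come from one shared alignment so that columns are comparable. The function names and toy sequences are hypothetical, and the per-site difference measure (half the summed absolute frequency differences) is just one simple choice, not the method of the linked article.

```python
from collections import Counter


def allele_frequencies(aligned):
    """Return one dict per alignment column mapping each allele to its frequency."""
    n = len(aligned)
    return [
        {allele: count / n for allele, count in Counter(column).items()}
        for column in zip(*aligned)
    ]


def frequency_differences(freqs_a, freqs_b):
    """Per-column difference between two clusters.

    0.0 means identical allele frequencies at that site,
    1.0 means the clusters share no allele at that site.
    """
    differences = []
    for site_a, site_b in zip(freqs_a, freqs_b):
        alleles = set(site_a) | set(site_b)
        total = sum(abs(site_a.get(x, 0.0) - site_b.get(x, 0.0)) for x in alleles)
        differences.append(0.5 * total)
    return differences


# Toy aligned sequences, invented for illustration only.
cluster_1 = ["ACGTACGT", "ACGTACGA", "ACGTACGT"]
cluster_2 = ["ACCTTCGT", "ACCTTCGT", "ACCTTCGA"]

site_differences = frequency_differences(
    allele_frequencies(cluster_1), allele_frequencies(cluster_2)
)
for position, value in enumerate(site_differences):
    print(position, round(value, 3))
```

Sites with a value near 1.0 separate the two clusters, while sites near 0.0 do not. With only a few sequences per cluster the frequency estimates are noisy, which is why this may not work on a small dataset.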
