For a dataset of several hundred individuals, I applied hierarchical clustering to group them by a functional trait that sets them apart.
My task now is to evaluate whether these clusters are supported by the nucleotide sequence data of the corresponding gene, i.e. whether there is more genetic similarity within each cluster than between clusters.
As a first approach, I created a multiple sequence alignment for each cluster and calculated the percentage of identical sites. Each cluster has a higher percentage of identical sites than an alignment of all sequences combined.
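To make clear what I mean by "% identical sites", here is a minimal sketch of the calculation in plain Python. It is not the exact tool I used; the function name `percent_identical_sites` and the toy sequences are made up for illustration, and I am assuming that a gap character is treated like any other symbol (alignment programs may handle gaps differently).

```python
def percent_identical_sites(aligned_seqs):
    """Percentage of alignment columns in which every sequence has the same character.

    Assumption: all sequences come from one alignment (equal length), and a gap
    ("-") counts as an ordinary character, so an all-gap column counts as identical.
    """
    if not aligned_seqs:
        raise ValueError("need at least one sequence")
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    if length == 0:
        return 0.0
    # zip(*aligned_seqs) iterates over alignment columns
    identical = sum(1 for column in zip(*aligned_seqs) if len(set(column)) == 1)
    return 100.0 * identical / length


# Hypothetical toy data: cluster label -> aligned sequences (same global alignment)
clusters = {
    "cluster_A": ["ACGTAC", "ACGTAC", "ACGTAT"],
    "cluster_B": ["TCGAAC", "TCGAAC"],
}

per_cluster = {name: percent_identical_sites(seqs) for name, seqs in clusters.items()}
all_seqs = [seq for seqs in clusters.values() for seq in seqs]
overall = percent_identical_sites(all_seqs)

for name, value in per_cluster.items():
    print(f"{name}: {value:.1f}% identical sites")
print(f"all sequences: {overall:.1f}% identical sites")
```

One caveat I am aware of: with this definition, adding sequences to an alignment can only keep the value the same or lower it, so a smaller cluster will tend to score higher than the full dataset regardless of how well the clustering fits the sequences.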
Do you know of any good tools (preferably in Python) that can perform a more sophisticated evaluation of existing clusters based on sequence data? Are there other scoring methods I could use to evaluate sequence similarity within the clusters (again, preferably in Python)?