
I am looking for a way to compare two dendrograms. A group of people read a set of documents and built a dendrogram showing which documents they consider more similar. Now I want to build dendrograms from the same documents using several document similarity measures, starting for example with tf–idf, and find out which similarity measure produces the dendrogram closest to the human one. How should I compare these dendrograms? I would prefer Python libraries; the only thing I have found so far is this tree edit distance implementation: https://github.com/timtadh/zhang-shasha
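A minimal sketch of how one candidate dendrogram could be built from the raw documents. It assumes scikit-learn's TfidfVectorizer, cosine distance and average linkage. These are illustrative choices, not the only option. The helper name tfidf_linkage and the example documents are hypothetical. The result is a SciPy linkage matrix, and leaf i corresponds to docs[i]:

```python
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist
from sklearn.feature_extraction.text import TfidfVectorizer


def tfidf_linkage(docs, method="average"):
    """Hypothetical helper: build a SciPy linkage matrix from raw documents.

    Documents are embedded with tf-idf and compared with cosine distance;
    swapping the vectorizer or metric gives a different candidate tree.
    """
    X = TfidfVectorizer().fit_transform(docs).toarray()
    D = pdist(X, metric="cosine")  # condensed pairwise distance vector
    return linkage(D, method=method)


# Hypothetical example documents; leaf i of the tree is docs[i].
docs = [
    "the cat sat on the mat",
    "a cat and a dog played on the mat",
    "stock prices fell sharply today",
    "markets and stock prices rallied today",
]
Z_tfidf = tfidf_linkage(docs)
print(Z_tfidf.shape)  # (3, 4): one row per merge
```

Each similarity measure you want to test would produce its own linkage matrix in the same way. That only works if every tree, including the human one, uses the same document order.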

stachuk
    Some form of cophenetic correlation is a common method for that. See related question https://stats.stackexchange.com/q/63546/3277. – ttnphns May 03 '23 at 20:30
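A minimal sketch of a cophenetic correlation between two dendrograms, using only SciPy and NumPy. It assumes both trees are SciPy linkage matrices over the same documents in the same order. For a human-made tree, that would mean encoding it by hand as a linkage matrix, which is a hypothetical step not covered here. The helper name cophenetic_correlation is hypothetical. The idea matches the R function cor_cophenetic in dendextend: correlate the two trees' cophenetic distances over all leaf pairs. The Spearman option is offered because merge heights in a hand-drawn tree may only be meaningful as an ordering. That is a judgment call, not a rule.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.stats import spearmanr


def cophenetic_correlation(Z1, Z2, method="pearson"):
    """Hypothetical helper: correlate the cophenetic distances of two trees.

    Both linkage matrices must index the same leaves in the same order.
    """
    d1 = cophenet(Z1)  # condensed cophenetic distances of tree 1
    d2 = cophenet(Z2)  # condensed cophenetic distances of tree 2
    if method == "spearman":
        # Rank-based variant: only the order of merge heights matters.
        rho, _ = spearmanr(d1, d2)
        return rho
    return np.corrcoef(d1, d2)[0, 1]


# Stand-in features for 10 documents (random, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 4))
Z_a = linkage(X, method="average")
Z_b = linkage(X, method="complete")
print(cophenetic_correlation(Z_a, Z_a))  # identical trees: 1 (up to rounding)
print(cophenetic_correlation(Z_a, Z_b, method="spearman"))
```

With one human tree and several candidate trees, you would pick the candidate with the highest correlation.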

1 Answer


The R package dendextend implements some of the methods you are looking for, including cophenetic correlation (cor_cophenetic) and Baker's gamma (cor_bakers_gamma). You could implement the Baker's gamma statistic in Python yourself; the reference for it is given in the dendextend paper: https://academic.oup.com/bioinformatics/article/31/22/3718/240978

If you do implement it in Python, please post it here so that I and others can find it in the future.
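Below is a minimal Python sketch of Baker's gamma under stated assumptions, not a port of dendextend's code. For each pair of leaves, it finds the largest number of clusters k at which the two leaves still share a cluster when the tree is cut into k clusters. It then takes the Spearman correlation of these values between the two trees. It assumes SciPy linkage matrices whose rows are in merge order, which is what linkage returns for monotonic methods such as average or complete. Tied merge heights get no special treatment, so results may differ slightly from dendextend in that case. The helper names highest_shared_k and bakers_gamma are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.stats import spearmanr


def highest_shared_k(Z):
    """Hypothetical helper: for each leaf pair (i < j), the largest k such that
    i and j share a cluster when the tree is cut into k clusters.

    Assumes the rows of Z are in merge order, so cutting into k clusters
    undoes the last k - 1 merges.
    """
    n = Z.shape[0] + 1
    members = {i: [i] for i in range(n)}  # cluster id -> leaf indices
    shared = np.zeros((n, n), dtype=int)
    for step, row in enumerate(Z):
        left = members.pop(int(row[0]))
        right = members.pop(int(row[1]))
        # After merge number `step`, n - step - 1 clusters remain.
        for i in left:
            for j in right:
                shared[i, j] = shared[j, i] = n - step - 1
        members[n + step] = left + right  # SciPy's id for the new cluster
    return shared[np.triu_indices(n, k=1)]


def bakers_gamma(Z1, Z2):
    """Hypothetical helper: Spearman correlation of highest_shared_k values."""
    rho, _ = spearmanr(highest_shared_k(Z1), highest_shared_k(Z2))
    return rho


# Stand-in trees over the same 8 leaves (random data, illustration only).
rng = np.random.default_rng(1)
X = rng.normal(size=(8, 3))
Z_human = linkage(X, method="average")  # stand-in for the human tree
Z_tfidf = linkage(X, method="single")   # stand-in for a tf-idf tree
print(bakers_gamma(Z_human, Z_human))   # identical trees: 1 (up to rounding)
print(bakers_gamma(Z_human, Z_tfidf))
```

Because Baker's gamma only uses the order in which leaves are joined, it ignores the actual merge heights. That may suit a human-made tree whose heights are arbitrary.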

Tal Galili