I have two models that I've trained using different ML algorithms. I want to compare how closely their predictions agree, as a test of whether they are generalising from the training set or simply memorising it. I could calculate the coefficient of determination of one with respect to the other, but that treats one model as the reference and the other as the approximation. Are there any standard ways to assess the similarity of two sets of predictions?
1 Answer
There are a couple of options, depending on what your output looks like and what your goal is. You could calculate Kendall's coefficient of concordance (Kendall's W). It is a rank-based measure, so it does not depend on the underlying distributions. See, for example:
Gearhart, A., Booth, D. T., Sedivec, K., & Schauer, C. (2013). Use of Kendall's coefficient of concordance to assess agreement among observers of very high resolution imagery. Geocarto International, 28(6), 517–526.
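If it helps, here is a minimal sketch of Kendall's W in Python. It assumes no tied values (the standard tie correction is omitted), and the function name and example data are mine:

```python
import numpy as np

def kendalls_w(ratings):
    """Kendall's coefficient of concordance (W), no tie correction.

    ratings: (m, n) array of m raters' (here, models') scores on n items.
    Returns a value in [0, 1]; 1 means identical rankings.
    """
    m, n = ratings.shape
    # Double argsort converts scores to ranks (valid when values are distinct)
    ranks = ratings.argsort(axis=1).argsort(axis=1) + 1
    # Sum of ranks each item receives across raters
    rank_sums = ranks.sum(axis=0)
    # Sum of squared deviations of rank sums from their mean
    s = ((rank_sums - rank_sums.mean()) ** 2).sum()
    return 12 * s / (m ** 2 * (n ** 3 - n))

# Two models' predictions on the same test inputs
preds_a = np.array([2.3, 1.1, 4.5, 3.2, 0.7])
preds_b = np.array([2.1, 1.4, 4.9, 2.8, 0.9])
print(kendalls_w(np.vstack([preds_a, preds_b])))  # 1.0: identical orderings
```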
If your outputs are categorical, you could use Cohen's kappa instead.
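For instance, scikit-learn ships this as `cohen_kappa_score`; the labels below are made up for illustration:

```python
from sklearn.metrics import cohen_kappa_score

# Class labels predicted by each model on the same test inputs
labels_a = [0, 1, 1, 2, 0, 1, 2, 2]
labels_b = [0, 1, 0, 2, 0, 1, 2, 1]

# Kappa corrects raw agreement for agreement expected by chance:
# 1.0 is perfect agreement, 0.0 is chance-level.
print(cohen_kappa_score(labels_a, labels_b))
```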
There is a decent discussion of agreement measures, covering Bland-Altman diagrams for continuous data and Cohen's kappa for categorical data, in
Kwiecien, R., Kopp-Schneider, A., & Blettner, M. (2011). Concordance analysis: Part 16 of a series on evaluation of scientific publications. Deutsches Ärzteblatt International, 108(30), 515–521.
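A Bland-Altman plot is easy to sketch with matplotlib: plot the difference between the two predictions against their mean, with a bias line and ±1.96 SD limits of agreement. The predictions below are synthetic, just to show the construction:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
preds_a = rng.normal(0, 1, 200)                 # hypothetical model A
preds_b = preds_a + rng.normal(0, 0.2, 200)     # model B: similar plus noise

mean = (preds_a + preds_b) / 2   # x-axis: average of the two predictions
diff = preds_a - preds_b         # y-axis: their difference
bias = diff.mean()
loa = 1.96 * diff.std(ddof=1)    # 95% limits of agreement

plt.scatter(mean, diff, s=10)
plt.axhline(bias, color="k", label=f"bias = {bias:.3f}")
plt.axhline(bias + loa, color="r", linestyle="--", label="±1.96 SD")
plt.axhline(bias - loa, color="r", linestyle="--")
plt.xlabel("Mean of the two predictions")
plt.ylabel("Difference between predictions")
plt.legend()
plt.show()
```

If most points fall within the limits and the bias is near zero, the two models' predictions agree well in both level and spread.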