I'm building a machine learning model (right now I'm using average-weighted neural networks) to predict a binary variable. I have historical data on which I can train this model, but when new models are trained and used for predictions about the "real future", there will be some selection bias in the data that will be used.
I can introduce the same bias into the historical data. My plan is therefore to compare two models: one trained on a biased sample and one trained on an unbiased sample (both with the same sample size n). I would like to compare the outcomes of the two models, but I'm also very interested in saying something about their level of agreement. It's this last part I'm having difficulty finding more information on. Both models will output a probability for each case in the same test set, so I'm looking for a method to quantify their agreement, or any other techniques that might be useful here.
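To make the question more concrete, here is a minimal sketch (in Python, with simulated probabilities standing in for the real model outputs) of the kinds of agreement measures I've been considering so far; I'm not sure which of these, if any, is appropriate:

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr
from sklearn.metrics import cohen_kappa_score

# Simulated stand-ins for the two models' predicted probabilities on the same test set
rng = np.random.default_rng(0)
p_unbiased = rng.uniform(size=500)                               # model trained on the unbiased sample
p_biased = np.clip(p_unbiased + rng.normal(0, 0.1, 500), 0, 1)   # model trained on the biased sample

# Agreement on the probability scale
r, _ = pearsonr(p_unbiased, p_biased)         # linear association
rho, _ = spearmanr(p_unbiased, p_biased)      # agreement of the rank ordering
mad = np.mean(np.abs(p_unbiased - p_biased))  # mean absolute difference between the probabilities

# Agreement on the decision scale (after thresholding at 0.5)
labels_unbiased = (p_unbiased >= 0.5).astype(int)
labels_biased = (p_biased >= 0.5).astype(int)
kappa = cohen_kappa_score(labels_unbiased, labels_biased)  # chance-corrected agreement of the hard labels

print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
print(f"Mean absolute difference = {mad:.3f}")
print(f"Cohen's kappa (0.5 threshold) = {kappa:.3f}")
```

Is something along these lines reasonable, or is there a more principled way to quantify the agreement of two sets of predicted probabilities?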
Thanks!