Let's assume I have 24 random forest models, each of which produces a class prediction. I am currently using simple majority voting to select the final prediction. Can someone please guide me toward other options? Specifically, weighted voting or using another model on top of these? I see that out of the 24 random forests, some have quite bad recall and F1 scores, and some are also badly calibrated. I therefore want a voting method that takes this into account.
Here are a couple of options that would be straightforward for you to implement:
What you are doing is called hard voting, i.e., each model casts a vote for a class outcome. A likely better option is soft voting, in which each model's vote is weighted by its confidence. I've explained this in more detail here: Hard voting, soft voting in ensemble based methods
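As a rough sketch of what soft voting could look like with your already-fitted forests: average the class probabilities across models instead of counting votes. Here `models`, `X_val`, `y_val`, and `X_test` are placeholders for your fitted estimators and data, and weighting each model by its validation macro F1 is just one possible scheme (it directly addresses your concern about the weaker models):

```python
import numpy as np
from sklearn.metrics import f1_score

def soft_vote(models, X, weights=None):
    """Average predicted class probabilities across models.

    weights: optional per-model weights (e.g. validation F1 scores);
    defaults to equal weighting. All models must share the same
    class ordering in `classes_`.
    """
    # Shape: (n_models, n_samples, n_classes)
    probas = np.stack([m.predict_proba(X) for m in models])
    avg = np.average(probas, axis=0, weights=weights)
    return models[0].classes_[np.argmax(avg, axis=1)], avg

# Hypothetical weighting scheme: weight each forest by its macro F1
# on a held-out validation set, so poorly performing models count less.
weights = [f1_score(y_val, m.predict(X_val), average="macro") for m in models]
y_pred, y_proba = soft_vote(models, X_test, weights=weights)
```

One caveat: soft voting implicitly assumes the probabilities are comparable across models, so since you mention some forests are badly calibrated, it may be worth wrapping those in scikit-learn's `CalibratedClassifierCV` before averaging.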
Another way to integrate the information in the separate models is stacking. This involves training a different model on the predictions of the individual models in the ensemble.
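A minimal sketch of stacking on top of already-fitted models, assuming you have a held-out set (`X_meta`, `y_meta` below, both placeholders) that none of the 24 forests saw during training; otherwise the meta-model will overfit to the base models' training predictions. Logistic regression is a common choice of meta-learner, but any classifier works:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def meta_features(models, X):
    # Concatenate each model's class probabilities into one feature
    # vector per sample: shape (n_samples, n_models * n_classes).
    return np.hstack([m.predict_proba(X) for m in models])

# Train the meta-model on the base models' predictions for held-out data.
meta_model = LogisticRegression(max_iter=1000)
meta_model.fit(meta_features(models, X_meta), y_meta)

y_pred = meta_model.predict(meta_features(models, X_test))
```

If you were training everything from scratch instead, scikit-learn's `StackingClassifier` handles the out-of-fold bookkeeping for you.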
In general, there are only a few strategies commonly used to combine different machine learning models, discussed well in this thread: Bagging, boosting and stacking in machine learning