I am working on random forest classifiation with a dataset size of 977 records and 6 features. However, my class is imbalanced and proportion is 77:23
I was reading about calibration of models (binary classification) to improve/calibrate the predicted probabilities of actually fitted model (RF in this case).
However, I also found out that calibration model has to be fit using a different dataset.
But the problem is, I already used sklearn train and test split - 680 records for my train and 297 records for my test (of random forest model)
Now, how can I calibrate my model (as I don't have any new data)
Especially, as I am using Random forest, I wish to calibrate my model for better predicted probabilities?
If you ae interested to look at my calibration curve and brier score loss, please find below
update - extra trees classifier
update - logistic regression
update - bootstrap optimisim




fix those upper estimates first, do you refer the predicted probabilities >=0.6 by where there is behavior of underfitting and overfitting? Is that what you mean? When you mean fix, what do you think can be done? I am trying to learn. Meaning, I have done feature selection, hyperparameter tuning etc. Of course, I am trying extra trees classifier now. Is there anything else that you think I can do (other than collect more data which I can't because we don't have).. – The Great Apr 02 '22 at 14:2130% fewer data to learn from <1000 points- but my model learns from 70% of the data (and not 30%)... – The Great Apr 02 '22 at 14:22