I'm training a binary classification model for fraud detection, and my historical dataset is extremely imbalanced. I have tried training LightGBM and XGBoost models, and in both cases I've applied class weighting (class_weight='balanced' for LightGBM; XGBoost's sklearn wrapper exposes the equivalent scale_pos_weight instead).
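For context, here is a minimal sketch of the kind of setup I mean, with synthetic data standing in for my real features (the actual pipeline and hyperparameters are omitted):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from lightgbm import LGBMClassifier
from xgboost import XGBClassifier

# Synthetic stand-in for my data: roughly 1% positives (fraud).
X, y = make_classification(n_samples=50_000, n_features=20,
                           weights=[0.99, 0.01], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42)

# LightGBM's sklearn wrapper accepts class_weight directly.
lgbm = LGBMClassifier(class_weight='balanced').fit(X_train, y_train)

# XGBoost's sklearn wrapper uses scale_pos_weight instead;
# n_negative / n_positive approximates 'balanced'.
spw = (y_train == 0).sum() / (y_train == 1).sum()
xgb = XGBClassifier(scale_pos_weight=spw).fit(X_train, y_train)

# These scores pile up near 0 and near 1.
scores = lgbm.predict_proba(X_test)[:, 1]
```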
When I run predict_proba, it returns only extreme scores, i.e., values close to 0 or close to 1. I suspect this is caused by the imbalanced dataset. I'm performing threshold moving to find the "sweet spot", though.
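This is roughly how I'm doing the threshold moving, continuing from the snippet above (F1 here is just a placeholder for whatever business metric actually applies):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Threshold moving: sweep candidate thresholds and pick the one
# that maximizes F1 on a held-out set.
precision, recall, thresholds = precision_recall_curve(y_test, scores)
f1 = 2 * precision * recall / (precision + recall + 1e-12)
best = thresholds[np.argmax(f1[:-1])]  # last P/R pair has no threshold
print(f"best threshold: {best:.3f}, F1: {f1[:-1].max():.3f}")
```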
But is there any real problem with the scores behaving like that? Should I fix it or try changing the hyperparameters? I know I can perform probability calibration so that the scores are more "reliable". Aside from that, should I be concerned about the distribution of the probability scores?
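For completeness, the calibration I have in mind would look something like this, again continuing from the snippets above (whether isotonic or sigmoid is better here is an open question for me):

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import brier_score_loss

# Wrap the weighted model and fit calibrators via cross-validation,
# then compare calibration quality on the held-out set.
calibrated = CalibratedClassifierCV(
    LGBMClassifier(class_weight='balanced'), method='isotonic', cv=5
).fit(X_train, y_train)
cal_scores = calibrated.predict_proba(X_test)[:, 1]

print("Brier before:", brier_score_loss(y_test, scores))
print("Brier after: ", brier_score_loss(y_test, cal_scores))
```

One caveat I'm aware of: as far as I understand, class weighting deliberately shifts the scores away from the observed class frequencies, so calibration would partly be undoing the effect of the weights.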
class_weight='balanced' I guess just weights the loss function, like in logistic regression. – Gabriel Monteiro Oct 19 '22 at 12:08