
I am currently working with a slightly imbalanced dataset (9% positive outcome) and am using XGBoost to train a predictive model.

    from xgboost import XGBClassifier
    XGB = XGBClassifier(scale_pos_weight=10)

Before calibration, my sensitivity and specificity are around 80%, but the calibration curve has a slope of 0.5.

After calibration, the calibration curve looks great (slope = 0.995), but sensitivity and specificity decreased dramatically. Is this a side effect of the calibration? Any thoughts on how to maintain my classification performance?

Thanks!

arjunv0101
    Calibration generally does not change the ranking of samples, and thus cannot change the shape or area under the ROC curve. Maybe you just need to adjust your decision threshold? – Eike P. Sep 16 '22 at 18:28
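As the comment suggests, the usual culprit is the decision threshold: calibrated probabilities for a 9%-positive problem concentrate near the base rate, so the default 0.5 cutoff labels almost everything negative. A minimal sketch (synthetic data and a plain sklearn classifier standing in for XGBoost, to keep it dependency-free) showing one way to re-pick the threshold after calibration:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import roc_curve

# Synthetic ~9%-positive data standing in for the real dataset
X, y = make_classification(n_samples=5000, weights=[0.91], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Calibrate any probabilistic classifier on held-out folds
clf = CalibratedClassifierCV(LogisticRegression(max_iter=1000), cv=5)
clf.fit(X_tr, y_tr)
p = clf.predict_proba(X_te)[:, 1]

# Calibration preserves the ranking of p, so the ROC curve is unchanged;
# only the cutoff needs revisiting. Here: maximize Youden's J (tpr - fpr).
fpr, tpr, thresholds = roc_curve(y_te, p)
best = np.argmax(tpr - fpr)
threshold = thresholds[best]
y_hat = (p >= threshold).astype(int)
print(f"chosen threshold = {threshold:.3f}")
```

The Youden criterion is just one illustrative choice; with explicit misclassification costs you would pick the cutoff that minimizes expected loss instead.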

1 Answer


Unbalanced classes are almost certainly not a problem: Are unbalanced datasets problematic, and (how) does oversampling (purport to) help?

Do not use accuracy to evaluate a classifier:
  • Why is accuracy not the best measure for assessing classification models?
  • Is accuracy an improper scoring rule in a binary classification setting?
  • Classification probability threshold

The same problems apply to sensitivity and specificity, and indeed to all evaluation metrics that rely on hard classifications. Instead, use probabilistic classifications, and evaluate these using proper scoring rules.
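For instance, a minimal sketch of scoring predicted probabilities directly with two common proper scoring rules (the Brier score and the log loss), on synthetic data with an illustrative model:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss, log_loss

# Synthetic ~9%-positive data; any probabilistic classifier works
X, y = make_classification(n_samples=4000, weights=[0.91], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
p = model.predict_proba(X_te)[:, 1]

# Proper scoring rules evaluate the probabilities themselves:
# no decision threshold, and well-calibrated forecasts are rewarded.
brier = brier_score_loss(y_te, p)
ll = log_loss(y_te, p)
print("Brier score:", brier)
print("log loss:   ", ll)
```

Lower is better for both; a model that simply predicted the 9% base rate for everyone would get a Brier score of about 0.082, so useful models must beat that.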

Stephan Kolassa
    Accuracy should be used to evaluate the performance of a classifier IF it is the appropriate performance statistic for the problem (a hard classification IS required and the misclassification costs are equal). It shouldn't be the only performance statistic and it won't necessarily be the best model selection criterion, but proper scoring rules don't necessarily give the best performance where a hard classification is required. – Dikran Marsupial Jan 15 '23 at 09:28
    @DikranMarsupial: I completely agree. And I would bet that 99% of people who use accuracy never spent more than two minutes' thought on whether accuracy is appropriate, or what the costs of "misclassifications" are, or whether it even makes sense to classify instances into exactly two classes "healthy"/"sick", or whether it would not be better to explicitly think about the difference between probabilistic classification ("probability of being sick") vs. subsequent actions, of which there may be more than one ("send home"/"run more tests"/"immediate surgery"). – Stephan Kolassa Jan 15 '23 at 09:33
    @StephanKolassa Yes, the main problem with class imbalance is that practitioners don't think enough about the needs of the application. However I think it is better to argue that we should use proper scoring rules rather than that we should not use accuracy - we should explain when we should use accuracy (or more generally the expected loss). Specificity and sensitivity seem to me to be more useful in diagnosing problems than performance analysis though. – Dikran Marsupial Jan 15 '23 at 09:39
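Tying the comments together: once probabilities are calibrated and misclassification costs are made explicit (the costs below are made up for illustration), the expected-loss-minimizing threshold falls out in one line, with no retraining:

```python
# Hypothetical costs: a missed positive (FN) is 9x worse than a false alarm (FP)
cost_fp, cost_fn = 1.0, 9.0

# For a calibrated probability p, predict positive when the expected loss
# of saying "negative" exceeds that of saying "positive":
#   p * cost_fn > (1 - p) * cost_fp   =>   p > cost_fp / (cost_fp + cost_fn)
threshold = cost_fp / (cost_fp + cost_fn)
print(threshold)  # 0.1
```

This is exactly why calibration plus a tuned threshold beats chasing a particular sensitivity/specificity at the default 0.5 cutoff.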