If you have an imbalanced dataset but assign (inverse) class weights when fitting, does this mean the model's loss and evaluation metrics are computed as if the data were balanced, so that ROC AUC and accuracy, both metrics that assume a reasonably balanced dataset, can be used?
ROC AUC and accuracy metrics can be misleading on an imbalanced dataset. You can achieve high accuracy by simply predicting the majority class all the time, and ROC AUC can likewise paint an overly optimistic picture of a model's practical usefulness when positives are rare. So, to appropriately measure a model's ability, different metrics such as Precision/Recall AUC might be more informative. Detailed discussion on this topic here
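As a quick illustration of the accuracy point above (a sketch in Python/scikit-learn; the post itself does not name a library, and the synthetic 99:1 data here is my own assumption):

```python
# On a ~99:1 imbalanced dataset, a model that always predicts the
# majority class scores ~99% accuracy, yet recall on the minority
# class is exactly 0 -- accuracy alone hides the failure.
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

rng = np.random.default_rng(2021)
y_true = (rng.random(10_000) < 0.01).astype(int)  # ~1% positives
y_pred = np.zeros_like(y_true)                    # always predict majority class

print(accuracy_score(y_true, y_pred))                 # ~0.99, looks great
print(recall_score(y_true, y_pred, zero_division=0))  # 0.0 on the minority class
```

This is why minority-class-sensitive summaries such as recall and Precision/Recall AUC are usually reported alongside (or instead of) accuracy on imbalanced problems.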
But if you assign class weights while fitting, which essentially neutralizes the imbalance, does that mean the resulting ROC AUC and accuracy metrics can be relied upon?
For example, if your binary classification dataset has a 1:4 class balance but you assign class weights of 4:1 while fitting, the model weights each minority-class example four times as heavily. This should neutralize the impact of the class imbalance and allow the use of metrics that assume a balanced dataset.
Is this reasoning sound?
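To make the setup concrete, here is a hedged sketch of the experiment the question describes (Python/scikit-learn with `LogisticRegression` and `class_weight="balanced"` as stand-ins; the post does not specify a model or library, and the synthetic data is my own assumption). One relevant observation it surfaces: ROC AUC depends only on the ranking of the predicted scores, so inverse class weights barely move it, while accuracy shifts because the effective decision threshold changes.

```python
# Fit the same imbalanced problem with and without inverse class
# weights and compare accuracy, ROC AUC, and PR AUC on a held-out split.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import (accuracy_score, roc_auc_score,
                             average_precision_score)

rng = np.random.default_rng(0)
n = 20_000
X = rng.normal(size=(n, 2))
# Roughly a 1:4 class balance, with signal in the first feature only
y = (rng.random(n) < 1 / (1 + np.exp(-(X[:, 0] - 1.5)))).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0, stratify=y)

results = {}
for weight in (None, "balanced"):  # "balanced" ~ inverse class frequencies
    clf = LogisticRegression(class_weight=weight).fit(X_tr, y_tr)
    proba = clf.predict_proba(X_te)[:, 1]
    results[weight] = (accuracy_score(y_te, clf.predict(X_te)),  # threshold-dependent
                       roc_auc_score(y_te, proba),               # rank-based
                       average_precision_score(y_te, proba))     # PR AUC
    print(weight, results[weight])
```

Under these assumptions, the ROC AUC of the weighted and unweighted fits comes out nearly identical, which suggests class weights do not "repair" ROC AUC so much as leave it unchanged; the threshold-dependent metrics are where the weights bite.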
```r
set.seed(2021)
N <- 10000
p <- 0.01
y <- rbinom(N, 1, p)
preds <- rep(mean(y), N)  # always predict the base rate
my_roc <- pROC::roc(y, preds)
my_roc$auc
```

I get an AUC of $0.5$ when I have about a $99:1$ class imbalance and always guess based on the class ratio, indicating that the model is a poor one. – Dave Dec 17 '21 at 20:58