Not balancing the validation and test set gives me a very bad dev set result

Question

Here are the resulting heatmaps for the train, validation, and test sets. I also applied PCA to the train, validation, and test sets, because many features in the train set are highly correlated with each other.

According to what I found online (here and here), we should not balance the validation and test sets, since it is better to keep them close to the real-life situation. But with that approach, my model performs really poorly on the validation set.

As you can see, my model performs very poorly on the dev set.

I mainly suspect the imbalance and the PCA step, because the only other preprocessing I did was filling in missing data, and the missing values account for less than 0.1% of my data.
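One thing worth checking in the PCA step is whether the projection was learned from the train set alone and then reused on the other splits; the description above does not say. The sketch below shows that pattern under stated assumptions: it uses NumPy, the data is a synthetic stand-in for the real features, and `fit_pca` and `apply_pca` are hypothetical helper names, not part of any library.

```python
import numpy as np

def fit_pca(X_train, n_components):
    """Learn the centering vector and principal axes from the train split only."""
    mean = X_train.mean(axis=0)
    # SVD of the centered training data; the rows of Vt are the principal axes.
    _, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
    return mean, Vt[:n_components]

def apply_pca(X, mean, components):
    """Project any split using the statistics learned on the train split."""
    return (X - mean) @ components.T

rng = np.random.default_rng(0)

# Synthetic stand-in data (an assumption): 3 latent factors spread
# over 6 strongly correlated features, plus a little noise.
latent = rng.normal(size=(300, 3))
mixing = rng.normal(size=(3, 6))
X = latent @ mixing + 0.01 * rng.normal(size=(300, 6))
X_train, X_val, X_test = X[:200], X[200:250], X[250:]

# Fit once on train, then reuse the same mean and axes everywhere.
mean, components = fit_pca(X_train, n_components=3)
Z_train = apply_pca(X_train, mean, components)
Z_val = apply_pca(X_val, mean, components)
Z_test = apply_pca(X_test, mean, components)
```

If PCA were instead fitted separately on each split, the components could differ in order, sign, or direction between splits, so the model would see validation features in a different coordinate system from the one it was trained on.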

Is there something I did that is fundamentally wrong?

Welcome to Cross Validated! Did you balance the training data? If so, why? Class imbalance typically isn’t a problem, and you don’t need to apply artificial balancing to solve a non-problem. — Dave, Nov 21 '22 at 13:47
Because the minority class is literally overshadowed: it is only 0.017% of the data. If I didn't balance the data, wouldn't my model just classify everything as the majority class? I learnt from a past project that balancing the data gives me a significant improvement in the F1 score — UrDailyCS, Nov 21 '22 at 14:01
It depends on how you set the threshold for classification. Remember, most models output predictions on a continuum, such as probabilities. If you want to set a threshold to map those to discrete categories, you can, but that’s not necessarily the optimal way to proceed. I strongly suggest that you read the link in my earlier comment. — Dave, Nov 21 '22 at 14:06
I do not see that your model is performing poorly. Accuracy is a very poor evaluation metric, especially (but not exclusively!) in "unbalanced" situations: https://stats.stackexchange.com/q/312780/1352. See also the links here: https://stats.meta.stackexchange.com/q/6349/1352 — Stephan Kolassa, Nov 21 '22 at 14:07
And $F_{1}$ suffers from many of the same issues as accuracy. — Dave, Nov 21 '22 at 14:19
@Dave okay, so after re-reading the articles you both linked quite a few times, I still don't understand a lot of things. But some takeaways are: I should just let the data be? I don't need SMOTE or any other oversampling technique? And accuracy, and to some extent the F1 score, isn't a good metric. So what is? I read that they mention good scoring rules, which I would assume means the 'loss'? But my loss is also much worse on the dev set — UrDailyCS, Nov 22 '22 at 11:11
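To make the points in the comments concrete, here is a minimal sketch in plain Python of two scoring rules computed on predicted probabilities (the Brier score and log loss), next to thresholded accuracy. Everything in it is an assumption made for illustration: the validation labels, both sets of predicted probabilities, and the function names are hypothetical and do not come from the model in the question.

```python
import math

def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)

def log_loss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood of the outcomes under the predictions."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # keep log() finite
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)

def accuracy(y_true, p_pred, threshold=0.5):
    """Fraction of cases where the thresholded prediction matches the label."""
    hits = sum((p >= threshold) == bool(y) for y, p in zip(y_true, p_pred))
    return hits / len(y_true)

# Hypothetical, heavily imbalanced validation set: 1 positive in 1000.
y_val = [1] + [0] * 999

# Hypothetical model A: predicts the base rate for every case.
p_base = [0.001] * 1000
# Hypothetical model B: gives the true positive a much higher probability.
p_informative = [0.30] + [0.0007] * 999

for name, p in [("base rate", p_base), ("informative", p_informative)]:
    print(name,
          "accuracy@0.5:", accuracy(y_val, p),
          "brier:", brier_score(y_val, p),
          "log loss:", log_loss(y_val, p))

# The threshold is a separate decision from the probability model.
print("informative accuracy@0.01:", accuracy(y_val, p_informative, threshold=0.01))
```

In this toy setup both models have the same accuracy at a 0.5 threshold, because neither ever predicts the minority class, while the Brier score and log loss are both lower for the model that assigns the positive case a higher probability. Lowering the threshold then changes the hard classifications without retraining or rebalancing anything.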

