SMOTE parameters optimization problem

Question

I have a data set with 3 imbalanced groups: 10%, 3%, and 88%. I am using the SMOTE algorithm (from the R smotefamily package) to oversample the 2 minority groups.

I did this twice:

dup_size = 3 and 6 respectively for the two minority groups. The new group proportions are: 27%, 13%, and 60%.
dup_size = 6 and 25, resulting in: 31%, 30%, and 39%.
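
As a rough check on the arithmetic behind those proportions, here is a minimal Python sketch. It assumes that a class oversampled with dup_size d grows from n to n * (1 + d) instances (d synthetic points per original point), which is my reading of the parameter rather than something confirmed here. The function name and the counts (taken from the rounded 10%, 3%, 88% split) are hypothetical.

```python
def proportions_after_oversampling(counts, dup_sizes):
    """Return class proportions after oversampling.

    Assumption: a class with dup_size d grows from n to n * (1 + d),
    i.e. d synthetic points per original point; d = 0 leaves it unchanged.
    """
    new_counts = [n * (1 + d) for n, d in zip(counts, dup_sizes)]
    total = sum(new_counts)
    return [c / total for c in new_counts]


# Hypothetical counts derived from the rounded 10%, 3%, 88% split.
original = [10, 3, 88]

# The majority class is not oversampled, so its dup_size is 0.
for dup_sizes in ([3, 6, 0], [6, 25, 0]):
    shares = proportions_after_oversampling(original, dup_sizes)
    print(dup_sizes, [f"{100 * s:.1f}%" for s in shares])
```

Under that assumption the two settings give roughly 27/14/59 and 30/33/37, close to but not exactly the figures above, which would be consistent with the starting percentages being rounded.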

Afterwards, I fitted an ordinal logistic regression. With the first dup_size parameters, none of the instances were classified into the second class (i.e., all instances were classified into the first or third group). But with the second dup_size parameters, instances were classified into all 3 classes (although the accuracy was low).

It seems that the dup_size parameter affects the ordinal logistic regression classification. Should it affect the classification? If so, how do I choose the correct dup_size? And if not, what is my error?

Are unbalanced datasets problematic, and (how) does oversampling (purport to) help? You might be applying SMOTE in order to solve a non-problem. — Dave, May 19 '22 at 11:19
"none of the instances were classified into the second class": what suggests that this is not the optimal solution? (See https://stats.stackexchange.com/questions/539638/how-do-you-know-that-your-classifier-is-suffering-from-class-imbalance.) What is your performance metric for this application? — Dikran Marsupial, May 25 '22 at 09:13
I looked at several (accuracy, f1, sensitivity, etc.), but the problem is that none of the instances are classified into the 2nd class, so even with a high performance metric, this isn't a good result. — user3315563, May 26 '22 at 10:15
@user3315563 no, that may be the optimal solution. If the density of the second class in operational conditions is always lower than that of one of the other classes everywhere in the input space, then the optimal classification will always be to assign all patterns to one of the other classes. The only way in which that would not be true would be if the misclassification costs are unequal (i.e. cost-sensitive learning). — Dikran Marsupial, Jun 05 '22 at 08:36
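
To illustrate the point about misclassification costs, here is a minimal Python sketch of a minimum-expected-cost decision at a single input point. The posteriors, the cost matrices, and the function name are all hypothetical, chosen only to show the effect; they are not taken from the question's data.

```python
def min_expected_cost_class(posteriors, cost):
    """Pick the class with the lowest expected misclassification cost.

    posteriors[k] is P(class k | x); cost[k][j] is the cost of predicting
    class j when the true class is k.
    """
    n = len(posteriors)
    expected = [
        sum(posteriors[k] * cost[k][j] for k in range(n)) for j in range(n)
    ]
    return min(range(n), key=expected.__getitem__)


# Hypothetical posteriors at one point x: class index 1 is never the most probable.
posteriors = [0.35, 0.25, 0.40]

# Equal costs: every error costs 1, so the most probable class wins.
equal_cost = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]

# Hypothetical unequal costs: missing a true class-1 instance costs 5.
unequal_cost = [[0, 1, 1], [5, 0, 5], [1, 1, 0]]

print(min_expected_cost_class(posteriors, equal_cost))
print(min_expected_cost_class(posteriors, unequal_cost))
```

With equal costs the rule never picks index 1 at this point, because another class always has a higher posterior. Only when errors on class 1 are made more expensive does the decision switch to it, which is the cost-sensitive case described in the comment.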
