Questions tagged [unbalanced-classes]

Data organized into discrete categories or classes may present problems for certain analyses if the number of observations ($n$) belonging to each class is not constant across classes. Classes with unequal $n$ are unbalanced. This tag should be used for questions about datasets with subsamples of unequal size, where an imbalanced distribution across categorical factors is of concern.

Analyses with known, non-negligible sensitivity to unbalanced classes include (but are not limited to):

Student's $t$-test
Analysis of variance (see also Howell, 2009)
Some $\chi^2$ tests
Tukey's range test and the Newman–Keuls method
Analysis of differential item functioning using item response theory

Reference

Howell, D. C. (2009). Unequal sample sizes do matter. University of Vermont. Retrieved from http://www.uvm.edu/~dhowell/StatPages/More_Stuff/Unequal-ns/unequal-ns.html.

1053 questions

votes

1 answer

Balanced accuracy vs F-1 score

I was wondering if anyone could explain the difference between balanced accuracy, which is b_acc = (sensitivity + specificity)/2, and the F1 score, which is f1 = 2*precision*recall/(precision + recall).

unbalanced-classes

asked Feb 08 '13 at 14:10

dvreed77
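
To make the two formulas in the question above concrete, here is a minimal sketch that computes both from confusion-matrix counts. The counts are hypothetical, chosen only to illustrate a rare positive class; the function names are not from any particular library.

```python
def balanced_accuracy(tp, fp, tn, fn):
    """Mean of sensitivity (true positive rate) and specificity (true negative rate)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def f1_score(tp, fp, tn, fn):
    """Harmonic mean of precision and recall; tn is accepted but never used."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 10 actual positives, 990 actual negatives.
tp, fn, fp, tn = 8, 2, 40, 950

b_acc = balanced_accuracy(tp, fp, tn, fn)
f1 = f1_score(tp, fp, tn, fn)
print(f"balanced accuracy = {b_acc:.3f}, F1 = {f1:.3f}")
```

Note that true negatives enter balanced accuracy through specificity but do not appear in F1 at all, which is one reason the two scores can diverge sharply when the negative class dominates.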

votes

0 answers

Are there Imbalanced learning problems where re-balancing/re-weighting demonstrably improves accuracy?

I have been looking into the imbalanced learning problem, where a classifier is often expected to be unduly biased in favour of the majority class. However, I am having difficulties identifying datasets where class imbalance is genuinely a problem…

unbalanced-classes

asked Jan 05 '22 at 10:29

Dikran Marsupial


votes

1 answer

When is a dataset considered unbalanced?

I have a data set that is highly unbalanced: the target attribute is 93% False and 7% True. But I know that this is normal for my kind of data. I am afraid that if I undertake any steps (I can take fewer False cases, for example), I skew the distribution…

unbalanced-classes

asked Apr 05 '16 at 09:41

HonzaB

votes

1 answer

Modelling with Unbalanced dataset

I am working with a fairly unbalanced dataset (event class < 5% - it's a binary classification problem). To deal with this imbalance, I am trying out various techniques such as Oversampling the minority class (as well as synthetically generating…

unbalanced-classes

asked Oct 09 '16 at 06:54

Dataminer

votes

1 answer

When should we avoid balancing data

Can someone point me to some resource (textbook, paper, blog, ...) that clearly explains when we should NOT balance data for classification/regression? I found…

unbalanced-classes

asked Jan 23 '18 at 03:03

Alaska Louisiana

votes

0 answers

Unbalanced data and undersampling

When using undersampling to compensate for unbalanced data, what should you use for a testing dataset?

unbalanced-classes

asked Dec 09 '17 at 03:32

AngusE

votes

0 answers

Imbalanced learning - under sampling vs. over sampling vs. weight based classifiers

Does anyone know what the difference is (theoretically speaking) between undersampling, oversampling, and weight-based classifiers when dealing with highly imbalanced datasets (1:1000, 1:10000)? When is it recommended to use each one? Is there a…

unbalanced-classes

asked Mar 13 '17 at 13:55

YinnonM

votes

1 answer

How do Adasyn and SMOTE handle categorical data, specifically binary features?

SMOTE oversamples the minority class by creating synthetic data along the line connecting a minority class sample with each (or however many are chosen) of its K neighbors. In other words, xnewsample = xoldsample + lambda*(xneighbor - xoldsample).…

unbalanced-classes

asked Jul 19 '16 at 20:01

Srikant Chari
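
The interpolation formula quoted in the question above, xnewsample = xoldsample + lambda*(xneighbor - xoldsample), can be sketched in a few lines. This is only the interpolation step under assumed inputs, not a full SMOTE implementation: the neighbor search is omitted, `smote_point` is a hypothetical helper name, and the two sample points are made up. In SMOTE, lambda is drawn at random from [0, 1]; it is fixed here so the result is reproducible.

```python
def smote_point(x_old, x_neighbor, lam):
    """Return a point on the segment from x_old to x_neighbor (0 <= lam <= 1)."""
    return [a + lam * (b - a) for a, b in zip(x_old, x_neighbor)]


# Hypothetical minority samples: first feature continuous, second feature binary.
x_old = [1.0, 0.0]
x_neighbor = [3.0, 1.0]
lam = 0.25  # fixed for illustration; SMOTE would sample this uniformly from [0, 1]

x_new = smote_point(x_old, x_neighbor, lam)
print(x_new)
```

With these inputs the second coordinate of the synthetic point lands strictly between 0 and 1, so it is no longer a valid binary value. That is the difficulty the question raises about applying plain interpolation to binary features.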

vote

0 answers

Why are class-balancing techniques sometimes useful?

There are a lot of questions here regarding when to do class balancing, or what to expect of class balancing or whether unbalanced classes are an issue at all. Apparently the "consensus" among most of the top answers on these questions is that, for…

unbalanced-classes

asked Oct 14 '21 at 22:27

eduardokapp

vote

1 answer

Should I upsample both my training and my test set?

I have a highly unbalanced dataset (1000 vs 60) on which I want to use upsampling. The real-life distribution of the problem (predicting no-shows) is probably also very highly imbalanced. My question is two-fold: 1) I know that I should keep the…

unbalanced-classes

asked Mar 22 '20 at 13:42

Tibo Geysen

vote

0 answers

Counter intuitive in AUPRC and Recall and Precision and F1 for imbalanced dataset

I would like to ask for a detailed explanation on comparing several classifiers for an imbalanced dataset using the following metrics: area under the ROC curve (AUC), area under the precision-recall curve (AUPRC), recall, precision, and F1 score. As my data…

unbalanced-classes

asked Mar 07 '19 at 21:15

predictorx

vote

0 answers

Unbalanced distribution of multi-classes, how can I divide training/testing set

Statistics experts, I am a new student in the machine learning field. I just started a job classifying a set of scientific abstracts into five classes. The text distribution is as below: Class1: 200, Class2: 950, Class3: 150, Class4:…

unbalanced-classes

asked May 08 '18 at 08:34

W Lee

vote

0 answers

Is there any built-in MSMOTE library?

I am trying to deal with data imbalance within a small dataset. I just found an article talking about SMOTE and MSMOTE here. It seems that MSMOTE can overcome the shortcomings of SMOTE, so I really want to try it. The MSMOTE paper was published in 2009,…

unbalanced-classes

asked Apr 09 '17 at 21:35

Cherry Wu

vote

2 answers

Bias-Variance tradeoff for classifying unbalanced classes

I would like to use Bias-Variance trade-off to evaluate training set size in a classification problem. There are two classes which are not balanced (~70/30) and it seems that the common use of misclassification error is not good enough. Which…

unbalanced-classes

asked Apr 01 '15 at 12:18

Eitan

vote

2 answers

How to judge whether a partition is balanced or unbalanced?

Suppose we distributed $100$ coins to $10$ persons and the $i$-th person got ${x}_{i}$ coins. How can we judge whether the distribution $X=\{{x}_{1}, {x}_{2}, ..., {x}_{n}\}$ (e.g., $X=\{5, 20, 15, 5, 10, 10, 10, 15, 5, 5\}$) is (almost) balanced or not? Is…

unbalanced-classes

asked Dec 14 '14 at 09:10

Lijie Xu
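
One possible way to put a number on the coin example in the question above is normalized Shannon entropy: 1 for a perfectly even split, 0 when one person holds everything. This is a sketch of one candidate measure chosen for illustration, not the answer given on the site; the function name `normalized_entropy` is hypothetical, and any cutoff for "almost balanced" would still have to be chosen by the analyst.

```python
import math


def normalized_entropy(counts):
    """Shannon entropy of the shares, divided by its maximum log(k) for k groups."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]  # skip empty groups: 0*log(0) is taken as 0
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(counts))


X = [5, 20, 15, 5, 10, 10, 10, 15, 5, 5]   # the example from the question
even = [10] * 10                            # perfectly balanced reference
skewed = [100] + [0] * 9                    # one person holds every coin

for name, counts in [("X", X), ("even", even), ("skewed", skewed)]:
    print(name, round(normalized_entropy(counts), 3))
```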
