I have a highly imbalanced dataset: about 230 cases of class 1 in the target versus more than 3,800 of class 0, i.e. roughly 6% positives. I used SMOTE to resample the training set and then built a logistic regression model. Logistic regression seemed perfect to me for this kind of data since the target is binary, but the results aren't much better after applying SMOTE.
Here is the code:
from imblearn.over_sampling import SMOTE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, ConfusionMatrixDisplay

# Oversample the minority class in the training set only
smote = SMOTE()
X_train_resampled, Y_train_resampled = smote.fit_resample(X_train, Y_train)

# Fit the model on the resampled data and predict on the untouched test set
logisticModel = LogisticRegression().fit(X_train_resampled, Y_train_resampled)
Y_pred = logisticModel.predict(X_test)

# Score the model
print(logisticModel.score(X_test, Y_test))
print(classification_report(Y_test, Y_pred))
# ConfusionMatrixDisplay replaces the deprecated plot_confusion_matrix
ConfusionMatrixDisplay.from_predictions(Y_test, Y_pred)
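For context, X_train, X_test, Y_train and Y_test come from a standard train/test split. A minimal sketch of the assumed setup (X and y are hypothetical names for my features and target; test_size is a guess based on the 982-row test set):

from sklearn.model_selection import train_test_split

# Hypothetical setup: X holds the features, y the binary target.
# Stratifying keeps the ~6% positive rate the same in both splits.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)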
Classification report after SMOTE:
              precision    recall  f1-score   support

           0       0.99      0.74      0.85       942
           1       0.11      0.75      0.19        40

    accuracy                           0.74       982
   macro avg       0.55      0.75      0.52       982
weighted avg       0.95      0.74      0.82       982
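For the before-SMOTE numbers below, I fit the same model on the raw training data. A minimal sketch (baselineModel is just an illustrative name; same variables as above):

# Baseline: identical model, no resampling
baselineModel = LogisticRegression().fit(X_train, Y_train)
print(classification_report(Y_test, baselineModel.predict(X_test)))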
Before SMOTE:
              precision    recall  f1-score   support

           0       0.98      0.69      0.81       934
           1       0.11      0.75      0.19        48

    accuracy                           0.69       982
   macro avg       0.55      0.72      0.50       982
weighted avg       0.94      0.69      0.78       982
So as you can see, precision for class 1 hasn't improved at all; only the recall for class 0 went up slightly. SMOTE is one of the most widely used oversampling techniques and it has worked for me before, but in this case I seem to be missing something.
What should I do?