
I'm building a machine learning model to predict a process failure (1 = fail, 0 = no-fail). The data starts with a class imbalance of 1:51. After some wrangling, I applied Clustering-Based Oversampling to fix the imbalance (resulting ratio 1:1.001). I then trained a logistic regression model and got these results:
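The question does not say which clustering-based oversampling variant was used, so the following is only a minimal numpy sketch of one common formulation, with hypothetical helper names (`kmeans_labels`, `grow_clusters`, `cluster_based_oversample`). It clusters each class with k-means, duplicates rows until every majority cluster matches the largest one, then duplicates minority rows until each minority cluster holds an equal share of the grown majority count. Duplicating rows, rather than interpolating between them, keeps 0/1 columns valid. Resampling of this kind is normally applied to the training split only; the test support in the report (10638 vs 208) still reflects the original 1:51 ratio.

```python
import numpy as np

# Hypothetical sketch of cluster-based oversampling; not necessarily
# the variant used in the question.

def kmeans_labels(X, k, rng, n_iter=20):
    """Tiny k-means (Lloyd's algorithm); returns one cluster label per row."""
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave a center in place if its cluster emptied
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def grow_clusters(X, labels, target, rng):
    """Duplicate random rows (with replacement) so each cluster has >= target rows."""
    parts = []
    for j in np.unique(labels):
        members = X[labels == j]
        n_extra = max(0, target - len(members))
        extra = rng.integers(len(members), size=n_extra)
        parts.append(np.vstack([members, members[extra]]))
    return np.vstack(parts)

def cluster_based_oversample(X, y, k_maj=4, k_min=2, seed=0):
    """Balance a 0/1-labelled training set by oversampling within clusters."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X_maj, X_min = X[y == 0], X[y == 1]

    # Majority class: every cluster grown to the size of the largest one.
    maj_labels = kmeans_labels(X_maj, k_maj, rng)
    X_maj_new = grow_clusters(X_maj, maj_labels, np.bincount(maj_labels).max(), rng)

    # Minority class: each cluster grown to an equal share of the majority count.
    min_labels = kmeans_labels(X_min, k_min, rng)
    per_cluster = len(X_maj_new) // len(np.unique(min_labels))
    X_min_new = grow_clusters(X_min, min_labels, per_cluster, rng)

    X_out = np.vstack([X_maj_new, X_min_new])
    y_out = np.concatenate([np.zeros(len(X_maj_new), int), np.ones(len(X_min_new), int)])
    return X_out, y_out

# Usage on a small synthetic training set with a ~1:51 imbalance.
rng = np.random.default_rng(1)
X_demo = np.vstack([rng.normal(0, 1, (510, 3)), rng.normal(3, 1, (10, 3))])
y_demo = np.array([0] * 510 + [1] * 10)
X_bal, y_bal = cluster_based_oversample(X_demo, y_demo)
print(np.bincount(y_bal))  # class counts after resampling
```

Because the minority class is cut into equal integer shares, the two classes end up equal to within one row, which is the same kind of near-1:1 ratio described above.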

              precision    recall  f1-score   support

           0       0.98      0.93      0.95     10638
           1       0.04      0.15      0.06       208

    accuracy                           0.91     10846
   macro avg       0.51      0.54      0.51     10846
weighted avg       0.96      0.91      0.94     10846
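The per-class F1 is the harmonic mean of that class's precision and recall, so the report can be read back as approximate confusion-matrix counts. The sketch below is a back-of-envelope reconstruction from the rounded numbers above (the real confusion matrix is not given in the question): it infers roughly 31 caught failures, about 177 missed failures, and about 744 no-fail rows flagged as failures, then recomputes F1, accuracy, and the macro and weighted averages.

```python
# Approximate counts inferred from the rounded report values above;
# these are a reconstruction, not the actual confusion matrix.
support_ok, support_fail = 10638, 208
precision_fail, recall_fail = 0.04, 0.15

tp = round(recall_fail * support_fail)   # failures correctly flagged
fn = support_fail - tp                   # failures missed
fp = round(tp / precision_fail) - tp     # no-fail rows flagged as failures
tn = support_ok - fp                     # no-fail rows correctly passed

def prf(tp, fp, fn):
    """Precision, recall and F1 for whichever class is treated as positive."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

p1, r1, f1_1 = prf(tp, fp, fn)   # class 1 (fail) as the positive class
p0, r0, f1_0 = prf(tn, fn, fp)   # class 0 as positive: its FP are the fails predicted 0

n = support_ok + support_fail
acc = (tp + tn) / n
macro_f1 = (f1_0 + f1_1) / 2
weighted_f1 = (f1_0 * support_ok + f1_1 * support_fail) / n

print(f"TP={tp} FN={fn} FP={fp} TN={tn}")
print(f"F1 class 0={f1_0:.3f}  F1 class 1={f1_1:.3f}  accuracy={acc:.3f}")
print(f"macro F1={macro_f1:.3f}  weighted F1={weighted_f1:.3f}")
```

The recomputed values land within about 0.01 of the report. The contrast between the two F1 scores follows from the counts: class 0 is about 98% of the test rows, so even ~744 false alarms barely dent its precision and recall, while for class 1 those same ~744 rows swamp the ~31 true catches.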

Before I experiment further with different algorithms, I want to understand what a high F1 score for class 0 and a low F1 score for class 1 mean.
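One reference point for interpreting these numbers is a trivial model that predicts "no-fail" for every row of the same test set. The sketch below computes its scores from the supports in the report. Since it never predicts class 1, class-1 precision is undefined; scikit-learn's `classification_report` reports it as 0 (with a warning) under its default `zero_division` setting, and this sketch does the same.

```python
# Baseline: predict class 0 for every test row (supports taken from the report).
n_ok, n_fail = 10638, 208
n = n_ok + n_fail

acc_baseline = n_ok / n           # every no-fail row is "correct"
precision_0 = n_ok / n            # all rows are predicted 0
recall_0 = 1.0                    # no no-fail row is missed
f1_0_baseline = 2 * precision_0 * recall_0 / (precision_0 + recall_0)
f1_1_baseline = 0.0               # no row predicted 1: precision undefined, recall 0
macro_f1_baseline = (f1_0_baseline + f1_1_baseline) / 2

print(f"accuracy={acc_baseline:.3f} F1_0={f1_0_baseline:.3f} macro F1={macro_f1_baseline:.3f}")
```

Against this baseline (accuracy about 0.981, macro F1 about 0.495), the reported accuracy of 0.91 is lower and the macro F1 of 0.51 is only slightly higher, which is why the class-1 row, not the class-0 row, carries the information about whether the model is useful.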

Info about the dataset:

  • 22 predictive features:
      • 1 continuous numerical feature, log-transformed with np.log
      • 2 binary numerical features (0, 1)
      • 2 ordinal numerical features, ranging 1-4 and 0-2 respectively
      • 17 binary features one-hot encoded from a 17-class categorical variable (I kept all 17 columns rather than dropping one, i.e. no n-1 encoding)
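Keeping all 17 one-hot columns alongside an intercept (scikit-learn's `LogisticRegression` fits one by default) makes the design matrix rank-deficient, because the 17 columns always sum to the intercept column. The numpy sketch below, using hypothetical category codes, checks this with `np.linalg.matrix_rank`. The L2 penalty implied by `C=0.01` still gives a well-defined fit, but without a penalty the individual coefficients of those 17 columns would not be uniquely determined, which matters if they are to be interpreted.

```python
import numpy as np

n_rows, n_classes = 200, 17
rng = np.random.default_rng(0)
# Hypothetical category codes; every one of the 17 categories appears.
categories = rng.permutation(np.arange(n_rows) % n_classes)
one_hot = np.eye(n_classes)[categories]               # all 17 dummy columns kept

intercept = np.ones((n_rows, 1))
design_full = np.hstack([intercept, one_hot])          # 18 columns
design_drop = np.hstack([intercept, one_hot[:, 1:]])   # 17 columns (n-1 coding)

rank_full = np.linalg.matrix_rank(design_full)
rank_drop = np.linalg.matrix_rank(design_drop)
print(design_full.shape[1], rank_full)   # 18 17
print(design_drop.shape[1], rank_drop)   # 17 17
```

`pandas.get_dummies(..., drop_first=True)` and scikit-learn's `OneHotEncoder(drop="first")` both produce the n-1 coding directly.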

Info about multicollinearity:

  • I have 6 variables with VIF ranging from 2 to 6.
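For reference, the variance inflation factor of column j is VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j on all other columns. The question does not say how its VIFs were computed; the numpy sketch below (the function name `vif` is hypothetical) adds an intercept to each auxiliary regression. With that choice, a full set of 17 one-hot columns would give an R^2 of 1, and so an unbounded VIF, for each of them. statsmodels' `variance_inflation_factor` computes the same ratio on exactly the columns it is given, so a constant column has to be included by the caller.

```python
import numpy as np

def vif(X):
    """VIF for each column: 1 / (1 - R^2) from an OLS fit of that column
    on the remaining columns plus an intercept."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.hstack([np.ones((n, 1)), others])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        centered = target - target.mean()
        r2 = 1 - (resid @ resid) / (centered @ centered)
        out[j] = np.inf if r2 >= 1 - 1e-12 else 1 / (1 - r2)  # perfect collinearity
    return out

# Usage: c is built from a, b is independent of both.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + 0.5 * rng.normal(size=500)
vifs = vif(np.column_stack([a, b, c]))
print(np.round(vifs, 2))
```

In this synthetic example a and c share most of their variance (R^2 near 0.8), so their VIFs come out around 5, in the same range as the values reported above, while b stays close to 1.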

Info about the model:

from sklearn.linear_model import LogisticRegression
LR = LogisticRegression(C=0.01, solver='liblinear').fit(X_train, y_train.values.ravel())
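A classification report built from `LR.predict(...)` reflects the default cut-off, i.e. class 1 whenever its predicted probability exceeds 0.5. The sketch below uses synthetic data from `make_classification` as a stand-in for the real 22-feature dataset (about 2% positives, an assumption chosen to mimic the imbalance) and the same `C=0.01, solver='liblinear'` settings. It uses `precision_recall_curve` to show how class-1 precision and recall trade off as that cut-off moves, and picks the threshold with the highest class-1 F1.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data: 22 features, roughly 2% positives.
X, y = make_classification(n_samples=5000, n_features=22, weights=[0.98], random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

model = LogisticRegression(C=0.01, solver="liblinear").fit(X_train, y_train)
proba = model.predict_proba(X_val)[:, 1]  # predicted P(class 1) per validation row

precision, recall, thresholds = precision_recall_curve(y_val, proba)
f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
best = int(np.argmax(f1[:-1]))  # the last (precision=1, recall=0) point has no threshold
print(
    f"threshold={thresholds[best]:.3f} precision={precision[best]:.2f} "
    f"recall={recall[best]:.2f} F1={f1[best]:.2f}"
)
```

The threshold would normally be chosen on a validation split, as here, rather than on the test set used for the final report; the printed numbers describe the synthetic data only.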
