I implemented a classifier in Python based on Negative Selection (Artificial Immune Systems) that classifies a dataset of transactions as either fraud (Class 1) or non-fraud (Class 0). The dataset is highly imbalanced, so I use 5-fold stratified cross-validation and undersample the training data. For some reason I always get Recall = 0 and Precision = 0 for Class 1 in Fold 1, no matter how many times I run the script or which parameters I use, while I get really good results on Folds 2, 3, 4 and 5. Does anyone know why this happens only in Fold 1?
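For context, my script is structured roughly like the sketch below. This is not my actual code: the data is a random placeholder (only the sample counts and class ratio mimic my dataset), and the radii, candidate count and 1:1 undersampling ratio are illustrative values; only the overall flow (stratified folds, undersampling of the training data, detector generation against Class 0, per-fold metrics) matches what I do.

import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import precision_score, recall_score, average_precision_score

rng = np.random.default_rng(0)

# Placeholder data standing in for the real transaction features and labels.
X = rng.normal(size=(15420, 2))
y = (rng.random(15420) < 0.06).astype(int)
X[y == 1] += 4.0                      # shift the "fraud" class so it is separable

def undersample(X_tr, y_tr, rng=rng):
    """Keep every Class 1 sample and an equal number of random Class 0 samples."""
    pos = np.flatnonzero(y_tr == 1)
    neg = rng.choice(np.flatnonzero(y_tr == 0), size=pos.size, replace=False)
    idx = np.concatenate([pos, neg])
    return X_tr[idx], y_tr[idx]

def generate_detectors(X_tr, self_samples, n_candidates=5000, self_radius=0.7, rng=rng):
    """Negative selection: draw random candidates over the feature space and keep
    only those that do NOT fall within self_radius of any self (Class 0) sample."""
    lo, hi = X_tr.min(axis=0), X_tr.max(axis=0)
    candidates = rng.uniform(lo, hi, size=(n_candidates, X_tr.shape[1]))
    keep = [c for c in candidates
            if np.linalg.norm(self_samples - c, axis=1).min() > self_radius]
    return np.array(keep)

def predict(X_te, detectors, detect_radius=0.7):
    """A test sample matched by at least one detector is flagged as fraud (Class 1)."""
    dists = np.array([np.linalg.norm(detectors - x, axis=1).min() for x in X_te])
    return (dists <= detect_radius).astype(int), -dists   # predictions, anomaly scores

skf = StratifiedKFold(n_splits=5)
for fold, (tr, te) in enumerate(skf.split(X, y), start=1):
    X_tr, y_tr = undersample(X[tr], y[tr])
    detectors = generate_detectors(X_tr, X_tr[y_tr == 0])   # "self" = non-fraud
    y_pred, y_score = predict(X[te], detectors)
    print(f"Fold {fold}: detectors={len(detectors)} "
          f"P1={precision_score(y[te], y_pred, zero_division=0):.4f} "
          f"R1={recall_score(y[te], y_pred, zero_division=0):.4f} "
          f"AUPRC={average_precision_score(y[te], y_score):.4f}")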
Example Output:
Details of Fold 1 Test Set:
Number of samples in Fold 1 Test Set: 3084
Class 0 samples in Fold 1: 2900
Class 1 samples in Fold 1: 184
Fold 1
Train Set Size Before Undersampling: 12336
Train Set Size After Undersampling: 1478
Test Set Size: 3084
Number of detectors generated: 737
Class 0: Precision = 0.9403
Class 0: Recall = 1.0000
Class 1: Precision = 0.0000
Class 1: Recall = 0.0000
AUPRC on Test Set: 0.5298
UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use zero_division parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
Details of Fold 2 Test Set:
Number of samples in Fold 2 Test Set: 3084
Class 0 samples in Fold 2: 2900
Class 1 samples in Fold 2: 184
Fold 2
Train Set Size Before Undersampling: 12336
Train Set Size After Undersampling: 1478
Test Set Size: 3084
Number of detectors generated: 1475
Class 0: Precision = 0.9986
Class 0: Recall = 1.0000
Class 1: Precision = 1.0000
Class 1: Recall = 0.9783
AUPRC on Test Set: 0.9898
Details of Fold 3 Test Set:
Number of samples in Fold 3 Test Set: 3084
Class 0 samples in Fold 3: 2899
Class 1 samples in Fold 3: 185
Fold 3
Train Set Size Before Undersampling: 12336
Train Set Size After Undersampling: 1476
Test Set Size: 3084
Number of detectors generated: 2213
Class 0: Precision = 1.0000
Class 0: Recall = 1.0000
Class 1: Precision = 1.0000
Class 1: Recall = 1.0000
AUPRC on Test Set: 1.0000
Details of Fold 4 Test Set:
Number of samples in Fold 4 Test Set: 3084
Class 0 samples in Fold 4: 2899
Class 1 samples in Fold 4: 185
Fold 4
Train Set Size Before Undersampling: 12336
Train Set Size After Undersampling: 1476
Test Set Size: 3084
Number of detectors generated: 2951
Class 0: Precision = 1.0000
Class 0: Recall = 1.0000
Class 1: Precision = 1.0000
Class 1: Recall = 1.0000
AUPRC on Test Set: 1.0000
Details of Fold 5 Test Set:
Number of samples in Fold 5 Test Set: 3084
Class 0 samples in Fold 5: 2899
Class 1 samples in Fold 5: 185
Fold 5
Train Set Size Before Undersampling: 12336
Train Set Size After Undersampling: 1476
Test Set Size: 3084
Number of detectors generated: 3689
Class 0: Precision = 1.0000
Class 0: Recall = 1.0000
Class 1: Precision = 1.0000
Class 1: Recall = 1.0000
AUPRC on Test Set: 1.0000
Summary after Cross-Validation:
Number of clusters: 1476
Threshold: 1
Average number of detectors generated: 2213.00
Overall performance metrics averaged over folds:
- Precision: 0.8000
- Recall: 0.7957
- AUPRC: 0.9039
Total runtime: 2 min, 53 sec
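Regarding the UndefinedMetricWarning printed during Fold 1: it only means that no sample was predicted as Class 1 in that fold, so Class 1 precision is undefined. Passing zero_division just reports 0.0 silently instead of warning; it does not change the results. Here y_test and y_pred stand for the fold's true labels and predictions:

from sklearn.metrics import precision_score, recall_score

# Report 0.0 silently when no sample is predicted as Class 1,
# instead of emitting UndefinedMetricWarning.
precision_c1 = precision_score(y_test, y_pred, pos_label=1, zero_division=0)
recall_c1 = recall_score(y_test, y_pred, pos_label=1, zero_division=0)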
I also tried simple (non-stratified) 5-fold cross-validation and I get the same bad results in Fold 1.
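That is, plain KFold instead of StratifiedKFold, roughly:

from sklearn.model_selection import KFold, StratifiedKFold

# Plain 5-fold: folds follow the row order of the dataset (shuffle=False by default).
cv_plain = KFold(n_splits=5)

# Stratified 5-fold: each fold preserves the overall Class 0 / Class 1 ratio.
cv_strat = StratifiedKFold(n_splits=5)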