I'm trying to use threshold moving to pick an appropriate decision threshold for an imbalanced dataset. I have a 1D time series on which I'm applying a binary transformer-based classifier. My class distribution is:
Training set:
Total samples: 8133
Number of 0s: 6930 (85.21%)
Number of 1s: 1203 (14.79%)
Validation set:
Total samples: 904
Number of 0s: 770 (85.18%)
Number of 1s: 134 (14.82%)
Test set:
Total samples: 232
Number of 0s: 198 (85.34%)
Number of 1s: 34 (14.66%)
NOTE: I am choosing the threshold on the VALIDATION SET. I've seen a lot of people use the test set for this, but I think that would leak test-set information into the classifier. Please correct me if I'm wrong here.
I have used sklearn's train_test_split with stratify=y and a fixed random_state to get reproducible splits (a simplified sketch of my data preparation is below, after the experiment list). Now the problem is that I am getting wildly varying "best thresholds" from the ROC curve, even though the area under the curve is approximately the same. I'm running two experiments:
- Varying the training method: (a) normal training (no weighting of the minority class), (b) class_weight-based training, (c) SMOTE on the training set only
- Varying the number of training epochs (10, 20, 40, 60, 80, 100)
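For context, this is roughly how I prepare the splits and class weights (a simplified sketch, not my exact code; X and y stand in for my windowed time series and labels, the nested two-step split and the random_state value are illustrative assumptions, and the weight handling inside my transformer training loop is omitted):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.utils.class_weight import compute_class_weight

# Stratified, reproducible split: first carve out the test set, then split the
# remainder into train/validation. Integer test_size means an absolute count.
X_temp, X_test, y_temp, y_test = train_test_split(
    X, y, test_size=232, stratify=y, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_temp, y_temp, test_size=904, stratify=y_temp, random_state=42)

# Balanced class weights computed on the training labels only; these are the
# weights I pass to the loss during training in experiment (b).
class_weights = compute_class_weight(
    class_weight='balanced', classes=np.unique(y_train), y=y_train)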
In the first experiment, I'm getting vastly different results that I can't explain. For example, normal training gives a "best threshold" of 0.01, the class_weight approach gives 0.45, and SMOTE gives 0.02.
I don't understand this huge variation. For reference, with SMOTE (applied to the training set only), the distribution is the following:
Training set:
Total samples: 13860
Number of 0s: 6930 (50.00%)
Number of 1s: 6930 (50.00%)
Validation set:
Total samples: 904
Number of 0s: 770 (85.18%)
Number of 1s: 134 (14.82%)
Test set:
Total samples: 232
Number of 0s: 198 (85.34%)
Number of 1s: 34 (14.66%)
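The SMOTE step itself looks roughly like this (a sketch; the reshape assumes each sample is a fixed-length window stored in a numpy array, and the random_state value is illustrative):

from imblearn.over_sampling import SMOTE

# SMOTE expects a 2D (n_samples, n_features) array, so each 1D window is
# flattened into a feature vector. Only the training set is resampled;
# the validation and test sets keep their original class distribution.
smote = SMOTE(random_state=42)
X_train_res, y_train_res = smote.fit_resample(
    X_train.reshape(len(X_train), -1), y_train)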
Even in the epoch-variation experiment, where I use the class_weight approach throughout and train for 10, 20, 40, 60, 80, and 100 epochs, the ROC analysis on the validation set gives a wide range of "best thresholds": all the way from 0.002 to 0.7, which is wild, and I can't see where the problem is. Is there a flaw in the logic of how I'm doing things? The following is the ROC-AUC code I'm using:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

def plot_roc_auc_curve(y_true, y_pred_probs, best_threshold):
    # Compute ROC curve and ROC AUC
    fpr, tpr, thresholds = roc_curve(y_true, y_pred_probs)
    roc_auc = auc(fpr, tpr)
    # Locate the point on the curve corresponding to the chosen threshold
    idx = np.argmin(np.abs(thresholds - best_threshold))
    # Plot ROC curve with the best-threshold point marked
    plt.plot(fpr, tpr, lw=1, label='ROC (area = %0.2f)' % roc_auc)
    plt.plot([0, 1], [0, 1], color='navy', lw=1, linestyle='--')
    plt.plot(fpr[idx], tpr[idx], 'ko', label='Best Threshold')
    plt.text(fpr[idx], tpr[idx], f'Best Threshold: {best_threshold:.2f}')
    plt.xlim([-0.05, 1.05])
    plt.ylim([-0.05, 1.05])
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('Receiver Operating Characteristic (ROC) Curve')
    plt.legend(loc="lower right")
    plt.show()
    return roc_auc, thresholds

def get_best_threshold(y_true, y_pred_probs):
    # Pick the threshold that maximizes |TPR - FPR|
    # (Youden's J statistic when the curve stays above the diagonal)
    fpr, tpr, thresholds = roc_curve(y_true, y_pred_probs)
    best_threshold = thresholds[np.argmax(np.abs(tpr - fpr))]
    return best_threshold
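And this is roughly how I call these functions on the validation set after training (val_probs stands for the model's predicted probability of class 1 on X_val; the name is a placeholder):

# val_probs: predicted probability of class 1 for each validation sample
best_threshold = get_best_threshold(y_val, val_probs)
roc_auc, thresholds = plot_roc_auc_curve(y_val, val_probs, best_threshold)
print(f"Validation ROC-AUC: {roc_auc:.3f}, best threshold: {best_threshold:.3f}")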
I'm attaching a few screenshots of the ROC-AUC curve with the best threshold plotted as derived from this code. Please help me debug/interpret the results.
EDIT: When I set the threshold to 0.01 (for the vanilla case), as suggested by the code, and re-run VALIDATION, I get a lower F1 score than with the default threshold of 0.5.
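Concretely, the comparison I'm doing on the validation set looks like this (a sketch; y_val and val_probs are the same placeholders as above):

from sklearn.metrics import f1_score

# Binarize the validation probabilities at the suggested threshold (0.01)
# and at the default 0.5, then compare F1 scores.
for t in (0.01, 0.5):
    y_pred = (val_probs >= t).astype(int)
    print(f"threshold={t}: F1={f1_score(y_val, y_pred):.3f}")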