I have a binary classifier for a highly imbalanced multivariate time series.

I use an LSTM network to predict the next time step and use the prediction error to decide whether a data point is an anomaly or not. In addition, I have the advantage of being able to train my network on a data set that contains only negative cases.
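Roughly, the scoring looks like this (a simplified sketch; `model`, `X_test`, `y_test`, and `threshold` are placeholder names, not my actual code):

```python
import numpy as np

def anomaly_scores(model, X_test, y_test):
    """Per-sample prediction error of the LSTM's next-step forecast."""
    y_pred = model.predict(X_test)                 # predicted next time steps
    # Mean absolute error across all features of the next time step
    return np.mean(np.abs(y_pred - y_test), axis=-1)

# A point is flagged as an anomaly when its error exceeds a chosen threshold:
# scores = anomaly_scores(model, X_test, y_test)
# is_anomaly = scores > threshold
```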

I have a training and validation set for the network and a test set for the final classification. The positive class makes up ~1% of the test data (~20/2000). My use case risks being abandoned if the network produces too many false positives, so it is more like finding a needle in a haystack.

My PR-AUC is stuck at around 0.05 to 0.10. I currently use the PR curve to select the threshold where the F1 score is highest. The model with the highest PR-AUC returns a result that is somewhat close to what I want (5 TP, 10 FP).
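The threshold selection looks roughly like this (a sketch using scikit-learn; `y_true` and `scores` stand in for my test labels and prediction errors):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, average_precision_score

def pick_threshold(y_true, scores):
    """Pick the threshold on the PR curve that maximizes F1."""
    precision, recall, thresholds = precision_recall_curve(y_true, scores)
    # precision/recall have one more entry than thresholds, so drop the last point
    f1 = 2 * precision[:-1] * recall[:-1] / (precision[:-1] + recall[:-1] + 1e-12)
    best = np.argmax(f1)
    return thresholds[best], f1[best]

# pr_auc = average_precision_score(y_true, scores)   # the PR-AUC I report
```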

So what is a "good" PR-AUC score given a highly imbalanced data set? How can I interpret it?

Can I undersample the test data, which I only use to make predictions and to classify observations based on the prediction error?
