
I have a somewhat theoretical question: I work in an area that requires a number of anomaly detection solutions. When we approach these problems, we cross-validate, and for each fold we oversample the training set and validate on an unaltered test set. Typically, an unoptimized model has high accuracy but low precision, recall, and F1 score.
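For concreteness, here is a minimal sketch of that per-fold workflow. The library choices (scikit-learn for the cross-validation and metrics, imbalanced-learn for the oversampling) and the synthetic data are assumptions for illustration, not my actual pipeline:

```python
# Minimal sketch: oversample only the training fold, score on the untouched test fold.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import StratifiedKFold
from imblearn.over_sampling import RandomOverSampler

# Synthetic imbalanced data standing in for the real anomaly-detection problem.
X, y = make_classification(n_samples=5000, weights=[0.97, 0.03], random_state=0)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in skf.split(X, y):
    # Oversampling happens inside the fold; the test fold keeps its natural imbalance.
    X_res, y_res = RandomOverSampler(random_state=0).fit_resample(X[train_idx], y[train_idx])
    model = RandomForestClassifier(random_state=0).fit(X_res, y_res)
    y_pred = model.predict(X[test_idx])
    acc = accuracy_score(y[test_idx], y_pred)
    prec, rec, f1, _ = precision_recall_fscore_support(
        y[test_idx], y_pred, average="binary", zero_division=0
    )
    print(f"acc={acc:.2f} precision={prec:.2f} recall={rec:.2f} f1={f1:.2f}")
```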

I want to ensure I am accurately detecting anomalies, so I used Ray Tune to tune hyperparameters with an emphasis on F1 score. My results were similar to the following for the top-performing models (a simplified sketch of the tuning setup follows the table):

Accuracy   Precision   Recall   F1 Score
0.97       0.80        0.55     0.65
...        ...         ...      ...
0.96       0.67        0.55     0.59
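The sketch below shows roughly how such a run can be configured. It assumes Ray 2.x's Tuner API; the model, search space, and synthetic data are placeholders rather than my actual setup:

```python
# Hedged sketch: tune hyperparameters with F1 as the objective (assumes Ray 2.x Tuner API).
from ray import tune
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

def trainable(config):
    # Placeholder data/model standing in for the real anomaly-detection pipeline.
    X, y = make_classification(n_samples=5000, weights=[0.97, 0.03], random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
    model = RandomForestClassifier(
        n_estimators=config["n_estimators"],
        max_depth=config["max_depth"],
        random_state=0,
    ).fit(X_tr, y_tr)
    # Returning a dict reports these as the trial's final metrics.
    return {"f1": f1_score(y_te, model.predict(X_te))}

tuner = tune.Tuner(
    trainable,
    param_space={
        "n_estimators": tune.randint(50, 500),
        "max_depth": tune.randint(2, 20),
    },
    tune_config=tune.TuneConfig(metric="f1", mode="max", num_samples=20),
)
results = tuner.fit()
print(results.get_best_result().config)
```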

However, I was worried about the low recall values. In this case, anything flagged as an anomaly is almost certainly a true anomaly, but almost half of the anomalies would go undetected. So I then focused my tuning on optimizing recall, and the results were similar to the following:

Accuracy   Precision   Recall   F1 Score
0.46       0.07        0.98     0.13
0.70       0.11        0.96     0.20
...        ...         ...      ...
0.82       0.17        0.85     0.28

In this case, it is possible to choose hyperparameters where almost all anomalies are detected, but at the cost of accuracy. At first glance, it appears the tuner is selecting smaller models with more aggressive learning rates and dropout, effectively underfitting so that the predictions become more aggressive as well.

I am also considering a custom metric, something like maximizing 2*Recall + Accuracy, and I am interested in approaches from academia and industry (even very common ones that I may have missed) that use a similar strategy.
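For reference, the custom metric I have in mind would look something like the following as an sklearn scorer; the weight of 2 and the helper name are just my illustrative choices:

```python
# Sketch of the proposed blended objective: recall_weight * recall + accuracy.
import numpy as np
from sklearn.metrics import accuracy_score, make_scorer, recall_score

def recall_accuracy_blend(y_true, y_pred, recall_weight=2.0):
    """Weighted blend of recall and accuracy, as described in the question."""
    return recall_weight * recall_score(y_true, y_pred) + accuracy_score(y_true, y_pred)

# Wrap it so it can be passed anywhere sklearn expects a scorer.
blend_scorer = make_scorer(recall_accuracy_blend)

# Quick sanity check on fixed predictions.
y_true = np.array([0, 0, 0, 1, 1])
y_pred = np.array([0, 0, 1, 1, 0])
print(recall_accuracy_blend(y_true, y_pred))  # 2 * 0.5 + 0.6 = 1.6
```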

Are there existing approaches that achieve this goal? Is this a major misstep? And, is there anything else I've failed to consider?

  • Welcome to Cross Validated! What would 2*Recall + Accuracy tell you that helps you solve the problem you aim to solve? – Dave Oct 27 '23 at 17:49
  • I'd still like to know what 2*Recall + Accuracy would teach you. Even if it turns out to be completely useless, it would be helpful for people to know where your head is. – Dave Oct 27 '23 at 17:51
  • Thank you both for your help - @Dave I was thinking that finding hyperparameters that optimize 2*Recall + Accuracy would result in a model that targets high recall without completely disregarding accuracy. I.e. we predict almost all of the anomalies, but allow the model to miss some rather than drastically overpredicting them. I am running a test on this now and may update my question if it results in anything promising. Edit: instead of specifically optimizing this, I think just picking one of the trials that resulted in a more balanced accuracy/recall ratio might also suffice. – Branden Keck Oct 27 '23 at 18:10
  • But why that calculation in particular? Why not multiply the recall by three or seven or twelve? – Dave Oct 27 '23 at 18:14
  • @Dave that is a very good point, and doing things this way results in a sort of hyper-hyperparameter, which is essentially the opposite of the goal of the experiment. Choosing 2 "feels" right because I want accuracy to be about half as important as recall, but now I'm just contriving a reason for why I picked that number. I guess I was hoping there was an academic basis out there for something similar. – Branden Keck Oct 27 '23 at 18:17
  • What I think you're looking for is the utility of the decisions to be made based on model predictions, how much you will profit, for example. Is that about right? If so, do you have some way to quantify that utility? What do you get from identifying a $0$ as a $0?$ A $1$ as a $1?$ What do you lose by misclassifying a $0$ as a $1?$ A $1$ as a $0?$ – Dave Oct 27 '23 at 18:22
  • Also, especially if the consequences of misclassification are severe, is there some kind of grey zone where you might not want to make a decision? Perhaps you need to acquire more data or can send such a case for review by a human expert (a doctor, for example). – Dave Oct 27 '23 at 18:29
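Following up on Dave's utility framing, here is a minimal sketch of choosing a decision threshold by misclassification cost; the cost values are placeholders I would have to estimate for my application:

```python
# Sketch: pick the probability threshold that minimizes total misclassification cost.
import numpy as np

def best_threshold(y_true, y_prob, cost_fp=1.0, cost_fn=10.0):
    """Scan thresholds and return the one with the lowest total cost on validation data."""
    thresholds = np.linspace(0.01, 0.99, 99)
    costs = []
    for t in thresholds:
        y_pred = (y_prob >= t).astype(int)
        fp = np.sum((y_pred == 1) & (y_true == 0))  # false alarms
        fn = np.sum((y_pred == 0) & (y_true == 1))  # missed anomalies
        costs.append(cost_fp * fp + cost_fn * fn)
    return thresholds[int(np.argmin(costs))]

# With well-calibrated probabilities, this tends toward the decision-theoretic
# threshold cost_fp / (cost_fp + cost_fn).
```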
