I am training a vanilla 5-layer LSTM. My task is to compare two models: one without the additional features (the baseline) and one with them (the compared model). However, I found that the compared model only surpasses the baseline under certain hyperparameter settings.
For example, with a learning rate of 0.01 the compared model wins, but with a learning rate of 0.005 the baseline wins. Tuning other hyperparameters also changes which model comes out ahead.
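To check whether such flips are just run-to-run noise, one common approach is to repeat each configuration with several random seeds and compare means and standard deviations rather than single runs. Below is a minimal sketch of that protocol; `train_and_eval` is a hypothetical placeholder (here it just returns a seeded pseudo-random score) that would, in the real experiment, train the 5-layer LSTM and return a validation metric.

```python
import random
import statistics

def train_and_eval(is_baseline, learning_rate, seed):
    # Hypothetical stand-in: in the real experiment this would train the
    # LSTM (with or without the additional features) and return, e.g.,
    # validation accuracy. Here it returns a deterministic pseudo-score.
    rng = random.Random(seed * 10_000 + int(learning_rate * 1_000) + int(is_baseline))
    return rng.uniform(0.7, 0.9)

def compare(learning_rates, seeds):
    # For each learning rate, run both models over all seeds and
    # summarize with mean and standard deviation.
    results = {}
    for lr in learning_rates:
        baseline = [train_and_eval(True, lr, s) for s in seeds]
        compared = [train_and_eval(False, lr, s) for s in seeds]
        results[lr] = {
            "baseline_mean": statistics.mean(baseline),
            "baseline_std": statistics.stdev(baseline),
            "compared_mean": statistics.mean(compared),
            "compared_std": statistics.stdev(compared),
        }
    return results

summary = compare([0.01, 0.005], seeds=range(5))
for lr, stats in summary.items():
    print(lr, stats)
```

If the gap between the two models at a given learning rate is smaller than the seed-to-seed standard deviation, the "winner" at that setting may not be meaningful.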
Is this kind of situation normal? How should I explain it?