I am training a vanilla 5-layer LSTM. My task is to compare two models: one without the additional features (the baseline) and one with them (the compared model). However, I found that the compared model only surpasses the baseline under certain hyperparameter settings.
For example, with a learning rate of 0.01 the compared model wins, but with a learning rate of 0.005 the baseline wins. Tuning other hyperparameters also changes which model comes out ahead.
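To check whether such flips are just run-to-run noise, one common approach is to repeat each configuration with several random seeds and compare means and standard deviations rather than single runs. Below is a minimal sketch of that protocol; `train_and_eval` is a hypothetical placeholder (here it just returns a seeded pseudo-random score) that would, in the real experiment, train the 5-layer LSTM and return a validation metric.

```python
import random
import statistics

def train_and_eval(is_baseline, learning_rate, seed):
    # Hypothetical stand-in: in the real experiment this would train the
    # LSTM (with or without the additional features) and return, e.g.,
    # validation accuracy. Here it returns a deterministic pseudo-score.
    rng = random.Random(seed * 10_000 + int(learning_rate * 1_000) + int(is_baseline))
    return rng.uniform(0.7, 0.9)

def compare(learning_rates, seeds):
    # For each learning rate, run both models over all seeds and
    # summarize with mean and standard deviation.
    results = {}
    for lr in learning_rates:
        baseline = [train_and_eval(True, lr, s) for s in seeds]
        compared = [train_and_eval(False, lr, s) for s in seeds]
        results[lr] = {
            "baseline_mean": statistics.mean(baseline),
            "baseline_std": statistics.stdev(baseline),
            "compared_mean": statistics.mean(compared),
            "compared_std": statistics.stdev(compared),
        }
    return results

summary = compare([0.01, 0.005], seeds=range(5))
for lr, stats in summary.items():
    print(lr, stats)
```

If the gap between the two models at a given learning rate is smaller than the seed-to-seed standard deviation, the "winner" at that setting may not be meaningful.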
Is this kind of situation normal? How should I explain it?