The Adam optimizer is an adaptive learning rate optimizer that is very popular in deep learning, especially in computer vision.
I have seen papers where, after a specific number of epochs (for example, 50), the learning rate is decreased by dividing it by 10.
I do not fully understand the reasoning behind this.
How do we do that in PyTorch?
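From reading the docs, I think `torch.optim.lr_scheduler.StepLR` might be what I'm looking for. Below is a minimal sketch of what I have in mind; the model and loss are just placeholders to show where the scheduler call would go. Is something like this the right approach?

```python
import torch
import torch.nn as nn

# Placeholder model just for illustration
model = nn.Linear(10, 1)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# Divide the learning rate by 10 (gamma=0.1) every 50 epochs
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1)

for epoch in range(150):
    # ... real training loop would iterate over a DataLoader here ...
    optimizer.zero_grad()
    loss = model(torch.randn(32, 10)).pow(2).mean()  # dummy loss
    loss.backward()
    optimizer.step()

    scheduler.step()  # advance the schedule once per epoch
    print(epoch, scheduler.get_last_lr())
```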