I have been studying the way the Adam optimizer works, and how it combines ideas from both the RMSProp and Momentum optimizers.
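Just to fix notation, this is my understanding of the Adam update from the original paper, where the first moment $m_t$ plays the role of Momentum and the second moment $v_t$ plays the role of RMSProp's running average of squared gradients:

$$
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\, g_t \\
v_t &= \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2 \\
\hat{m}_t &= \frac{m_t}{1-\beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1-\beta_2^t} \\
\theta_{t+1} &= \theta_t - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon}\, \hat{m}_t
\end{aligned}
$$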
So the following question arises: why not combine Nesterov Accelerated Gradient (NAG) with RMSProp instead? Wouldn't that yield better results? A quick search shows that, indeed, there exists an optimizer called Nadam that does precisely that: it combines NAG with RMSProp. And it seems to yield better results, as can be seen in the following links: [Paper] [Report]
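From what I gathered in the paper above, Nadam keeps the same moment estimates as Adam but applies the Nesterov "look-ahead" to the first moment, so the parameter update becomes roughly (this is the simplified form, ignoring the momentum-decay schedule used in the paper):

$$
\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon}\left(\beta_1 \hat{m}_t + \frac{(1-\beta_1)\, g_t}{1-\beta_1^t}\right)
$$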
I have even seen that Keras already has an implementation of Nadam: Keras-Nadam
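For reference, swapping Adam for Nadam in Keras looks like a one-line change. Here is a minimal sketch; the toy model and layer sizes are just placeholders I made up for illustration:

```python
from tensorflow import keras

# Toy model just to illustrate the optimizer swap; the architecture is arbitrary.
model = keras.Sequential([
    keras.Input(shape=(20,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1),
])

# The only change vs. Adam is this line:
# keras.optimizers.Adam(...) -> keras.optimizers.Nadam(...)
model.compile(optimizer=keras.optimizers.Nadam(learning_rate=1e-3), loss="mse")
```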
My question is: why does the deep learning community still prefer the Adam optimizer? Why is Adam still the most established optimizer when, in my opinion, Nadam makes more sense?
If Nesterov Accelerated Gradient was shown to be an improvement over plain Momentum, why not use Nadam?
Thanks in advance!