I'm using the R h2o package to build a deep net with three hidden layers. When inspecting the model object, I notice that the training RMSE fluctuates as a function of the number of epochs. I would have assumed that, with stable gradient descent, the training RMSE should decrease monotonically with epochs until convergence.
Are there parameters I should vary to stabilize learning as a function of epochs?
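For reference, a minimal sketch of the kind of `h2o.deeplearning` call I mean, with the parameters I suspect are relevant to run-to-run and epoch-to-epoch stability (`predictors`, `response`, and `train` are hypothetical placeholders, not my actual data):

```r
library(h2o)
h2o.init()  # assumes a local H2O cluster is available

# predictors / response / train are illustrative placeholders
model <- h2o.deeplearning(
  x = predictors, y = response,
  training_frame = train,
  hidden = c(200, 200, 200),         # three hidden layers
  epochs = 50,
  adaptive_rate = TRUE,              # ADADELTA; rho/epsilon control smoothing
  rho = 0.99, epsilon = 1e-8,
  train_samples_per_iteration = -2,  # auto-tune samples per iteration
  score_training_samples = 0,        # 0 = score on the full training set
  reproducible = TRUE,               # single-threaded, deterministic runs
  seed = 1234
)
```

I'm unsure which of these (if any) are the right knobs, or whether the fluctuation is simply an artifact of how the training metrics are scored.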