I am using the SGD and Adam optimizers in a simple NN for a binary classification problem.
The model converges every time, but when I run the same model 100 times, I don't get the same coefficients on each run. The coefficient values deviate from the benchmark coefficients (which I obtained from Statsmodels) by around 8-10%.
Is there a specific reason why this happens, or are there any results/proofs about SGD/Adam not converging to the same coefficient values on every run?
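For concreteness, here is a minimal, self-contained sketch of what I mean, on synthetic stand-in data (my real dataset is different). I'm assuming here that the "simple NN" is a single sigmoid unit, i.e. effectively a logistic regression, so that its weights line up with the statsmodels coefficients; regularization is left out of this sketch and shown with the hyper-parameters further down.

```python
import numpy as np
import statsmodels.api as sm
import tensorflow as tf

# Synthetic placeholder data
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3)).astype("float32")
logits = X @ np.array([1.0, -2.0, 0.5], dtype="float32") + 0.3
y = (rng.random(400) < 1.0 / (1.0 + np.exp(-logits))).astype("float32")

# Benchmark: statsmodels Logit solves the MLE deterministically,
# so it returns the same coefficients on every call.
benchmark = sm.Logit(y, sm.add_constant(X)).fit(disp=0).params  # [const, w1, w2, w3]

def fit_once(seed):
    tf.keras.utils.set_random_seed(seed)  # weight init and shuffling change with the seed
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(X.shape[1],)),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.005),
                  loss="binary_crossentropy")
    model.fit(X, y, epochs=300, batch_size=len(X) // 4, verbose=0)
    w, b = model.layers[-1].get_weights()
    return np.concatenate([b, w.ravel()])  # [intercept, coefficients]

# 10 repetitions instead of 100 just to keep the sketch quick
runs = np.array([fit_once(seed) for seed in range(10)])
rel_dev = (runs - benchmark) / benchmark
print(runs.std(axis=0))              # spread of the NN coefficients across runs
print(np.abs(rel_dev).mean(axis=0))  # average relative deviation from the benchmark
```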
I am implementing a simple NN model using the TensorFlow library. Below are the hyper-parameters I am using:
Epoch: 300
Optimizer: Adam/SGD
Activation: Sigmoid
Learning rate: 0.005
Batch size: n/4 (n is the # of data points)
Regularization: L1/L2
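This is roughly how those hyper-parameters map onto the Keras API; the input dimension, dataset size, and regularization strength (0.01) are placeholders rather than my actual values.

```python
import tensorflow as tf

n_features = 3  # placeholder input dimension
n = 400         # placeholder number of data points

# Either optimizer, same learning rate 0.005
optimizer = tf.keras.optimizers.Adam(learning_rate=0.005)
# optimizer = tf.keras.optimizers.SGD(learning_rate=0.005)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(n_features,)),
    tf.keras.layers.Dense(
        1,
        activation="sigmoid",
        # L1 or L2 penalty; 0.01 is a placeholder strength
        kernel_regularizer=tf.keras.regularizers.l2(0.01),
        # kernel_regularizer=tf.keras.regularizers.l1(0.01),
    ),
])
model.compile(optimizer=optimizer, loss="binary_crossentropy")

# Training call with batch size n/4 and 300 epochs (X, y are the actual data):
# model.fit(X, y, epochs=300, batch_size=n // 4)
```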