Questions tagged [stochastic-gradient-descent]

211 questions
4
votes
1 answer

Stopping criteria for stochastic gradient descent?

When using stochastic gradient descent, how do we pick a stopping criterion? A benefit of stochastic gradient descent is that, since it is stochastic, it can avoid getting stuck in a suboptimal region. So I don't understand how we can pick a…
aha
  • 41
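
One common answer in practice decouples the stopping rule from the noisy training objective by monitoring loss on a held-out validation set. A minimal sketch of that heuristic (the callables `train_one_epoch` and `validation_loss`, and all constants, are illustrative stand-ins, not a definitive recipe):

```python
# A minimal sketch of early stopping on a held-out validation set.
# `train_one_epoch` and `validation_loss` are hypothetical callables
# supplied by the caller; constants are illustrative.
def sgd_with_early_stopping(train_one_epoch, validation_loss,
                            max_epochs=100, patience=5, tol=1e-4):
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_one_epoch()                  # one pass of SGD updates
        loss = validation_loss()           # loss on held-out data
        if loss < best_loss - tol:         # meaningful improvement
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            break                          # validation loss plateaued
    return best_loss
```

The tolerance `tol` guards against resetting the counter on vanishingly small improvements, and `patience` absorbs the run-to-run noise inherent in SGD.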
2
votes
0 answers

SGD on finite datasets

In stochastic convex optimization, if $F(w) = E[l(w^Tx,y)]$, where $l$ is a convex, $L$-Lipschitz loss function, $F$ can be optimized using SGD such that $E[F(\bar{w}_T)] \leq \frac{1}{T}\sum_{t=1}^{T} E[F(w_t)] \leq \min_w F(w) + \frac{RL}{\sqrt{T}}$. Assuming that…
Yakov Dan
  • 121
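
For reference, this is the standard averaged-iterate guarantee: with $\bar{w}_T = \frac{1}{T}\sum_{t=1}^{T} w_t$, Jensen's inequality plus the usual online-gradient-descent regret analysis (assuming the comparator satisfies $\|w\| \leq R$ and the step size is $\eta = \frac{R}{L\sqrt{T}}$) give

$$
E[F(\bar{w}_T)] \;\leq\; \frac{1}{T}\sum_{t=1}^{T} E[F(w_t)]
\;\leq\; \min_{\|w\| \leq R} F(w) + \frac{R^2}{2\eta T} + \frac{\eta L^2}{2}
\;=\; \min_{\|w\| \leq R} F(w) + \frac{RL}{\sqrt{T}}.
$$

The first inequality uses convexity of $F$; the second is the regret bound divided by $T$, with $\eta$ chosen to balance the two error terms.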
2
votes
0 answers

Mini-Batch Gradient Descent - Large batch size require small learning rate?

The Coursera course Machine Learning in the Enterprise - Science of Machine Learning and Custom Training says that a large batch size requires a smaller learning rate. Yet the question "How should the learning rate change as the batch size change?" suggests otherwise. However, recent…
mon
  • 1,468
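
The apparent contradiction often comes down to which heuristic is being applied. One widely cited rule of thumb, the linear scaling rule from Goyal et al. (2017, "Accurate, Large Minibatch SGD"), points the opposite way from the course: scale the learning rate up in proportion to the batch size. A sketch (all constants are illustrative):

```python
# Sketch of the linear-scaling heuristic (Goyal et al., 2017): scale the
# learning rate proportionally to the batch size, relative to a reference
# configuration. base_lr and base_batch_size are illustrative values.
base_lr = 0.1
base_batch_size = 256

def scaled_lr(batch_size):
    return base_lr * batch_size / base_batch_size

print(scaled_lr(1024))  # 0.4 -- 4x the batch, 4x the learning rate
```

The intuition is that a larger batch gives a lower-variance gradient estimate, so a larger step can be taken; in practice this holds only up to a problem-dependent batch size.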
1
vote
0 answers

What are good packages for online linear regression besides Vowpal Wabbit?

Does anyone know of online learning packages that implement the NG and NAG algorithms from Stéphane Ross' paper http://auai.org/uai2013/prints/papers/192.pdf? I know Vowpal Wabbit is the main package…
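
For orientation, the core idea in that paper (Normalized Online Learning, UAI 2013) is to make the per-feature step size invariant to feature scale. Below is a simplified sketch in that spirit only; it is not the exact algorithm, which among other things also rescales existing weights whenever a new per-feature scale maximum is observed:

```python
# A simplified sketch in the spirit of per-feature normalized gradient
# updates -- NOT the exact NG/NAG algorithm from the paper.
import numpy as np

def ng_style_update(w, x, grad_scalar, s, eta=0.1):
    """One online update on a single example.

    w           -- weight vector
    x           -- feature vector for this example
    grad_scalar -- derivative of the loss w.r.t. the prediction
    s           -- running per-feature scale, s[i] ~ max |x[i]| seen so far
    """
    s = np.maximum(s, np.abs(x))               # update per-feature scale
    safe_s = np.where(s > 0, s, 1.0)           # avoid division by zero
    w = w - eta * grad_scalar * x / safe_s**2  # scale-normalized step
    return w, s
```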
0
votes
0 answers

Getting SGDRegressor to converge to equivalent RidgeCV R2 results

I have a model of some financial data that achieves an R2 of ~0.01 (1%) using RidgeCV -- this is about what I expect. I'm exploring building the equivalent model using SGDRegressor so I can leverage partial_fit to do incremental training over larger…
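
One frequently cited mismatch when trying to reproduce Ridge results with SGDRegressor is the regularization parameterization: Ridge penalizes on top of the summed squared error, while SGDRegressor works with the per-sample average loss, so a rough correspondence is alpha_sgd ≈ alpha_ridge / n_samples. Feature scaling also matters far more for SGD. A hedged sketch (X_train/y_train assumed given; the alpha mapping is an approximation, not an exact equivalence):

```python
# Sketch: transferring RidgeCV's selected penalty to SGDRegressor.
from sklearn.linear_model import RidgeCV, SGDRegressor
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_train)   # X_train / y_train assumed given

ridge = RidgeCV(alphas=[0.01, 0.1, 1.0, 10.0]).fit(X_scaled, y_train)

sgd = SGDRegressor(
    penalty="l2",
    alpha=ridge.alpha_ / len(y_train),  # rough alpha correspondence
    learning_rate="invscaling",
    max_iter=1000,
    tol=1e-6,
    random_state=0,
)
sgd.fit(X_scaled, y_train)  # or loop partial_fit over chunks of data
```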
0
votes
1 answer

Anybody know of good material or videos to help me understand Stochastic Gradient Descent?

I am trying to understand stochastic gradient descent a bit better, as I'm not 100% sure I grasp it. Does anybody have any materials or videos that they would recommend to me that might help describe the concept? I'm really trying to get a better understanding of it.…
0
votes
1 answer

Is training loss guaranteed to decrease for stochastic gradient descent?

When performing stochastic gradient descent, is it necessary for the training loss to decrease a) between iterations in an epoch? (I think the answer is no) b) between epochs? (I think the answer is yes) The training loss is always defined using the…
ved
  • 1,182
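
In general neither is guaranteed: per-minibatch loss is noisy by construction, and with a fixed step size even the full-data training loss can rise between epochs. A tiny illustrative demo on a 1-D least-squares problem (all values are made up, and this demonstrates rather than proves the point):

```python
# Demo: per-batch loss during SGD is not monotone, even when the
# full-data loss trends downward. 1-D least squares, synthetic data.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=200)
y = 3.0 * X + rng.normal(scale=0.5, size=200)

w, eta = 0.0, 0.05
for step in range(50):
    idx = rng.choice(len(X), size=8, replace=False)  # random minibatch
    xb, yb = X[idx], y[idx]
    grad = 2 * np.mean((w * xb - yb) * xb)           # d/dw of batch MSE
    w -= eta * grad
    batch_loss = np.mean((w * xb - yb) ** 2)
    full_loss = np.mean((w * X - y) ** 2)
    print(f"step {step:2d}  batch={batch_loss:.3f}  full={full_loss:.3f}")
```

Running this shows the batch column jumping up and down from step to step while the full-data column settles near the noise floor.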