
Having done an interesting Coursera module on Linear Regression, in which the starting values for Gradient Descent were simply provided, I am wondering about something that was not touched upon and for which I could not find an answer:

  1. How does one choose the initial starting values for the weights? I cannot seem to find anything on this.

I think it does not matter if you have infinite computing power and time.

But maybe I am missing something obvious.

thebluephantom

1 Answer


For a convex function it does not matter where you start. The objective functions for linear regression with squared loss or least absolute deviation loss are both convex.
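As a quick check of that claim for the squared-loss case (notation mine, not from the original answer): the objective and its Hessian are

$$J(w) = \frac{1}{2n}\,\lVert Xw - y\rVert^2, \qquad \nabla^2 J(w) = \frac{1}{n}\,X^\top X \succeq 0,$$

and a twice-differentiable function whose Hessian is positive semidefinite everywhere is convex, so gradient descent reaches a global minimum from any starting point (possibly one of many minimizers if $X^\top X$ is singular).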

For non-convex functions, such as the loss surfaces of neural networks, where you start matters a lot: even with infinite computing power and time, gradient descent can still get stuck at a local minimum or a saddle point. There is a large body of work on how to initialize neural network weights. The convex case is illustrated in the sketch below.
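Here is a minimal sketch of the convex case (not from the answer itself; the synthetic data, learning rate, and step count are made-up illustration values): gradient descent on a least-squares objective, started from two very different weight vectors, converges to essentially the same solution.

    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic data: y = X @ w_true + noise
    n, d = 200, 3
    X = rng.normal(size=(n, d))
    w_true = np.array([2.0, -1.0, 0.5])
    y = X @ w_true + 0.1 * rng.normal(size=n)

    def gradient_descent(w0, lr=0.05, steps=2000):
        """Minimize 0.5/n * ||Xw - y||^2 by gradient descent, starting from w0."""
        w = w0.copy()
        for _ in range(steps):
            grad = X.T @ (X @ w - y) / n   # gradient of the mean squared error
            w -= lr * grad
        return w

    # Two very different starting points ...
    w_a = gradient_descent(rng.normal(scale=10.0, size=d))
    w_b = gradient_descent(np.zeros(d))

    # ... end up at (numerically) the same minimizer because the loss is convex.
    print(np.allclose(w_a, w_b, atol=1e-6))
    print(w_a, w_b)

For a non-convex loss (e.g. a neural network) the same experiment with two different initializations can land in different local minima, which is why initialization schemes matter there.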

Haitao Du
  • OK, but just a thought or a very stupid question: if I have a model and choose N features, how do I know whether the objective will be a convex or non-convex function? If it is convex then, as you state, there is no issue about where to start the weights from. But if not, then this matters. Also, I get the local minima / saddle point issue, but how do I know whether the function we are using is convex, concave, or neither? – thebluephantom Mar 20 '18 at 21:14