Very deep models involve the composition of several functions or layers. The gradient tells how to update each parameter, under the assumption that the other layers do not change. In practice, we update all of the layers simultaneously.
— Page 313, Deep Learning, 2016.
Do we violate this assumption in practice? If so, what are the consequences of that violation? One consequence is that we cannot guarantee that updating all of the parameters in a single step will move us in the direction of steepest descent, even if we have superb data in great amounts, the loss function is convex, and the single gradient step is calculated over all of the samples. Is that correct? And is the reason that simultaneously updating all of the parameters does not take into account their dependence on each other?
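As a concrete illustration of the interaction the quote describes, here is a minimal sketch using a toy linear model with one weight per layer (similar in spirit to the one-unit-per-layer example discussed in that section of the book). The depth, weights, learning rate, and target below are made-up values chosen only for illustration; the point is to compare the loss decrease predicted by the per-layer gradients with the decrease actually obtained when every layer is updated at once.

```python
import numpy as np

# Toy deep linear model with one weight per layer: y_hat = x * w1 * w2 * ... * w10.
# Each layer's gradient is computed with the other layers held fixed, but the
# update is applied to every layer simultaneously. All values are illustrative.

x, target = 1.0, 0.0
weights = np.full(10, 1.1)   # ten "layers", one weight each
lr = 0.01

def forward(w):
    return x * np.prod(w)

def loss(w):
    return 0.5 * (forward(w) - target) ** 2

y_hat = forward(weights)
error = y_hat - target
# dL/dw_i = error * x * (product of the other weights) = error * y_hat / w_i
grads = error * y_hat / weights

# Decrease in loss predicted by the gradients alone (first-order view,
# i.e. each weight treated as if the other layers did not change).
predicted_drop = lr * np.sum(grads ** 2)

# Decrease actually obtained when every layer is updated at the same time.
actual_drop = loss(weights) - loss(weights - lr * grads)

print(f"loss before the step : {loss(weights):.3f}")   # ~3.364
print(f"predicted decrease   : {predicted_drop:.3f}")  # ~3.740
print(f"actual decrease      : {actual_drop:.3f}")     # ~2.292
```

In this run the per-layer gradients predict a decrease larger than the entire current loss, which is impossible. The gap comes from the second- and higher-order terms that appear when many composed layers change at the same time, which is exactly the dependence between layers that the individually computed gradients ignore.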