2

I am reading the Wikipedia on gradient descent on a non-linear system just to get better understand of how they work under the hood. I can follow everything up until this line:

enter image description here

I understand the definition of a Jacobian, but what I don't understand is how this equation above is true. Can someone point me to a resource, or maybe give me some intuition as to why this is true?

YellowPillow
  • 1,251
  • 1
    This follows immediately from the definition of the Jacobian and the Chain Rule. Any multivariable Calculus text will serve as a good resource. – whuber Oct 21 '16 at 19:43

1 Answers1

1

By definitions: \begin{equation} \nabla F = \begin{bmatrix} \dfrac{\partial F}{\partial x_1} \\ \vdots \\ \dfrac{\partial F}{\partial x_n}\end{bmatrix}, \quad G = \begin{bmatrix} g_1 \\ \vdots \\ g_m\end{bmatrix} \end{equation} \begin{equation} J_G = \begin{bmatrix} \dfrac{\partial g_1}{\partial x_1} & \cdots & \dfrac{\partial g_1}{\partial x_n}\\ \vdots & \ddots & \vdots\\ \dfrac{\partial g_m}{\partial x_1} & \cdots & \dfrac{\partial g_m}{\partial x_n} \end{bmatrix} \end{equation}

Now, we have: \begin{equation} J_G^\top\cdot G = \begin{bmatrix} \dfrac{\partial g_1}{\partial x_1} & \cdots & \dfrac{\partial g_m}{\partial x_1}\\ \vdots & \ddots & \vdots\\ \dfrac{\partial g_1}{\partial x_n} & \cdots & \dfrac{\partial g_m}{\partial x_n} \end{bmatrix}\cdot \begin{bmatrix} g_1 \\ \vdots \\ g_m\end{bmatrix} = \begin{bmatrix} g_1\dfrac{\partial g_1}{\partial x_1} + \cdots + g_m\dfrac{\partial g_m}{\partial x_1}\\ \vdots \\ g_1\dfrac{\partial g_1}{\partial x_m} + \cdots + g_m\dfrac{\partial g_m}{\partial x_m}\end{bmatrix} \end{equation}

On the other hand, $i$-th row of $\nabla F$ is: \begin{equation} \dfrac{\partial F}{\partial x_i} = \frac{1}{2}\cdot\dfrac{\partial}{\partial x_i}(g_1^2 + \cdots + g_m^2) = g_1\dfrac{\partial g_1}{\partial x_i} + \cdots + g_m\dfrac{\partial g_m}{\partial x_i}, \end{equation}

because \begin{equation} F = \frac{1}{2}(g_1^2 + \cdots + g_m^2). \end{equation} and \begin{equation} \dfrac{\partial}{\partial x_i}(g_j^2) = 2g_j\dfrac{\partial g_j}{\partial x_i}. \end{equation}

Tonci
  • 148
  • Did you mean G and not nabla G in the first line? – YellowPillow Oct 22 '16 at 00:53
  • @YellowPillow: Yes, thanks! I've corrected it now. – Tonci Oct 22 '16 at 09:08
  • One more question is it just by chance that the objective function ${\nabla}\mathbf{F}$ can be represented in a closed form via: ${\nabla}\mathbf{F}(\mathbf{x^{(0)}}) = J_G(\mathbf{x}^{(0)})^TG(\mathbf{x}^{(0)})$ or is this a generic formula? – YellowPillow Oct 22 '16 at 15:55
  • It's a consequence of the fact $\nabla(\mathbf{A} \cdot \mathbf{B}) = \mathbf{J}^\top_\mathbf{A} \mathbf{B} + \mathbf{J}^\top_\mathbf{B} \mathbf{A}$, for $\mathbf{A} = G^\top$ and $\mathbf{B} = G$. You can see more here: https://en.wikipedia.org/wiki/Vector_calculus_identities – Tonci Oct 22 '16 at 16:36