
There is a lot of material explaining how to calculate the Jacobian for the softmax backward pass, but I find it confusing how to get from the Jacobian to the actual errors. The obvious answer would be to sum up either the rows or the columns (it does not matter which, since the matrix is symmetric), but analytically the gradient seems to come out to $0$.

\begin{align} \frac{\partial h_i}{ \partial z_j} &= h_i (1 - h_j) &\text{when } i = j \\[10pt] \frac{\partial h_i}{ \partial z_j} &= - h_i h_j &\text{when } i \ne j \end{align}

Simplifying the case when $i=j$, I get: $h_j(1 - h_j)$

Summing the case when $i\ne j$ over all $i \ne j$, I get: $-h_j\left(\sum_i h_i - h_j\right) = -h_j(1 - h_j)$, since $\sum_i h_i = 1$.

So adding those two cases up, the gradient with respect to the input should always be $0$. That makes zero sense to me. Where am I going wrong?
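
A quick NumPy check (arbitrary input values, just to illustrate) confirms that the rows and columns of the Jacobian really do sum to zero:

```python
import numpy as np

z = np.array([1.0, 2.0, 3.0])        # arbitrary logits
h = np.exp(z - z.max())
h /= h.sum()                          # softmax output; h.sum() == 1

# Jacobian of softmax: J[i, j] = dh_i / dz_j = h_i * (delta_ij - h_j)
J = np.diag(h) - np.outer(h, h)

print(J.sum(axis=0))                  # column sums: all ~0
print(J.sum(axis=1))                  # row sums: all ~0
```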

AdamO

1 Answer


It really is supposed to come out like that. To get the error, though, the Jacobian is not summed directly; it is matrix-multiplied by the upstream error vector. Summing the rows is the same as multiplying the Jacobian by a vector of all ones, and that product is zero precisely because the softmax output does not change when a constant is added to every input.
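
As a sketch of what that looks like in practice (a minimal NumPy example; `g` stands for the upstream error vector $\partial L / \partial h$, and the variable names and values are mine, purely for illustration):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

z = np.array([1.0, 2.0, 3.0])
h = softmax(z)
g = np.array([0.5, -1.0, 0.25])       # upstream error dL/dh (illustrative values)

# Backward pass: multiply the Jacobian by the upstream error vector.
J = np.diag(h) - np.outer(h, h)       # J is symmetric, so J @ g == J.T @ g
grad_z = J @ g

# Equivalent simplified form that avoids building J explicitly:
grad_z_fast = h * (g - g @ h)

print(np.allclose(grad_z, grad_z_fast))   # True
```

For any `g`, the components of `grad_z` sum to zero, which is exactly the $0$ derived in the question: it is a property of the gradient's direction (softmax ignores constant shifts of its input), not a vanishing gradient.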

  • What did you mean by "error vector"? – harveyslash Jul 05 '18 at 20:41
  • I can't recall exactly, as it has been a while. Having to branch on the indices in order to reason out the derivatives definitely makes the analysis a lot harder. I did a little tutorial a while back on how to derive the softmax backward pass without those complications, using just straightforward algebraic rewriting. Hopefully you will find it helpful. – Marko Grdinić Jul 06 '18 at 14:54
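
For reference, one index-free way to carry out such a derivation (a sketch; not necessarily the exact approach of the tutorial mentioned above) is to write the Jacobian in matrix form, which removes the case split entirely:

\begin{align} J = \operatorname{diag}(h) - h h^{\top}, \qquad \frac{\partial L}{\partial z} = J^{\top} g = h \odot g - (h \cdot g)\, h \end{align}

where $g = \partial L / \partial h$ is the upstream error vector and $\odot$ denotes elementwise multiplication.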