
There is a lot of material explaining how to calculate the Jacobian for the softmax backward pass, but I find it confusing how to get from the Jacobian to the actual errors. The obvious answer would be to sum up either the rows or the columns (it does not matter which, since the matrix is symmetric), but analytically the gradient seems to come out to $0$.

\begin{align} \frac{\partial h_i}{ \partial z_j} &= h_i (1 - h_j) &\text{when } i = j \\[10pt] \frac{\partial h_i}{ \partial z_j} &= - h_i h_j &\text{when } i \ne j \end{align}

Simplifying the case when $i=j$, I get: $h_j(1 - h_j)$

Summing the case when $i\ne j$ over all $i \ne j$, I get: $-h_j\left(\sum_i h_i - h_j\right) = -h_j(1 - h_j)$, since $\sum_i h_i = 1$.

So adding those two cases up, the gradient with respect to the input should always be $0$. That makes zero sense to me. Where am I going wrong?
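
A quick NumPy check (arbitrary input values, just to illustrate) confirms that the rows and columns of the Jacobian really do sum to zero:

```python
import numpy as np

z = np.array([1.0, 2.0, 3.0])        # arbitrary logits
h = np.exp(z - z.max())
h /= h.sum()                          # softmax output; h.sum() == 1

# Jacobian of softmax: J[i, j] = dh_i / dz_j = h_i * (delta_ij - h_j)
J = np.diag(h) - np.outer(h, h)

print(J.sum(axis=0))                  # column sums: all ~0
print(J.sum(axis=1))                  # row sums: all ~0
```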

AdamO

1 Answer


It really is supposed to come out like that. To get the error, though, the Jacobian is not summed directly; it is matrix-multiplied by the upstream error vector. Summing the rows is the same as multiplying the Jacobian by a vector of all ones, and that product is zero precisely because the softmax output does not change when a constant is added to every input.
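
As a sketch of what that looks like in practice (a minimal NumPy example; `g` stands for the upstream error vector $\partial L / \partial h$, and the variable names and values are mine, purely for illustration):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

z = np.array([1.0, 2.0, 3.0])
h = softmax(z)
g = np.array([0.5, -1.0, 0.25])       # upstream error dL/dh (illustrative values)

# Backward pass: multiply the Jacobian by the upstream error vector.
J = np.diag(h) - np.outer(h, h)       # J is symmetric, so J @ g == J.T @ g
grad_z = J @ g

# Equivalent simplified form that avoids building J explicitly:
grad_z_fast = h * (g - g @ h)

print(np.allclose(grad_z, grad_z_fast))   # True
```

For any `g`, the components of `grad_z` sum to zero, which is exactly the $0$ derived in the question: it is a property of the gradient's direction (softmax ignores constant shifts of its input), not a vanishing gradient.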

  • What did you mean by "error vector"? – harveyslash Jul 05 '18 at 20:41
  • I can't recall exactly, as it has been a while. Having to branch on the indices in order to reason out the derivatives definitely makes the analysis a lot harder. I did a little tutorial a while back on how to derive the softmax backward pass without those complications, using just straightforward algebraic rewriting. Hopefully you will find it helpful. – Marko Grdinić Jul 06 '18 at 14:54
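
For reference, one index-free way to carry out such a derivation (a sketch; not necessarily the exact approach of the tutorial mentioned above) is to write the Jacobian in matrix form, which removes the case split entirely:

\begin{align} J = \operatorname{diag}(h) - h h^{\top}, \qquad \frac{\partial L}{\partial z} = J^{\top} g = h \odot g - (h \cdot g)\, h \end{align}

where $g = \partial L / \partial h$ is the upstream error vector and $\odot$ denotes elementwise multiplication.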