There is a lot of material explaining how to calculate the Jacobian for the softmax backward pass, but I find it confusing how to get from the Jacobian to the actual error signal. The obvious answer would be to sum up either the rows or the columns (it does not matter which, since the matrix is symmetric), but analytically the gradient seems to come out to $0$.
\begin{align} \frac{\partial h_i}{ \partial z_j} &= h_i (1 - h_j) &\text{when } i = j \\[10pt] \frac{\partial h_i}{ \partial z_j} &= - h_i h_j &\text{when } i \ne j \end{align}
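In matrix form these two cases combine into $J = \operatorname{diag}(h) - h h^\top$. A minimal NumPy sketch (the `softmax` helper and the example logits `z` are my own, just for illustration) that builds the Jacobian this way:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax: shift by the max before exponentiating
    e = np.exp(z - np.max(z))
    return e / e.sum()

def softmax_jacobian(h):
    # J[i, j] = h_i * (1 - h_j) when i == j, and -h_i * h_j otherwise,
    # which is exactly diag(h) - outer(h, h)
    return np.diag(h) - np.outer(h, h)

z = np.array([1.0, 2.0, 3.0])
h = softmax(z)
J = softmax_jacobian(h)
```

The diagonal of `J` is `h * (1 - h)` and the off-diagonal entries are `-h[i] * h[j]`, matching the two cases above.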
Simplifying the case when $i=j$, I get: $h_j(1 - h_j)$
Summing the case when $i \ne j$ over all $i \ne j$, I get: $-h_j \sum_{i \ne j} h_i = -h_j(1 - h_j)$, since $\sum_i h_i = 1$.
So adding those two cases up, the gradient with respect to the input should always be $0$. That makes zero sense to me. Where am I going wrong?
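For what it's worth, a quick numeric check (softmax and logits chosen arbitrarily by me) agrees with the algebra above: every row and column of the Jacobian really does sum to zero.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax
    e = np.exp(z - np.max(z))
    return e / e.sum()

h = softmax(np.array([0.5, -1.0, 2.0]))
J = np.diag(h) - np.outer(h, h)  # softmax Jacobian

# Row and column sums are all (numerically) zero
print(J.sum(axis=0))
print(J.sum(axis=1))
```

So the $0$ is not an algebra mistake on my part; it holds for any input I try.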