OK, to begin with your last statement: there is no real difficulty there. Since $i = j$, we have $S_i = S_j$, and therefore $S_j(1 − S_i) = S_i − S_i^2 = S_j − S_j^2$.
If you are asking why the authors keep the two distinct indices $i$ and $j$ in that case, when it would be clearer to use just one, e.g. $S_i(1 − S_i)$, well... authors.
As to why we have these expressions, recall the definition of softmax:
$$S_i(\mathbf z) = \frac{e^{z_i}}{\sum_k{e^{z_k}}}$$
By taking the derivative with respect to the $j$-th entry of vector $\mathbf z$, we get:
$$\partial_jS_i(\mathbf z) = \frac{\sum_k{e^{z_k}}\times\partial_je^{z_i} - e^{z_i}\times\partial_j\sum_k{e^{z_k}}}{(\sum_k{e^{z_k}})^2}$$
Now, we have two cases. First, $i \neq j$:
$$\partial_jS_i(\mathbf z) = \frac{\sum_k{e^{z_k}}\times0 - e^{z_i}\times e^{z_j}}{(\sum_k{e^{z_k}})^2} = - \left(\frac{e^{z_i}}{\sum_k{e^{z_k}}}\right) \left(\frac{e^{z_j}}{\sum_k{e^{z_k}}}\right)$$
which is equal to $-S_iS_j$.
For $i = j$, we get:
$$\partial_iS_i(\mathbf z) = \frac{\sum_k{e^{z_k}}\times e^{z_i} - e^{z_i}\times e^{z_i}}{(\sum_k{e^{z_k}})^2} = \left(\frac{e^{z_i}}{\sum_k{e^{z_k}}}\right) \left( \frac{\sum_k{e^{z_k}} - e^{z_i}}{\sum_k{e^{z_k}}} \right)$$
which, after splitting the fraction and substituting the definition of softmax into each factor, becomes $S_i(1 − S_i)$.
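If you want to convince yourself numerically, here is a small sketch (plain Python, my own naming) that computes the Jacobian from the two closed-form cases above, $\partial_jS_i = S_i(\delta_{ij} - S_j)$, and checks it against central finite differences:

```python
import math

def softmax(z):
    # Subtract the max for numerical stability; it cancels in the ratio.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_jacobian(z):
    # Analytic Jacobian from the derivation:
    # dS_i/dz_j = S_i * (1 - S_i) when i == j, and -S_i * S_j otherwise,
    # i.e. S_i * (delta_ij - S_j) in one expression.
    s = softmax(z)
    n = len(z)
    return [[s[i] * ((1.0 if i == j else 0.0) - s[j]) for j in range(n)]
            for i in range(n)]

def numeric_jacobian(z, eps=1e-6):
    # Central finite differences as an independent sanity check.
    n = len(z)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        zp = list(z); zp[j] += eps
        zm = list(z); zm[j] -= eps
        sp, sm = softmax(zp), softmax(zm)
        for i in range(n):
            J[i][j] = (sp[i] - sm[i]) / (2 * eps)
    return J

z = [0.5, -1.2, 2.0]
A = softmax_jacobian(z)
N = numeric_jacobian(z)
assert all(abs(A[i][j] - N[i][j]) < 1e-6
           for i in range(len(z)) for j in range(len(z)))
```

Both the diagonal ($i = j$) and off-diagonal ($i \neq j$) entries agree with the finite-difference estimate, so the two cases derived above are consistent.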