Consider what matrix multiplication is, and observe the pattern of indices carefully:
$$D_{ij} = \sum_{k}W_{ik} X_{kj}$$
$$\frac{\partial D_{ij}}{\partial W_{ik}} = X_{kj}$$
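As a quick sanity check (a sketch assuming NumPy, with shapes and indices chosen arbitrarily for illustration), perturbing a single weight $W_{ik}$ and measuring the change in $D_{ij}$ by finite differences reproduces $X_{kj}$:

```python
import numpy as np

# Numerical check of dD_ij/dW_ik = X_kj; the shapes and the probed indices
# below are arbitrary choices, not anything fixed by the derivation.
rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4))   # W is 3x4
X = rng.standard_normal((4, 5))   # X is 4x5, so D = W @ X is 3x5

i, j, k = 1, 2, 3                 # pick one entry D_ij and one weight W_ik to probe
eps = 1e-6

W_plus = W.copy()
W_plus[i, k] += eps               # perturb the single weight W_ik
numeric = ((W_plus @ X)[i, j] - (W @ X)[i, j]) / eps

print(numeric, X[k, j])           # the two numbers agree to roughly 1e-6
```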
For a loss function $L$ as described earlier, the chain rule gives
$$\frac{\partial L}{\partial W_{ik}} =
\sum_j \frac{\partial L}{\partial D_{ij}} \frac{\partial D_{ij}}{\partial W_{ik}} =
\sum_j \frac{\partial L}{\partial D_{ij}} X_{kj} =
\sum_j \frac{\partial L}{\partial D_{ij}} X_{jk}^T
$$
Note that $\partial D_{i'j}/\partial W_{ik} = 0$ for $i'\ne i$: changing $W_{ik}$ only affects row $i$ of $D$. The full chain-rule sum over all entries of $D$ therefore collapses to the terms with $i' = i$, leaving only the sum over $j$.
Since we rewrote $X_{kj}$ as $X^T_{jk}$, the shared inner index $j$ lines up exactly as in matrix multiplication, giving the compact matrix form
$$
\frac{\partial L}{\partial W} = \frac{\partial L}{\partial D} X^T
$$
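Here is a minimal NumPy sketch of this backward pass. To keep it self-contained, the loss is assumed to be the sum of the entries of $D$, so the upstream gradient $\partial L/\partial D$ is a matrix of ones; any other upstream gradient plugs into the same formula.

```python
import numpy as np

# Backward pass for D = W @ X. The loss is *assumed* here to be L = sum(D),
# so dL/dD is a matrix of ones; this is only to make the example runnable.
rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4))
X = rng.standard_normal((4, 5))

dL_dD = np.ones((3, 5))           # upstream gradient, same shape as D
dL_dW = dL_dD @ X.T               # the result derived above: dL/dW = (dL/dD) X^T

assert dL_dW.shape == W.shape     # denominator layout: gradient has W's shape

# Finite-difference check of one entry of dL/dW.
i, k = 2, 1
eps = 1e-6
W_plus = W.copy()
W_plus[i, k] += eps
numeric = ((W_plus @ X).sum() - (W @ X).sum()) / eps
print(numeric, dL_dW[i, k])       # should agree to roughly 1e-6
```

The `assert` line makes the shape bookkeeping explicit: $\partial L / \partial W$ comes out with the same shape as $W$, which is also the point of the layout convention discussed below.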
When $X$ is a single column vector $x$ (so $D$ and $\partial L / \partial D$ are column vectors as well), this matrix of partial derivatives $\partial L / \partial W$ can also be computed as the outer product of vectors: $(\partial L / \partial D) \otimes x$.
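For instance (a sketch assuming NumPy, with an arbitrary 3-by-4 example), the matrix-product and outer-product forms produce the same gradient when $X$ is a single column:

```python
import numpy as np

# Vector special case: X is a single column x, so D = W @ x is a column too.
# The names and shapes here are illustrative assumptions.
rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4))
x = rng.standard_normal(4)        # a single input column, length 4
dL_dD = rng.standard_normal(3)    # upstream gradient, one entry per output

via_matmul = dL_dD[:, None] @ x[None, :]   # (dL/dD) x^T written as a matrix product
via_outer = np.outer(dL_dD, x)             # the same 3x4 matrix as an outer product

print(np.allclose(via_matmul, via_outer))  # True
```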
If you really understand the chain rule and are careful with your indexing, then you should be able to reason through every step of the gradient calculation.
We need to be careful about which matrix calculus layout convention we use: here "denominator layout" is used, meaning $\partial L / \partial W$ has the same shape as $W$ and $\partial L / \partial D$ has the same shape as $D$ (a column vector in the vector case just described).