Here is a straightforward derivation of this approximate relationship for a discrete sample of returns that does not require specifying an underlying probability distribution or a continuous stochastic process (e.g., geometric Brownian motion).
Given a finite sequence of returns $r_1,r_2, \ldots, r_n $, the CAGR $g$ and arithmetic average return $\mu$ are
$$g = \left[\prod_{j=1}^n(1+r_j)\right]^{1/n} -1, \quad \mu = \frac{1}{n}\sum_{j=1}^n r_j$$
Rearranging the CAGR formula and taking logarithms, we get
$$\tag{1}\log(1+g) = \frac{1}{n}\sum_{j=1}^n \log(1+r_j)$$
By expanding $\log(1+r_j)$ around $\log(1+\mu)$ and substituting into (1) we can obtain the relationship between $g$ and $\mu$.
Note that
$$\tag{2}\log(1+r_j) - \log(1+\mu) = \log\left(\frac{1+r_j}{1+\mu}\right) = \log \left(1 + \frac{r_j-\mu}{1+\mu} \right)$$
The Taylor series $\log(1+\lambda_j) = \lambda_j - \lambda_j^2/2 + \lambda_j^3/3 \mp \ldots$ converges for $|\lambda_j| < 1$. Applying this to (2) with $\lambda_j = (r_j- \mu)/(1+\mu)$, we get
$$\tag{3}\log(1+r_j) = \log(1+\mu) + \lambda_j - \frac{\lambda_j^2}{2} + \mathcal{O}(\lambda_j^3)$$
Substituting into (1) with (3) yields
$$\tag{4}\log(1+g) = \log(1+\mu) + \frac{1}{n}\sum_{j=1}^n \lambda_j - \frac{1}{2n}\sum_{j=1}^n \lambda_j^2 + \frac{1}{3n}\mathcal{O}\left(\sum_{j=1}^n \lambda_j^3\right)
$$
The second term on the RHS must vanish since $\sum_{j=1}^n\lambda_j = \sum_{j=1}^n(r_j -\mu) = n\mu - n\mu$. The third term is just $\sigma^2/(2(1+\mu)^2)$ where the volatility estimator $\sigma$ is given by
$$\sigma^2:= \frac{1}{n}\sum_{j=1}^n (r_j- \mu)^2 $$
We can also assume that the error term $\frac{1}{3n}\mathcal{O}\left(\sum_{j=1}^n \lambda_j^3\right)$ can be neglected for typical (small) values of $r_j$.
Making substitutions and applying the exponential function to both sides of (5) it follows that
$$\tag{6}1 + g \approx (1+\mu)\exp \left(- \frac{\sigma^2}{2(1+\mu)^2}\right)$$
Finally, expanding the exponential function and using the approximation $\exp(-x) \approx 1-x$ we get
$$1+g \approx (1 + \mu)\left[1-\frac{\sigma^2}{2(1+\mu)^2}\right] =1 +\mu - \frac{\sigma^2}{2(1+\mu)}, $$
and, hence,
$$g \approx \mu - \frac{\sigma^2}{2(1+\mu)} \approx \mu - \frac{\sigma^2}{2},$$
where an additional approximation $1/(1+\mu) \approx 1$ is used.
For typical indices like the S&P 500, the error of this approximation is a few basis points.