I am trying to figure out why the estimator $$\begin{equation*} \hat\beta = (X^TX)^{-1}X^TY \end{equation*}$$ is normally distributed in least-squares regression, where Y is a linear function of X plus normal noise. All the examples I've found say that since $$\begin{align*} \hat\beta &= (X^TX)^{-1}X^TY \\ &= (X^TX)^{-1}X^T(X\beta + \varepsilon) \\ &= \beta + (X^TX)^{-1}X^T\varepsilon \end{align*}$$ we know that $$\hat\beta-\beta \sim \mathcal{N}(0,\sigma^2 (X^TX)^{-1}).$$
I can see how the mean and variance are calculated, but why is this a normal distribution?
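For reference, here is the mean and variance calculation as I understand it. This is only a sketch, and it rests on assumptions I am reading into the examples rather than ones they state outright: X is treated as fixed (non-random) with X^TX invertible, and the noise satisfies E[ε] = 0 and Var(ε) = σ²I.

$$\begin{align*} \mathbb{E}[\hat\beta - \beta] &= (X^TX)^{-1}X^T\,\mathbb{E}[\varepsilon] = 0 \\ \operatorname{Var}(\hat\beta - \beta) &= (X^TX)^{-1}X^T\,\operatorname{Var}(\varepsilon)\,X(X^TX)^{-1} \\ &= \sigma^2 (X^TX)^{-1}X^TX(X^TX)^{-1} \\ &= \sigma^2 (X^TX)^{-1} \end{align*}$$

The second line uses the fact that (X^TX)^{-1} is symmetric. This gives the two parameters of the claimed distribution, but nothing in it tells me the shape is normal, which is the part I am asking about.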