Suppose you have a dataset $X \in \mathbb{R}^{n \times d}$, $y \in \mathbb{R}^{n}$ drawn from some distribution $\mathcal{D}$ (if it helps to obtain a closed-form formula, $\mathcal{D}$ can be Gaussian or anything else that is tractable).
You posit a linear model, and you compute the least squares estimate $\hat{\beta} = (X^T X)^{-1} X^T y.$
Can you say anything about the MSE of $\hat{\beta}$? I imagine this is difficult for a fixed $X$, but maybe you could compute the expectation of the MSE of $\hat{\beta}$, where the expectation is taken over the randomness in the dataset $(X, y)$?
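To make the quantity concrete, here is a minimal Monte Carlo sketch of one possible reading of the question. Everything specific in it is an assumption for illustration and not part of the setup above: "MSE" is taken to mean the squared parameter error $\|\hat{\beta} - \beta\|^2$, the rows of $X$ are i.i.d. standard Gaussian, and $y = X\beta + \varepsilon$ with Gaussian noise of standard deviation `sigma`. The function name `expected_parameter_mse` and its parameters are hypothetical.

```python
import numpy as np

def expected_parameter_mse(n, d, sigma=1.0, trials=2000, seed=0):
    """Monte Carlo estimate of E||beta_hat - beta||^2 over random (X, y).

    Assumptions (illustrative only): rows of X are i.i.d. N(0, I_d),
    y = X @ beta + noise, noise ~ N(0, sigma^2 I_n), and n > d.
    """
    rng = np.random.default_rng(seed)
    beta = rng.standard_normal(d)  # hypothetical "true" coefficients
    errors = np.empty(trials)
    for t in range(trials):
        # Draw a fresh dataset (X, y) on every trial.
        X = rng.standard_normal((n, d))
        y = X @ beta + sigma * rng.standard_normal(n)
        # Least squares fit; equals (X^T X)^{-1} X^T y when X has full column rank.
        beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
        errors[t] = np.sum((beta_hat - beta) ** 2)
    # Average the squared parameter error over all simulated datasets.
    return errors.mean()

print(expected_parameter_mse(n=50, d=5))
```

If "MSE" is instead meant as prediction error on a fresh test point, the line computing `errors[t]` would need to change accordingly; the sketch only pins down the parameter-error reading.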
This feels like a standard enough problem that I wanted to first ask whether anyone knows the answer or has any helpful references. Thanks!