In linear regression, we assume errors are normally distributed with mean 0 and variance $σ^2$.
We use MSE = (Residual sum of squares) / (n − p) as an unbiased estimator for $σ^2$ (not σ), where n = number of observations and p = number of coefficients.
I think I understand the concept up to this point. However, when we come to standardized residuals, we note that the raw residuals do not all have the same variance, and we compute the variance of the $i$-th residual as
var[$r_i$] = $σ^2(1 − h_{ii})$, where $h_{ii}$ is the $i$-th diagonal entry of the hat matrix.
Therefore, $σ^2 = var[r_i]/(1-h_{ii})$
I am not able to reconcile this with the MSE formula. I believe the connection involves summing these variances over all $i$, together with the fact that the trace of the hat matrix equals $p$, but I have not yet been able to draw the connection clearly.
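To make the quantities I'm comparing concrete, here is a small numerical sketch (the design matrix, sample size, and variable names are made up for illustration) checking the hat-matrix facts in question: the trace of $H = X(X^TX)^{-1}X^T$ equals $p$, so the factors $(1 - h_{ii})$ sum to $n - p$, the same denominator that appears in the MSE.

```python
import numpy as np

# Hypothetical toy design: n = 50 observations, p = 3 coefficients
# (an intercept plus two random predictors).
rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])

# Hat matrix H = X (X'X)^{-1} X'
H = X @ np.linalg.solve(X.T @ X, X.T)
h = np.diag(H)

# trace(H) = rank(X) = p, so sum_i (1 - h_ii) = n - p.
print(np.trace(H))    # ≈ 3.0
print(np.sum(1 - h))  # ≈ 47.0, i.e. n - p
```

If this identity is the right bridge, then summing var[$r_i$] = $σ^2(1 − h_{ii})$ over all $i$ would give $σ^2(n − p)$, which is exactly what dividing the residual sum of squares by $n − p$ seems to account for.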