In the context of ordinary least squares regression with observed responses $Y$ and residual errors $\hat \varepsilon$, the $R^2$ coefficient is defined as $ R^2 = 1-\frac{\|\hat \varepsilon\|^2}{\|Y - \bar Y\|^2}. $ This is non-decreasing in the addition of covariates to the model, and hence the adjusted $R^2$ coefficient defined by $ R^2_a = 1-\frac{\|\hat \varepsilon\|^2/(n-p-1)}{\|Y - \bar Y\|^2/(n-1)} $ is often preferred.
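To make the behavior concrete, here is a minimal NumPy sketch (variable and function names are my own) computing both coefficients from the definitions above. It illustrates that $R^2$ cannot decrease when a pure-noise covariate is appended to the design, whereas $R^2_a$ pays a degrees-of-freedom penalty:

```python
import numpy as np

def r2_and_adjusted(X, y):
    """Return (R^2, adjusted R^2) for an OLS fit of y on X.

    X must include the intercept column, so X has p + 1 columns.
    """
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sse = resid @ resid                      # ||eps_hat||^2
    sst = np.sum((y - y.mean()) ** 2)        # ||Y - Ybar||^2
    n, p_plus_1 = X.shape
    r2 = 1 - sse / sst
    r2_adj = 1 - (sse / (n - p_plus_1)) / (sst / (n - 1))
    return r2, r2_adj

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
y = 2.0 + 3.0 * x1 + rng.normal(size=n)

X_small = np.column_stack([np.ones(n), x1])
X_big = np.column_stack([X_small, rng.normal(size=n)])  # add a pure-noise covariate

r2_small, adj_small = r2_and_adjusted(X_small, y)
r2_big, adj_big = r2_and_adjusted(X_big, y)

# R^2 is non-decreasing under nested models; adjusted R^2 is always below R^2.
print(r2_small, r2_big, adj_small, adj_big)
```

Since the identity $R^2_a = 1 - (1-R^2)\frac{n-1}{n-p-1}$ holds, $R^2_a < R^2$ whenever $R^2 < 1$ and $p \geq 1$, which the sketch confirms numerically.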
Why is this particular adjustment of $R^2$ the most commonly used? What is the underlying justification?
**Notes**
- I'm mostly interested in historical and rigorous references. I haven't found a satisfactory answer on Wikipedia or in the regression books I consulted.
- I'm not looking for the one explanation to end all explanations; my goal is to get a good grasp on different understandings of $R^2_a$.