If you regress a randomly generated dependent variable on randomly generated independent variables, is the expected $R^2$ simply a function of $n$ (the number of observations) and $k$ (the number of independent variables)? If so, why?
In some old regression course notes I was re-reading, I see that the expected $R^2$ in this case is $k/(n-k-1)$. I tried this with some randomly generated data (e.g., $n = 100$ and $k = 20$, where the formula gives $20/79 \approx 0.2532$) and indeed got a value very close to 0.2532, but I don't understand how it can be this simple. Thanks for any color anyone might have.
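A simulation like the one described can be sketched in pure Python. This is a minimal sketch under stated assumptions: $y$ and all $k$ predictors are independent standard normal draws, the model includes an intercept, and the helper names (`r_squared_pure_noise`, `mean_r_squared`) are hypothetical. It prints the Monte Carlo mean of $R^2$ next to $k/(n-1)$, which is the mean of the $\mathrm{Beta}(k/2, (n-k-1)/2)$ distribution that $R^2$ follows when a Gaussian response is unrelated to the predictors. It also prints the notes' $k/(n-k-1)$ for comparison.

```python
import random
import statistics


def centered(v):
    """Return v minus its mean (centering is what the intercept absorbs)."""
    m = statistics.fmean(v)
    return [x - m for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def r_squared_pure_noise(n, k, rng):
    """R^2 of an OLS fit with an intercept and k pure-noise predictors.

    y and the k predictor columns are independent N(0, 1) draws, so every
    true slope is zero. With an intercept, fitted values minus their mean
    equal the projection of the centered y onto the span of the centered
    predictors; an orthonormal basis of that span is built with modified
    Gram-Schmidt, and R^2 = ||projection||^2 / ||centered y||^2.
    """
    y = centered([rng.gauss(0.0, 1.0) for _ in range(n)])
    basis = []
    for _ in range(k):
        v = centered([rng.gauss(0.0, 1.0) for _ in range(n)])
        for q in basis:  # remove the component along each earlier direction
            c = dot(q, v)
            v = [vi - c * qi for vi, qi in zip(v, q)]
        norm = dot(v, v) ** 0.5
        if norm > 1e-12:  # skip a dependent column (only expected if k > n - 1)
            basis.append([vi / norm for vi in v])
    explained = sum(dot(q, y) ** 2 for q in basis)
    return explained / dot(y, y)


def mean_r_squared(n, k, reps=1000, seed=0):
    """Monte Carlo estimate of E[R^2] when y is unrelated to the predictors."""
    rng = random.Random(seed)
    return statistics.fmean(r_squared_pure_noise(n, k, rng) for _ in range(reps))


if __name__ == "__main__":
    n, k = 100, 20
    print("simulated mean R^2:", round(mean_r_squared(n, k, reps=300), 4))
    print("k / (n - 1)       :", round(k / (n - 1), 4))
    print("k / (n - k - 1)   :", round(k / (n - k - 1), 4))
```

Pure Python keeps the sketch dependency-free; with NumPy one would more likely fit each regression with `numpy.linalg.lstsq`. Raising `reps` tightens the Monte Carlo estimate at the cost of run time.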
I agree with this. Note that your formula yields nonsense whenever $k$ exceeds $n-k-1$: it gives an expected $R^2$ greater than $1$! – whuber Dec 23 '17 at 23:45
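The condition in that comment can be worked through directly, assuming $n > k + 1$ so that the residual degrees of freedom $n-k-1$ are positive. The example values $n = 30$, $k = 15$ are chosen here purely for illustration:

$$
\frac{k}{n-k-1} > 1 \iff k > n-k-1 \iff n < 2k + 1,
\qquad \text{e.g. } n = 30,\ k = 15:\ \frac{15}{14} \approx 1.071 > 1 .
$$

Since $R^2$ always lies in $[0, 1]$, no valid expected $R^2$ can exceed $1$. By contrast, $k/(n-1) = 15/29 \approx 0.517$ for the same values, and $k/(n-1) < 1$ whenever $k < n - 1$.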