
I was wondering what happens to the bias and variance of GLM coefficient estimates as the dimensionality approaches the number of training data points, specifically in Linear Regression and Poisson Regression?

I know that for Logistic Regression there is the Hauck-Donner phenomenon, which arises as separation in the data increases and inflates the variance of the fitted regression coefficients. However, I'm not familiar with the analogues of this phenomenon for Linear and Poisson Regression.
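For example, here is a toy illustration (my own quick sketch, separate from the simulation below) of how separation blows up the fitted logistic coefficients and their standard errors:

# Hypothetical data: every x below 0 is in class 0 and every x above 0 is in class 1,
# so the classes are perfectly separated and the MLE of the slope diverges.
set.seed(1)
x = c(runif(20, -2, -0.1), runif(20, 0.1, 2))
y = rep(0:1, each = 20)
fit = glm(y ~ x, family = binomial)   # warns that fitted probabilities of 0 or 1 occurred
summary(fit)$coefficients             # huge slope estimate and huge standard error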

I've written some R code for Linear Regression:

genLinear = function(n, dimens) {
  beta  = 1                                   # true coefficient on every predictor
  xdata = replicate(dimens, rnorm(n))         # n x dimens design matrix
  ydata = beta * rowSums(xdata) + rnorm(n)    # y = X beta + standard normal noise
  model = lm(ydata ~ ., data = data.frame(ydata, xdata))
  print(summary(model))
  return(model)
}
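For example (sample sizes chosen arbitrarily):

genLinear(100, 5)    # few predictors relative to n
genLinear(100, 95)   # number of predictors close to n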

and I've noticed that there isn't a huge change when I increase the number of predictors toward the number of data points. There does appear to be more variance in the regression coefficients, but not really any bias. Is this also true of Poisson Regression?
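For Poisson Regression, the analogue I have in mind would look roughly like this (the coefficient value and the log link are my own choices; a small beta keeps the simulated counts moderate):

genPoisson = function(n, dimens) {
  beta   = 0.1                                # small true coefficient on every predictor
  xdata  = replicate(dimens, rnorm(n))
  lambda = exp(beta * rowSums(xdata))         # log link: log(mu) = X %*% beta
  ydata  = rpois(n, lambda)
  model  = glm(ydata ~ ., family = poisson, data = data.frame(ydata, xdata))
  print(summary(model))
  return(model)
}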

I thought this was a good discussion of the curse of dimensionality, but the results shown in the figure seem contrary to what I'm seeing: my simulations show an increase in variance but no bias at all, whereas the results in Elements of Statistical Learning seem to show the opposite.
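One way to quantify this is a replication check along the following lines (my sketch; the replication count and sample sizes are arbitrary). It reports the empirical bias and standard deviation of a single coefficient whose true value is 1:

checkCoef = function(n, dimens, reps = 200) {
  est = replicate(reps, {
    xdata = replicate(dimens, rnorm(n))
    ydata = rowSums(xdata) + rnorm(n)
    coef(lm(ydata ~ xdata))[2]        # estimate of the first predictor's coefficient
  })
  c(bias = mean(est) - 1, sd = sd(est))
}
checkCoef(100, 5)
checkCoef(100, 95)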

Michael
  • If GLM is just a linear model, doesn't this match what ESL says?

    Eq 2.28 gives: $E_{x_0} EPE(x_0) = \sigma^2(p/N) + \sigma^2$.

    So adding predictors increases the variance linearly, rather than exponentially as with nonparametric methods.

    – Graham Aug 26 '15 at 12:42

0 Answers