A post "Fitting Polynomial Regression in R" used two ways to model the polynomial regression: (a) poly(..., ...); (b) I(...). Below is the example:
set.seed(20)
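# simulate data: a cubic trend in q plus Gaussian noise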
q <- seq(from=0, to=20, by=0.1)
y <- 500 + 0.4 * (q-10)^3
noise <- rnorm(length(q), mean=10, sd=80)
noisy.y <- y + noise
# fitting polynomials
# two methods
model_a <- lm(noisy.y ~ poly(q,3))
model_b <- lm(noisy.y ~ q + I(q^2) + I(q^3))
# their summaries are identical except for the coefficients
summary(model_a)
summary(model_b)
The post said that:
q, I(q^2) and I(q^3) will be correlated, and correlated variables can cause problems. The use of poly() lets you avoid this by producing orthogonal polynomials, therefore I'm going to use the first option (i.e., poly()).
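For reference, here is a quick check of that claim (my addition, not from the post), reusing the q defined above: the raw powers of q are highly correlated, while the columns returned by poly() are orthogonal by construction.
# off-diagonal correlations among the raw powers are close to 1
cor(cbind(q, q^2, q^3))
# off-diagonal correlations among the orthogonal polynomial columns are ~0
round(cor(poly(q, 3)), 10)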
I am confused about two things:
(1) Why do q, I(q^2) and I(q^3) cause problems?
(2) According to summary(), the two models are identical except for the Coefficients. Why are the coefficients different while everything else is the same? Shouldn't everything be different, or everything the same?
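One can check numerically that the two fits coincide (a quick sketch of mine, not part of the original question): both formulas span the same space of cubic polynomials in q, so the fitted values, and with them R^2, the F-statistic, and the residual standard error, must agree; only the basis, and hence the coefficients, differs.
# identical fitted values (up to rounding) explain why summary() output
# matches everywhere except the coefficient table
all.equal(fitted(model_a), fitted(model_b))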
With I() you may create correlated variables. Correlated variables are bad in linear regression because they influence inference, especially if correlations are very high. – user2974951 Aug 13 '19 at 14:02
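One way to quantify "very high" (a base-R sketch of mine, not part of the comment) is to compare the condition numbers of the two design matrices:
# raw-polynomial design matrix: large condition number, near-collinear columns
kappa(model.matrix(model_b))
# orthogonal-polynomial design matrix: well conditioned
kappa(model.matrix(model_a))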