I hope someone can help me better understand how centering variables can help address multicollinearity when a regression includes interactions with polynomial terms. My understanding is that multicollinearity inflates the standard errors and so reduces the precision of the estimates. I ran some experiments to test this and found that centering affected only the standard errors and p-values in the regression output, but not the results of the F-tests or the simple slope estimates. Could anyone help explain why this happens?
Example:
> rm(list=ls())
> x1 <- rnorm(100)
> x2 <- rnorm(100)
> x3 <- rnorm(100)
> x4 <- rnorm(100)
>
> y1 <- x1 + x2 + x2^2 + x1*x2 + x3 + x4 + rnorm(100)
>
> # Uncentered
> fit1 <- lm(y1 ~ x1 + x2 + I(x2^2) + I(x2^3) + x1:x2 + x1:I(x2^2) + x1:I(x2^3) + x3 + x4)
> fit10 <- lm(y1 ~ x1 + x2 + I(x2^2) + I(x2^3) + x3 + x4)
>
> summary(fit1)
Call:
lm(formula = y1 ~ x1 + x2 + I(x2^2) + I(x2^3) + x1:x2 + x1:I(x2^2) +
x1:I(x2^3) + x3 + x4)
Residuals:
Min 1Q Median 3Q Max
-2.90014 -0.76847 0.09462 0.58068 2.10778
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.13762 0.14161 0.972 0.333716
x1 1.15448 0.14366 8.036 3.43e-12 ***
x2 1.24048 0.23804 5.211 1.18e-06 ***
I(x2^2) 0.91891 0.12499 7.352 8.64e-11 ***
I(x2^3) -0.06725 0.11720 -0.574 0.567522
x3 1.16358 0.11421 10.188 < 2e-16 ***
x4 1.13058 0.11156 10.134 < 2e-16 ***
x1:x2 0.96342 0.26942 3.576 0.000564 ***
x1:I(x2^2) -0.15596 0.12277 -1.270 0.207240
x1:I(x2^3) 0.07668 0.12683 0.605 0.546967
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.048 on 90 degrees of freedom
Multiple R-squared: 0.8243, Adjusted R-squared: 0.8068
F-statistic: 46.93 on 9 and 90 DF, p-value: < 2.2e-16
>
> # Centered at mean
> xc <- x2 - mean(x2)
> summary(xc)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-1.82165 -0.63630 -0.03103 0.00000 0.53655 2.19972
>
> fit2 <- lm(y1 ~ x1 + xc + I(xc^2) + I(xc^3) + x1:xc + x1:I(xc^2) + x1:I(xc^3) + x3 + x4)
> fit20 <- lm(y1 ~ x1 + xc + I(xc^2) + I(xc^3) + x3 + x4)
>
> # F tests of joint significance of interaction terms
> anova(fit1,fit10)
Analysis of Variance Table
Model 1: y1 ~ x1 + x2 + I(x2^2) + I(x2^3) + x1:x2 + x1:I(x2^2) + x1:I(x2^3) +
x3 + x4
Model 2: y1 ~ x1 + x2 + I(x2^2) + I(x2^3) + x3 + x4
Res.Df RSS Df Sum of Sq F Pr(>F)
1 90 98.758
2 93 178.810 -3 -80.052 24.318 1.294e-11 ***
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> anova(fit2,fit20)
Analysis of Variance Table
Model 1: y1 ~ x1 + xc + I(xc^2) + I(xc^3) + x1:xc + x1:I(xc^2) + x1:I(xc^3) +
x3 + x4
Model 2: y1 ~ x1 + xc + I(xc^2) + I(xc^3) + x3 + x4
Res.Df RSS Df Sum of Sq F Pr(>F)
1 90 98.758
2 93 178.810 -3 -80.052 24.318 1.294e-11 ***
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> summary(fit2)
Call:
lm(formula = y1 ~ x1 + xc + I(xc^2) + I(xc^3) + x1:xc + x1:I(xc^2) +
x1:I(xc^3) + x3 + x4)
Residuals:
Min 1Q Median 3Q Max
-2.90014 -0.76847 0.09462 0.58068 2.10778
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.002391 0.140412 0.017 0.986450
x1 1.036765 0.149121 6.953 5.52e-10 ***
xc 1.017549 0.235536 4.320 4.01e-05 ***
I(xc^2) 0.943065 0.128953 7.313 1.04e-10 ***
I(xc^3) -0.067254 0.117204 -0.574 0.567522
x3 1.163576 0.114212 10.188 < 2e-16 ***
x4 1.130583 0.111565 10.134 < 2e-16 ***
x1:xc 1.004059 0.268523 3.739 0.000324 ***
x1:I(xc^2) -0.183500 0.123139 -1.490 0.139671
x1:I(xc^3) 0.076682 0.126831 0.605 0.546967
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.048 on 90 degrees of freedom
Multiple R-squared: 0.8243, Adjusted R-squared: 0.8068
F-statistic: 46.93 on 9 and 90 DF, p-value: < 2.2e-16
>
> # Simple slopes
> library(interactions)
> sim_slopes(fit1, pred = x1, modx = x2, johnson_neyman = FALSE)
SIMPLE SLOPES ANALYSIS
Slope of x1 when x2 = -1.0358786 (- 1 SD):
Est. S.E. t val. p
-0.10 0.22 -0.43 0.67
Slope of x1 when x2 = -0.1197279 (Mean):
Est. S.E. t val. p
1.04 0.15 6.95 0.00
Slope of x1 when x2 = 0.7964229 (+ 1 SD):
Est. S.E. t val. p
1.86 0.18 10.15 0.00
> sim_slopes(fit2, pred = x1, modx = xc, johnson_neyman = FALSE)
SIMPLE SLOPES ANALYSIS
Slope of x1 when xc = -9.161508e-01 (- 1 SD):
Est. S.E. t val. p
-0.10 0.22 -0.43 0.67
Slope of x1 when xc = -1.526557e-17 (Mean):
Est. S.E. t val. p
1.04 0.15 6.95 0.00
Slope of x1 when xc = 9.161508e-01 (+ 1 SD):
Est. S.E. t val. p
1.86 0.18 10.15 0.00
>
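One further check that might help narrow this down is whether the two models are really the same fit written in different coordinates. The sketch below is only a rough way to test that idea, not a definitive explanation. It regenerates data of the same form with a fixed seed (the seed and the names `fit_raw`, `fit_ctr`, and `slope_at` are my own assumptions, not part of the session above), compares fitted values and residual sums of squares, and computes the slope of x1 at +1 SD of the moderator by hand as a linear combination of coefficients, with its standard error taken from `vcov()`.

```r
# Hypothetical, self-contained check; the seed is an assumption for reproducibility.
set.seed(1)
n  <- 100
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n); x4 <- rnorm(n)
y1 <- x1 + x2 + x2^2 + x1 * x2 + x3 + x4 + rnorm(n)
xc <- x2 - mean(x2)   # mean-centered moderator

fit_raw <- lm(y1 ~ x1 + x2 + I(x2^2) + I(x2^3) +
                x1:x2 + x1:I(x2^2) + x1:I(x2^3) + x3 + x4)
fit_ctr <- lm(y1 ~ x1 + xc + I(xc^2) + I(xc^3) +
                x1:xc + x1:I(xc^2) + x1:I(xc^3) + x3 + x4)

# 1. Do both parameterizations give the same fitted values and RSS?
same_fit <- isTRUE(all.equal(unname(fitted(fit_raw)), unname(fitted(fit_ctr))))
same_rss <- isTRUE(all.equal(deviance(fit_raw), deviance(fit_ctr)))

# 2. Slope of x1 at moderator value v:
#    b_x1 + b_(x1:m) * v + b_(x1:m^2) * v^2 + b_(x1:m^3) * v^3
slope_at <- function(fit, mod, v) {
  nm <- c("x1",
          paste0("x1:", mod),
          paste0("x1:I(", mod, "^2)"),
          paste0("x1:I(", mod, "^3)"))
  w   <- c(1, v, v^2, v^3)                        # weights of the combination
  est <- sum(w * coef(fit)[nm])
  se  <- sqrt(drop(t(w) %*% vcov(fit)[nm, nm] %*% w))
  c(est = est, se = se)
}

v   <- mean(x2) + sd(x2)                          # +1 SD on the raw scale
raw <- slope_at(fit_raw, "x2", v)
ctr <- slope_at(fit_ctr, "xc", v - mean(x2))      # same point on the centered scale
same_slope <- isTRUE(all.equal(raw, ctr))

print(c(same_fit = same_fit, same_rss = same_rss, same_slope = same_slope))
```

If all three comparisons agree, that would be consistent with what the output above shows: the residuals, R-squared, and the coefficients on the highest-order terms (`I(x2^3)` and `x1:I(x2^3)`) are identical in both fits, and only the lower-order coefficients, which describe the model at a different reference point of the moderator, change.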