
I hope someone can help me better understand how centering variables can address the problem of multicollinearity when a regression includes interactions with polynomial terms. My understanding is that multicollinearity inflates the standard errors and so reduces the precision of the estimates. I ran some experiments to test this and found that centering only affected the standard errors and p-values of individual coefficients in the regression output, but not the results of the F-tests or the simple slope estimates. Could anyone explain why this happens?

Example:

> rm(list=ls())
> x1 <- rnorm(100)
> x2 <- rnorm(100)
> x3 <- rnorm(100)
> x4 <- rnorm(100)
> 
> y1 <- x1 + x2 + x2**2 + x1*x2 + x3 + x4 + rnorm(100)
> 
> # Uncentered
> fit1 <- lm(y1 ~ x1 + x2 + I(x2^2) + I(x2^3) + x1:x2 + x1:I(x2^2) + x1:I(x2^3) + x3 + x4)
> fit10 <- lm(y1 ~ x1 + x2 + I(x2^2) + I(x2^3)  + x3 + x4)
> 
> summary(fit1)

Call:
lm(formula = y1 ~ x1 + x2 + I(x2^2) + I(x2^3) + x1:x2 + x1:I(x2^2) +
    x1:I(x2^3) + x3 + x4)

Residuals:
     Min       1Q   Median       3Q      Max
-2.90014 -0.76847  0.09462  0.58068  2.10778

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.13762    0.14161   0.972 0.333716
x1           1.15448    0.14366   8.036 3.43e-12 ***
x2           1.24048    0.23804   5.211 1.18e-06 ***
I(x2^2)      0.91891    0.12499   7.352 8.64e-11 ***
I(x2^3)     -0.06725    0.11720  -0.574 0.567522
x3           1.16358    0.11421  10.188  < 2e-16 ***
x4           1.13058    0.11156  10.134  < 2e-16 ***
x1:x2        0.96342    0.26942   3.576 0.000564 ***
x1:I(x2^2)  -0.15596    0.12277  -1.270 0.207240
x1:I(x2^3)   0.07668    0.12683   0.605 0.546967
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.048 on 90 degrees of freedom
Multiple R-squared:  0.8243,  Adjusted R-squared:  0.8068
F-statistic: 46.93 on 9 and 90 DF,  p-value: < 2.2e-16

> # Centered at mean
> xc <- x2 - mean(x2)
> summary(xc)
     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
-1.82165 -0.63630 -0.03103  0.00000  0.53655  2.19972
>
> fit2 <- lm(y1 ~ x1 + xc + I(xc^2) + I(xc^3) + x1:xc + x1:I(xc^2) + x1:I(xc^3) + x3 + x4)
> fit20 <- lm(y1 ~ x1 + xc + I(xc^2) + I(xc^3) + x3 + x4)
>
> # F tests of joint significance of interaction terms
> anova(fit1,fit10)
Analysis of Variance Table

Model 1: y1 ~ x1 + x2 + I(x2^2) + I(x2^3) + x1:x2 + x1:I(x2^2) + x1:I(x2^3) + x3 + x4
Model 2: y1 ~ x1 + x2 + I(x2^2) + I(x2^3) + x3 + x4
  Res.Df     RSS Df Sum of Sq      F    Pr(>F)
1     90  98.758
2     93 178.810 -3   -80.052 24.318 1.294e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

> anova(fit2,fit20)
Analysis of Variance Table

Model 1: y1 ~ x1 + xc + I(xc^2) + I(xc^3) + x1:xc + x1:I(xc^2) + x1:I(xc^3) + x3 + x4
Model 2: y1 ~ x1 + xc + I(xc^2) + I(xc^3) + x3 + x4
  Res.Df     RSS Df Sum of Sq      F    Pr(>F)
1     90  98.758
2     93 178.810 -3   -80.052 24.318 1.294e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

> summary(fit2)

Call:
lm(formula = y1 ~ x1 + xc + I(xc^2) + I(xc^3) + x1:xc + x1:I(xc^2) +
    x1:I(xc^3) + x3 + x4)

Residuals:
     Min       1Q   Median       3Q      Max
-2.90014 -0.76847  0.09462  0.58068  2.10778

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.002391   0.140412   0.017 0.986450
x1           1.036765   0.149121   6.953 5.52e-10 ***
xc           1.017549   0.235536   4.320 4.01e-05 ***
I(xc^2)      0.943065   0.128953   7.313 1.04e-10 ***
I(xc^3)     -0.067254   0.117204  -0.574 0.567522
x3           1.163576   0.114212  10.188  < 2e-16 ***
x4           1.130583   0.111565  10.134  < 2e-16 ***
x1:xc        1.004059   0.268523   3.739 0.000324 ***
x1:I(xc^2)  -0.183500   0.123139  -1.490 0.139671
x1:I(xc^3)   0.076682   0.126831   0.605 0.546967
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.048 on 90 degrees of freedom
Multiple R-squared:  0.8243,  Adjusted R-squared:  0.8068
F-statistic: 46.93 on 9 and 90 DF,  p-value: < 2.2e-16

> # Simple slopes
> library(interactions)
> sim_slopes(fit1, pred = x1, modx = x2, john = F)
SIMPLE SLOPES ANALYSIS

Slope of x1 when x2 = -1.0358786 (- 1 SD):

   Est.   S.E.   t val.      p
------- ------ -------- ------
  -0.10   0.22    -0.43   0.67

Slope of x1 when x2 = -0.1197279 (Mean):

   Est.   S.E.   t val.      p
------- ------ -------- ------
   1.04   0.15     6.95   0.00

Slope of x1 when x2 = 0.7964229 (+ 1 SD):

   Est.   S.E.   t val.      p
------- ------ -------- ------
   1.86   0.18    10.15   0.00

> sim_slopes(fit2, pred = x1, modx = xc, john = F)
SIMPLE SLOPES ANALYSIS

Slope of x1 when xc = -9.161508e-01 (- 1 SD):

   Est.   S.E.   t val.      p
------- ------ -------- ------
  -0.10   0.22    -0.43   0.67

Slope of x1 when xc = -1.526557e-17 (Mean):

   Est.   S.E.   t val.      p
------- ------ -------- ------
   1.04   0.15     6.95   0.00

Slope of x1 when xc = 9.161508e-01 (+ 1 SD):

   Est.   S.E.   t val.      p
------- ------ -------- ------
   1.86   0.18    10.15   0.00

>

zjppdozen

1 Answer


"centering variables can address the problem of multicollinearity when a regression includes interactions with polynomial terms"

That isn't quite correct. Centering predictor variables can overcome numerical stability problems in that situation, but (as you found) the fundamental model doesn't change otherwise.
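For example, one quick way to see this, continuing with the objects from your session, is to compare how well-conditioned the two design matrices are (the exact numbers will depend on your random draw, and because your x2 already has a mean near 0 the change here is modest; it is dramatic when a predictor's mean is far from zero):

# Correlation between the moderator and its square: centering reduces it,
# most noticeably when the predictor's mean is far from zero.
cor(x2, x2^2)
cor(xc, xc^2)

# Condition number of the design matrix: typically smaller (better
# conditioned) after centering, even though it is the same model.
kappa(model.matrix(fit1), exact = TRUE)
kappa(model.matrix(fit2), exact = TRUE)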

What can change with centering is the apparent "main effect" coefficient for predictors involved in interactions. As those coefficients represent the associations of predictors with outcome when all other predictors are at their reference values, changing the reference value for one predictor (e.g., by centering) can change the apparent "main effect" coefficients for predictors with which it interacts. I present a simple worked-through example on this page. In your case, you can see that in the different coefficients for x1 between the two models, while those for x3 and x4 are identical.
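You can verify the connection numerically with the objects already defined; nothing new is fit here. The "main effect" of x1 in the centered model is just the uncentered model's slope of x1 evaluated at x2 = mean(x2):

# Slope of x1 in the uncentered model at x2 = m:
#   b_x1 + b_x1:x2 * m + b_x1:x2^2 * m^2 + b_x1:x2^3 * m^3
b <- coef(fit1)
m <- mean(x2)
unname(b["x1"] + b["x1:x2"] * m + b["x1:I(x2^2)"] * m^2 + b["x1:I(x2^3)"] * m^3)
# about 1.0368, matching the x1 coefficient in summary(fit2)
coef(fit2)["x1"]

That is also why the sim_slopes() estimate at the mean of x2 (1.04) agrees with the centered model's x1 coefficient but not with the uncentered one.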

Nevertheless, for any combination of predictor values in the original scale, models will provide the same predictions whether built on centered or uncentered values.
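A quick check, again reusing your fits (the new-data values below are arbitrary):

# Identical fitted values for the observed data...
all.equal(fitted(fit1), fitted(fit2))

# ...and identical predictions for new data, once the new x2 value
# is translated onto the centered scale.
newdat <- data.frame(x1 = 0.5, x2 = 1.2, x3 = 0, x4 = 0)
newdat$xc <- newdat$x2 - mean(x2)
predict(fit1, newdat)
predict(fit2, newdat)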

EdM