
I hope someone can help me better understand how centering variables can address the problem of multicollinearity when a regression includes interactions with polynomial terms. My understanding is that multicollinearity inflates the standard errors and so reduces the precision of the estimates. I ran some experiments to test this and found that centering only affected the standard errors and p-values of individual coefficients in the regression output, but not the results of the F-tests or the simple slope estimates. Could anyone explain why this happens?

Example:

> rm(list=ls())
> x1 <- rnorm(100)
> x2 <- rnorm(100)
> x3 <- rnorm(100)
> x4 <- rnorm(100)
> 
> y1 <- x1 + x2 + x2**2 + x1*x2 + x3 + x4 + rnorm(100)
> 
> # Uncentered
> fit1 <- lm(y1 ~ x1 + x2 + I(x2^2) + I(x2^3) + x1:x2 + x1:I(x2^2) + x1:I(x2^3) + x3 + x4)
> fit10 <- lm(y1 ~ x1 + x2 + I(x2^2) + I(x2^3)  + x3 + x4)
> 
> summary(fit1)

Call:
lm(formula = y1 ~ x1 + x2 + I(x2^2) + I(x2^3) + x1:x2 + x1:I(x2^2) +
    x1:I(x2^3) + x3 + x4)

Residuals:
     Min       1Q   Median       3Q      Max
-2.90014 -0.76847  0.09462  0.58068  2.10778

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.13762    0.14161   0.972 0.333716
x1           1.15448    0.14366   8.036 3.43e-12 ***
x2           1.24048    0.23804   5.211 1.18e-06 ***
I(x2^2)      0.91891    0.12499   7.352 8.64e-11 ***
I(x2^3)     -0.06725    0.11720  -0.574 0.567522
x3           1.16358    0.11421  10.188  < 2e-16 ***
x4           1.13058    0.11156  10.134  < 2e-16 ***
x1:x2        0.96342    0.26942   3.576 0.000564 ***
x1:I(x2^2)  -0.15596    0.12277  -1.270 0.207240
x1:I(x2^3)   0.07668    0.12683   0.605 0.546967
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.048 on 90 degrees of freedom
Multiple R-squared:  0.8243,  Adjusted R-squared:  0.8068
F-statistic: 46.93 on 9 and 90 DF,  p-value: < 2.2e-16

> # Centered at mean
> xc <- x2 - mean(x2)
> summary(xc)
     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
-1.82165 -0.63630 -0.03103  0.00000  0.53655  2.19972
>
> fit2 <- lm(y1 ~ x1 + xc + I(xc^2) + I(xc^3) + x1:xc + x1:I(xc^2) + x1:I(xc^3) + x3 + x4)
> fit20 <- lm(y1 ~ x1 + xc + I(xc^2) + I(xc^3) + x3 + x4)
>
> # F tests of joint significance of interaction terms
> anova(fit1,fit10)
Analysis of Variance Table

Model 1: y1 ~ x1 + x2 + I(x2^2) + I(x2^3) + x1:x2 + x1:I(x2^2) + x1:I(x2^3) + x3 + x4
Model 2: y1 ~ x1 + x2 + I(x2^2) + I(x2^3) + x3 + x4
  Res.Df     RSS Df Sum of Sq      F    Pr(>F)
1     90  98.758
2     93 178.810 -3   -80.052 24.318 1.294e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

> anova(fit2,fit20)
Analysis of Variance Table

Model 1: y1 ~ x1 + xc + I(xc^2) + I(xc^3) + x1:xc + x1:I(xc^2) + x1:I(xc^3) + x3 + x4
Model 2: y1 ~ x1 + xc + I(xc^2) + I(xc^3) + x3 + x4
  Res.Df     RSS Df Sum of Sq      F    Pr(>F)
1     90  98.758
2     93 178.810 -3   -80.052 24.318 1.294e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

> summary(fit2)

Call:
lm(formula = y1 ~ x1 + xc + I(xc^2) + I(xc^3) + x1:xc + x1:I(xc^2) +
    x1:I(xc^3) + x3 + x4)

Residuals:
     Min       1Q   Median       3Q      Max
-2.90014 -0.76847  0.09462  0.58068  2.10778

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.002391   0.140412   0.017 0.986450
x1           1.036765   0.149121   6.953 5.52e-10 ***
xc           1.017549   0.235536   4.320 4.01e-05 ***
I(xc^2)      0.943065   0.128953   7.313 1.04e-10 ***
I(xc^3)     -0.067254   0.117204  -0.574 0.567522
x3           1.163576   0.114212  10.188  < 2e-16 ***
x4           1.130583   0.111565  10.134  < 2e-16 ***
x1:xc        1.004059   0.268523   3.739 0.000324 ***
x1:I(xc^2)  -0.183500   0.123139  -1.490 0.139671
x1:I(xc^3)   0.076682   0.126831   0.605 0.546967
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.048 on 90 degrees of freedom
Multiple R-squared:  0.8243,  Adjusted R-squared:  0.8068
F-statistic: 46.93 on 9 and 90 DF,  p-value: < 2.2e-16

> # Simple slopes
> library(interactions)
> sim_slopes(fit1, pred = x1, modx = x2, john = F)
SIMPLE SLOPES ANALYSIS

Slope of x1 when x2 = -1.0358786 (- 1 SD):

   Est.   S.E.   t val.      p
------- ------ -------- ------
  -0.10   0.22    -0.43   0.67

Slope of x1 when x2 = -0.1197279 (Mean):

   Est.   S.E.   t val.      p
------- ------ -------- ------
   1.04   0.15     6.95   0.00

Slope of x1 when x2 = 0.7964229 (+ 1 SD):

   Est.   S.E.   t val.      p
------- ------ -------- ------
   1.86   0.18    10.15   0.00

> sim_slopes(fit2, pred = x1, modx = xc, john = F)
SIMPLE SLOPES ANALYSIS

Slope of x1 when xc = -9.161508e-01 (- 1 SD):

   Est.   S.E.   t val.      p
------- ------ -------- ------
  -0.10   0.22    -0.43   0.67

Slope of x1 when xc = -1.526557e-17 (Mean):

   Est.   S.E.   t val.      p
------- ------ -------- ------
   1.04   0.15     6.95   0.00

Slope of x1 when xc = 9.161508e-01 (+ 1 SD):

   Est.   S.E.   t val.      p
------- ------ -------- ------
   1.86   0.18    10.15   0.00

>

zjppdozen

1 Answer


"centering variables can address the problem of multicollinearity when a regression includes interactions with polynomial terms"

That isn't quite correct. Centering predictor variables can overcome numerical stability problems in that situation, but (as you found) the fundamental model doesn't change otherwise.
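For example, one quick way to see this, continuing with the objects from your session, is to compare how well-conditioned the two design matrices are (the exact numbers will depend on your random draw, and because your x2 already has a mean near 0 the change here is modest; it is dramatic when a predictor's mean is far from zero):

# Correlation between the moderator and its square: centering reduces it,
# most noticeably when the predictor's mean is far from zero.
cor(x2, x2^2)
cor(xc, xc^2)

# Condition number of the design matrix: typically smaller (better
# conditioned) after centering, even though it is the same model.
kappa(model.matrix(fit1), exact = TRUE)
kappa(model.matrix(fit2), exact = TRUE)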

What can change with centering is the apparent "main effect" coefficient for predictors involved in interactions. As those coefficients represent the associations of predictors with outcome when all other predictors are at their reference values, changing the reference value for one predictor (e.g., by centering) can change the apparent "main effect" coefficients for predictors with which it interacts. I present a simple worked-through example on this page. In your case, you can see that in the different coefficients for x1 between the two models, while those for x3 and x4 are identical.
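You can verify the connection numerically with the objects already defined; nothing new is fit here. The "main effect" of x1 in the centered model is just the uncentered model's slope of x1 evaluated at x2 = mean(x2):

# Slope of x1 in the uncentered model at x2 = m:
#   b_x1 + b_x1:x2 * m + b_x1:x2^2 * m^2 + b_x1:x2^3 * m^3
b <- coef(fit1)
m <- mean(x2)
unname(b["x1"] + b["x1:x2"] * m + b["x1:I(x2^2)"] * m^2 + b["x1:I(x2^3)"] * m^3)
# about 1.0368, matching the x1 coefficient in summary(fit2)
coef(fit2)["x1"]

That is also why the sim_slopes() estimate at the mean of x2 (1.04) agrees with the centered model's x1 coefficient but not with the uncentered one.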

Nevertheless, for any combination of predictor values in the original scale, models will provide the same predictions whether built on centered or uncentered values.
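A quick check, again reusing your fits (the new-data values below are arbitrary):

# Identical fitted values for the observed data...
all.equal(fitted(fit1), fitted(fit2))

# ...and identical predictions for new data, once the new x2 value
# is translated onto the centered scale.
newdat <- data.frame(x1 = 0.5, x2 = 1.2, x3 = 0, x4 = 0)
newdat$xc <- newdat$x2 - mean(x2)
predict(fit1, newdat)
predict(fit2, newdat)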

EdM