
Below are two linear regression models with the same predictors and response variable but different contrast coding: the first model uses "contr.treatment" and the second uses "contr.sum".
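For reference, here is a minimal R sketch of the contrast matrices behind the two options for a four-level factor such as `SUBSTANCE USE` (the object names T4 and S4 are illustrative only). Under contr.treatment each coefficient compares a level with the first (reference) level. Under contr.sum the last level is coded -1 in every column, so the level effects are constrained to sum to zero and each coefficient is a deviation from the unweighted mean of the level effects.

```r
# Contrast matrices R builds for a factor with 4 levels (like `SUBSTANCE USE`).
# T4 and S4 are illustrative names, not part of the original post.
T4 <- contr.treatment(4)  # level 1 is the reference: its row is all zeros
S4 <- contr.sum(4)        # last level's row is all -1, so each column sums to 0
print(T4)
print(S4)

# contr.sum() leaves its columns unnamed, which is why lm() labels the
# sum-coded coefficients by level position (GENDER1, `SUBSTANCE USE`1, ...).
print(colnames(S4))
```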

options(contrasts = c("contr.treatment","contr.poly"))
model1<-lm(RISKINESS~1+GENDER + `SUBSTANCE USE`, data = df)
summary(model1)

Call:
lm(formula = RISKINESS ~ 1 + GENDER + `SUBSTANCE USE`, data = df)

Residuals:
   Min     1Q Median     3Q    Max 
-1.250 -1.000 -0.225  0.800  2.000 

Coefficients:
                              Estimate Std. Error t value Pr(>|t|)    
(Intercept)                    15.4500     1.1773  13.123 4.62e-08 ***
GENDERM                         3.7500     1.0461   3.585 0.004283 ** 
`SUBSTANCE USE`no_use         -12.4500     1.3684  -9.098 1.88e-06 ***
`SUBSTANCE USE`once_per_month  -6.4500     1.4545  -4.434 0.001004 ** 
`SUBSTANCE USE`once_per_week   -3.9500     0.8103  -4.875 0.000491 ***

options(contrasts = c("contr.sum","contr.poly"))
model2<-lm(RISKINESS~1+GENDER + `SUBSTANCE USE`, data = df)
summary(model2)

Call:
lm(formula = RISKINESS ~ 1 + GENDER + `SUBSTANCE USE`, data = df)

Residuals:
   Min     1Q Median     3Q    Max 
-1.250 -1.000 -0.225  0.800  2.000 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)       11.6125     0.3421  33.945 1.74e-12 ***
GENDER1           -1.8750     0.5230  -3.585  0.00428 ** 
`SUBSTANCE USE`1   5.7125     0.7923   7.210 1.73e-05 ***
`SUBSTANCE USE`2  -6.7375     0.7366  -9.147 1.79e-06 ***
`SUBSTANCE USE`3  -0.7375     0.8150  -0.905  0.38489    

The coefficient estimates, standard errors, and t values differ between the two models, although the residuals are identical.
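The two fits are reparameterizations of the same model, so only the meaning of each coefficient changes. The following is a minimal sketch on simulated data, since the question's df is not available: the data frame `sim`, the factor name `SUBSTANCE`, its first level "level1", and the model names are hypothetical stand-ins. It checks that the fitted values, residuals, and sequential F tests do not depend on the coding. The coding is set per model through lm()'s `contrasts` argument rather than through options(contrasts = ...).

```r
# Hypothetical data: the question's df is not available, so simulate a small
# balanced design with the same structure (2 genders x 4 substance-use levels).
# `SUBSTANCE` and its first level "level1" are made-up stand-ins.
set.seed(1)
sim <- expand.grid(
  GENDER    = factor(c("F", "M")),
  SUBSTANCE = factor(c("level1", "no_use", "once_per_month", "once_per_week"))
)
sim <- sim[rep(seq_len(nrow(sim)), times = 2), ]  # two observations per cell
sim$RISKINESS <- 10 + rnorm(nrow(sim))

# Same formula, two codings, chosen per model via lm()'s `contrasts` argument
m_trt <- lm(RISKINESS ~ GENDER + SUBSTANCE, data = sim,
            contrasts = list(GENDER = "contr.treatment", SUBSTANCE = "contr.treatment"))
m_sum <- lm(RISKINESS ~ GENDER + SUBSTANCE, data = sim,
            contrasts = list(GENDER = "contr.sum", SUBSTANCE = "contr.sum"))

# The coefficients differ ...
print(cbind(treatment = coef(m_trt), sum = coef(m_sum)))

# ... but the fitted values, residuals and term-level F tests are identical
stopifnot(isTRUE(all.equal(fitted(m_trt), fitted(m_sum))))
stopifnot(isTRUE(all.equal(residuals(m_trt), residuals(m_sum))))
stopifnot(isTRUE(all.equal(anova(m_trt)[["F value"]], anova(m_sum)[["F value"]])))
```

In this additive model the choice of coding therefore affects only which comparisons the individual coefficients express, not the fit itself.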

How are the coefficients under contr.sum calculated? Which coding method should be used? (Guidelines and references are welcome.)
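As a worked check, the contr.sum estimates can be reproduced by hand from the contr.treatment ones, because both models give the same fitted values. The notation is introduced here only for illustration: β₀ is model1's intercept (GENDER F with SUBSTANCE USE at level 1, the reference level whose name is not printed), γ denotes the gender effects, and δⱼ the effect of SUBSTANCE USE level j (levels 2 to 4 being no_use, once_per_month, once_per_week), with the reference levels fixed at 0. Under contr.sum the intercept is the unweighted mean of the predicted values over the 2 × 4 gender-by-substance combinations, and each coefficient is a level's deviation from the mean of that factor's effects:

```latex
\begin{aligned}
&\text{model1 (contr.treatment):}\quad \hat\beta_0 = 15.45,\quad (\hat\gamma_F, \hat\gamma_M) = (0,\ 3.75),\quad (\hat\delta_1, \hat\delta_2, \hat\delta_3, \hat\delta_4) = (0,\ -12.45,\ -6.45,\ -3.95)\\[4pt]
&\bar\gamma = \tfrac{1}{2}(0 + 3.75) = 1.875, \qquad \bar\delta = \tfrac{1}{4}(0 - 12.45 - 6.45 - 3.95) = -5.7125\\[4pt]
&\text{(Intercept)} = \hat\beta_0 + \bar\gamma + \bar\delta = 15.45 + 1.875 - 5.7125 = 11.6125\\
&\texttt{GENDER1} = \hat\gamma_F - \bar\gamma = 0 - 1.875 = -1.875\\
&\texttt{SUBSTANCE USE}j = \hat\delta_j - \bar\delta:\quad j=1:\ 0 + 5.7125 = 5.7125,\quad j=2:\ -12.45 + 5.7125 = -6.7375,\quad j=3:\ -6.45 + 5.7125 = -0.7375\\
&\text{level 4 (not printed)} = -3.95 + 5.7125 = 1.7625 = -(5.7125 - 6.7375 - 0.7375)
\end{aligned}
```

The standard errors change because they belong to different quantities. For GENDER the t value keeps the same magnitude (3.585) because GENDER1 is exactly -GENDERM/2, and its standard error is halved accordingly (1.0461 / 2 ≈ 0.5230).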

Ed9012