I have on question regarding standardized coefficients (beta) in linear models. I have already asked one question here. From the answers I assume that I should use R's scale() function on the dependent variable as well as on all independent variables (IV), to estimate the standardized coefficients for the model. But when I used the scale() function on an IV, which belongs to the factor class I get following error message:
Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric
To illustrate my problem here is a MWE:
First the linear model with unstandardized coefficients:
> data(ChickWeight)
> aa <- lm(weight ~ Time + Diet, data=ChickWeight)
> summary(aa)
Call:
lm(formula = weight ~ Time + Diet, data = ChickWeight)
Residuals:
Min 1Q Median 3Q Max
-136.851 -17.151 -2.595 15.033 141.816
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 10.9244 3.3607 3.251 0.00122 **
Time 8.7505 0.2218 39.451 < 2e-16 ***
Diet2 16.1661 4.0858 3.957 8.56e-05 ***
Diet3 36.4994 4.0858 8.933 < 2e-16 ***
Diet4 30.2335 4.1075 7.361 6.39e-13 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 35.99 on 573 degrees of freedom
Multiple R-squared: 0.7453, Adjusted R-squared: 0.7435
F-statistic: 419.2 on 4 and 573 DF, p-value: < 2.2e-16
Now I want to estimate the standardized coefficients using the scale function, which results in following error message:
> bb <- lm(scale(weight) ~ scale(Time) + scale(Diet), data=ChickWeight)
Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric
As I figured out by myself the error message appears, because Diet belongs to the factor class and is not a numeric variable as required from the scale() function. I tried the following alternatively by including the Diet variable without scale():
> cc <- lm(scale(weight) ~ scale(Time) + Diet, data=ChickWeight)
> summary(cc)
Call:
lm(formula = scale(weight) ~ scale(Time) + Diet, data = ChickWeight)
Residuals:
Min 1Q Median 3Q Max
-1.92552 -0.24132 -0.03652 0.21151 1.99538
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.24069 0.03415 -7.048 5.25e-12 ***
scale(Time) 0.83210 0.02109 39.451 < 2e-16 ***
Diet2 0.22746 0.05749 3.957 8.56e-05 ***
Diet3 0.51356 0.05749 8.933 < 2e-16 ***
Diet4 0.42539 0.05779 7.361 6.39e-13 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.5064 on 573 degrees of freedom
Multiple R-squared: 0.7453, Adjusted R-squared: 0.7435
F-statistic: 419.2 on 4 and 573 DF, p-value: < 2.2e-16
My question now is, if this is the right way to estimate the standardized coefficients for a model with both numeric and factor variables?
Thank you very much in advance for an answer.
Regards,
Magnus
summary(bb)would deliver the same standardized coefficients aslm.beta(aa)from thelm.betapackage. So in the end usinglm.beta()would be a little bit more convenient. I double checked the results with SPSS, which estimates the same values for the standardized coefficients. – Magnus Metz Oct 24 '14 at 09:11