
I am trying to estimate a treatment effect using the following two models:
$$Y(Z)=\beta_0+\tau Z+\varepsilon,$$
$$Y(Z)=\tau Z+\varepsilon.$$
On my data the intercept is not significant, yet the estimates of $\tau$ from the two models differ substantially.
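A short derivation may help show why the two estimates can disagree. Take the simplest case with no covariates and binary $Z$, and write $n_1$, $n_0$ for the numbers of treated and control units and $\bar Y_1$, $\bar Y_0$ for their sample means (notation introduced here, not from the original post). Standard OLS algebra then gives

$$\text{with intercept:}\quad \hat\beta_0=\bar Y_0,\qquad \hat\tau=\bar Y_1-\bar Y_0;\qquad\qquad \text{without intercept:}\quad \hat\tau=\bar Y_1.$$

So under the first model, $E[\hat\tau_{\text{no intercept}}]=\beta_0+\tau$, and the no-intercept estimate targets $\tau$ only when $\beta_0=0$.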

I understand that we typically need to include an intercept. However, the treatment $Z$ can be nearly collinear with the intercept (for example, when almost all units are treated), so including the intercept can make the variance of the estimate of $\tau$ very large. Can we drop the intercept if it is not significant?
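Under the same simplification (no covariates, binary $Z$, $n_1$ treated and $n_0$ control units, homoskedastic errors with variance $\sigma^2$), the variance inflation can be written out explicitly. This is a sketch under those assumptions, not the general covariate-adjusted case:

$$\operatorname{Var}(\hat\tau\mid Z)=\sigma^2\left(\frac{1}{n_0}+\frac{1}{n_1}\right)\ \text{(with intercept)},\qquad \operatorname{Var}(\hat\tau\mid Z)=\frac{\sigma^2}{n_1}\ \text{(without intercept)}.$$

With $P(Z=1)=0.8$ and $n=100$, as in the toy example, $n_0\approx 20$ and $n_1\approx 80$. The variance ratio is then about $(1/20+1/80)/(1/80)=5$, so the standard error is roughly $\sqrt5\approx 2.2$ times larger with the intercept. The inflation comes from the small control group, because $\hat\tau$ then depends on $\bar Y_0$.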

I generated a toy example to illustrate the problem:


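```r
set.seed(1)
Z <- rbinom(100, 1, 0.8)               # treatment indicator, P(Z = 1) = 0.8
X <- rnorm(100)                        # covariate
Y <- 0.5 * Z + X + rnorm(100, sd = 1)  # true tau = 0.5, no intercept term
```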
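As a numerical check of the closed forms for the no-covariate case, the sketch below re-creates the same toy data and fits `Y ~ Z` and `Y ~ Z - 1` (dropping `X`, which is an assumption made only so the formulas apply exactly). It compares the coefficients with the group means and the standard errors with $\hat\sigma\sqrt{1/n_0+1/n_1}$ and $\hat\sigma\sqrt{1/n_1}$. The names `fit1`, `fit0`, `n1`, `n0`, `ybar1`, `ybar0` are illustrative.

```r
# Re-create the toy data (same seed and data-generating process as above).
set.seed(1)
Z <- rbinom(100, 1, 0.8)
X <- rnorm(100)
Y <- 0.5 * Z + X + rnorm(100, sd = 1)

# No-covariate models, matching the two formulas at the top of the question.
fit1 <- lm(Y ~ Z)      # with intercept
fit0 <- lm(Y ~ Z - 1)  # without intercept

n1 <- sum(Z == 1)
n0 <- sum(Z == 0)
ybar1 <- mean(Y[Z == 1])
ybar0 <- mean(Y[Z == 0])

# Point estimates: difference in means vs. treated-group mean.
all.equal(unname(coef(fit1)["Z"]), ybar1 - ybar0)  # TRUE
all.equal(unname(coef(fit0)["Z"]), ybar1)          # TRUE

# Standard errors: sigma-hat * sqrt(1/n0 + 1/n1) vs. sigma-hat * sqrt(1/n1).
se1 <- coef(summary(fit1))["Z", "Std. Error"]
se0 <- coef(summary(fit0))["Z", "Std. Error"]
all.equal(se1, sigma(fit1) * sqrt(1 / n0 + 1 / n1))  # TRUE
all.equal(se0, sigma(fit0) * sqrt(1 / n1))           # TRUE
```

The ratio `se1 / se0` shows how much of the gap in standard errors is explained by the small control group alone.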

Then I fit the model with and without the intercept, adjusting for the covariate $X$ in both:

```
> summary(lm(Y ~ Z + X))

Call:
lm(formula = Y ~ Z + X)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.84170 -0.66118 -0.03344  0.70559  2.40098 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   0.1953     0.2560   0.763    0.448    
Z             0.2935     0.2812   1.044    0.299    
X             1.0611     0.1129   9.395 2.72e-15 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.055 on 97 degrees of freedom
Multiple R-squared:  0.4771,	Adjusted R-squared:  0.4664 
F-statistic: 44.26 on 2 and 97 DF,  p-value: 2.195e-14
```

```
> summary(lm(Y ~ Z + X - 1))

Call:
lm(formula = Y ~ Z + X - 1)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.83963 -0.66514 -0.03021  0.70907  2.39416 

Coefficients:
  Estimate Std. Error t value Pr(>|t|)    
Z   0.4889     0.1156   4.229 5.29e-05 ***
X   1.0648     0.1126   9.457 1.82e-15 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.052 on 98 degrees of freedom
Multiple R-squared:  0.5155,	Adjusted R-squared:  0.5056 
F-statistic: 52.13 on 2 and 98 DF,  p-value: 3.809e-16
```
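The $R^2$ values in the two outputs are not directly comparable. When the formula has no intercept, `summary.lm` measures $R^2$ against the uncentred total sum of squares $\sum_i Y_i^2$ instead of $\sum_i (Y_i-\bar Y)^2$. The no-intercept model is nested in the intercept model, so its residual sum of squares cannot be smaller ($1.052^2\times 98\approx 108.5$ vs. $1.055^2\times 97\approx 108.0$), and the rise from 0.4771 to 0.5155 therefore comes from the change of denominator. The minimal sketch below re-creates the toy data and checks both definitions against `summary()`; the helper `rss` is a hypothetical name introduced here.

```r
# Re-create the toy data from the question.
set.seed(1)
Z <- rbinom(100, 1, 0.8)
X <- rnorm(100)
Y <- 0.5 * Z + X + rnorm(100, sd = 1)

fit_int   <- lm(Y ~ Z + X)      # with intercept
fit_noint <- lm(Y ~ Z + X - 1)  # without intercept

# Hypothetical helper: residual sum of squares of a fitted lm.
rss <- function(fit) sum(residuals(fit)^2)

# With an intercept: R^2 uses the centred total sum of squares.
r2_int <- 1 - rss(fit_int) / sum((Y - mean(Y))^2)
# Without an intercept: summary.lm switches to the uncentred sum of squares.
r2_noint <- 1 - rss(fit_noint) / sum(Y^2)

all.equal(r2_int, summary(fit_int)$r.squared)      # TRUE
all.equal(r2_noint, summary(fit_noint)$r.squared)  # TRUE

# The nested (no-intercept) model never fits better in terms of RSS.
rss(fit_noint) >= rss(fit_int)                     # TRUE
```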

We can see that the model with the intercept gives an estimate of $\tau$ (0.2935) far from the true value of 0.5 used in the simulation and does not detect that $Z$ is significant (p = 0.299), whereas the model without the intercept recovers both (0.4889, p = 5.29e-05). So, can we remove the intercept, or what else can we do to fix the problem?
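One way to look at the trade-off is a small Monte Carlo study that repeats the toy example many times and records $\hat\tau$ from both models. This is an illustrative sketch, not a recommendation: `simulate_tau` and its `beta0` argument are hypothetical names introduced here, `beta0 = 0` reproduces the question's data-generating process, and `beta0 = 0.3` is an assumed nonzero baseline chosen only for illustration.

```r
# Hypothetical helper: simulate the toy design n_sim times and summarise
# the bias and standard deviation of tau-hat with and without an intercept.
# beta0 is an assumed baseline level; beta0 = 0 matches the question.
simulate_tau <- function(beta0, n_sim = 1000, n = 100, tau = 0.5) {
  est <- replicate(n_sim, {
    Z <- rbinom(n, 1, 0.8)
    X <- rnorm(n)
    Y <- beta0 + tau * Z + X + rnorm(n, sd = 1)
    c(with_intercept    = unname(coef(lm(Y ~ Z + X))["Z"]),
      without_intercept = unname(coef(lm(Y ~ Z + X - 1))["Z"]))
  })
  # est is a 2 x n_sim matrix with one row per estimator.
  data.frame(estimator = rownames(est),
             bias      = unname(rowMeans(est) - tau),
             sd        = unname(apply(est, 1, sd)))
}

set.seed(2)
simulate_tau(beta0 = 0)    # question's setting: compare bias and sd
simulate_tau(beta0 = 0.3)  # assumed nonzero baseline
```

In the first call both estimators should be roughly unbiased, with a noticeably smaller spread for the no-intercept model. In the second, the no-intercept estimate absorbs the baseline and is biased by roughly `beta0`, while the intercept model stays approximately unbiased.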
