I have a categorical variable with $4$ categories and two dummy variables, $x_1$ and $x_2$, that encode it. The variable $x_1$ takes only the value $1$, with no zeros, $[1,1,1,1,1,\dots,1]$, while $x_2$ takes both $0$ and $1$, $[0,1,1,0,0,0,\dots,0]$. When I fit a multiple linear regression model in R with these two dummy variables as regressors, the output of `summary(model)` shows missing values (`NA`).
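
In case it helps to reproduce the pattern, here is a minimal sketch with simulated data. The data frame `sim`, the seed, and the simulated `output` values are assumptions made only for illustration and are not my real data; only the structure of `X1` (all ones) and `X2` (zeros and ones) matches the description above.

```r
# Simulated stand-in for the real data (hypothetical values, illustration only)
set.seed(1)
n      <- 56
X1     <- rep(1, n)             # dummy that is 1 for every observation
X2     <- rbinom(n, 1, 0.5)     # dummy with both 0 and 1
output <- 10 + rnorm(n)         # arbitrary response, unrelated to X2
sim    <- data.frame(output, X1, X2)

fit_int   <- lm(output ~ X1 + X2,     data = sim)  # with intercept
fit_noint <- lm(output ~ X1 + X2 - 1, data = sim)  # without intercept

coef(fit_int)                    # coefficient of X1 is reported as NA
coef(fit_noint)                  # both coefficients are estimated

# Compare fitted values and R-squared between the two fits
all.equal(unname(fitted(fit_int)), unname(fitted(fit_noint)))
summary(fit_int)$r.squared
summary(fit_noint)$r.squared
```

With this simulated data the two fits give the same fitted values, yet `summary()` reports very different R-squared values, which is the same behaviour I see in the real output below.
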
My questions are:
- Should I keep the regressor $x_1$ or remove it from the model?
- What will happen to the interpretations of the parameter estimates when I remove the intercept?
- Is there a situation where we can freely remove the intercept without affecting the interpretation of the parameter estimates?
- Why does R-squared increase when I remove the intercept?
This is the model with the intercept:

    Call:
    lm(formula = output ~ X1 + X2, data = data)

    Residuals:
        Min      1Q  Median      3Q     Max
    -3.3091 -1.2591 -0.0091  1.2415  3.6909

    Coefficients: (1 not defined because of singularities)
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  10.3083     0.5329  19.345   <2e-16 ***
    X1                NA         NA      NA       NA
    X2           -0.3992     0.6012  -0.664    0.509

    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 1.846 on 54 degrees of freedom
    Multiple R-squared:  0.008101,  Adjusted R-squared:  -0.01027
    F-statistic: 0.441 on 1 and 54 DF,  p-value: 0.5094

This is the model without the intercept:

    Call:
    lm(formula = output ~ X1 + X2 - 1, data = data)

    Residuals:
        Min      1Q  Median      3Q     Max
    -3.3091 -1.2591 -0.0091  1.2415  3.6909

    Coefficients:
       Estimate Std. Error t value Pr(>|t|)
    X1  10.3083     0.5329  19.345   <2e-16 ***
    X2  -0.3992     0.6012  -0.664    0.509

    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 1.846 on 54 degrees of freedom
    Multiple R-squared:  0.9682,  Adjusted R-squared:  0.967
    F-statistic: 821.1 on 2 and 54 DF,  p-value: < 2.2e-16

This is the model without the regressor X1:

    Call:
    lm(formula = output ~ X2, data = data)

    Residuals:
        Min      1Q  Median      3Q     Max
    -3.3091 -1.2591 -0.0091  1.2415  3.6909

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  10.3083     0.5329  19.345   <2e-16 ***
    X2           -0.3992     0.6012  -0.664    0.509

    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 1.846 on 54 degrees of freedom
    Multiple R-squared:  0.008101,  Adjusted R-squared:  -0.01027
    F-statistic: 0.441 on 1 and 54 DF,  p-value: 0.5094
