
I would like to extract all coefficients from the linear regression model. By default, the reference category coefficient is omitted, but I need it along with the other coefficients.

Consider the following data:

library(conjoint)

data(chocolate)

And this function:

Conjoint(x = cprof, y = cpref, z = clevn)

    Level        Utility
1   intercept     8.6849
2   milk         -1.0891
3   walnut       -0.7328
4   delicaties   -0.9224
5   dark          2.7443
6   low          -0.5709
7   average       0.1188
8   high          0.4521
9   paperback    -0.0287
10  hardback      0.0287
11  light        -0.1686
12  middle        0.1734
13  heavy        -0.0048
14  little       -0.6466
15  much          0.6466

This returns all coefficient values, including one for every level of each attribute. I need the same from a conventional regression model such as lm or glm, together with the p-values, as in this output:

Coefficients:
            Estimate Std. Error t value             Pr(>|t|)    
(Intercept)  8.68487    0.12648  68.667 < 0.0000000000000002 ***
kind1       -1.08908    0.19815  -5.496         0.0000000462 ***
kind2       -0.73276    0.19815  -3.698             0.000226 ***
kind3       -0.92241    0.19815  -4.655         0.0000035497 ***
price1      -0.57088    0.15254  -3.743             0.000190 ***
price2       0.11877    0.17887   0.664             0.506777    
packing1    -0.02874    0.11440  -0.251             0.801714    
weight1     -0.16858    0.15254  -1.105             0.269272    
weight2      0.17337    0.17887   0.969             0.332575    
calorie1    -0.64655    0.11440  -5.652         0.0000000193 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

I need kind4, price3, packing2, weight3 and calorie2 to appear in the table above as well (with their p-values), but I haven't found a way to do that. All coefficients should be together in this one table.
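As a check on the two outputs above: assuming the regression uses sum-to-zero (effect) coding, which the numbers are consistent with, each omitted level's estimate is minus the sum of the printed estimates for that factor, and it matches the corresponding part-worth from Conjoint(). This recovers the estimates only; their standard errors and p-values also require the estimated covariance of the printed coefficients.

$$\begin{aligned}
\hat\beta_{\text{kind4}} &= -(-1.08908 - 0.73276 - 0.92241) = 2.74425 \approx 2.7443 \quad (\text{dark})\\
\hat\beta_{\text{price3}} &= -(-0.57088 + 0.11877) = 0.45211 \approx 0.4521 \quad (\text{high})\\
\hat\beta_{\text{packing2}} &= -(-0.02874) = 0.02874 \approx 0.0287 \quad (\text{hardback})\\
\hat\beta_{\text{weight3}} &= -(-0.16858 + 0.17337) = -0.00479 \approx -0.0048 \quad (\text{heavy})\\
\hat\beta_{\text{calorie2}} &= -(-0.64655) = 0.64655 \approx 0.6466 \quad (\text{much})
\end{aligned}$$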
neves
  • The reference category is subsumed in the intercept. Specifically, the intercept gives the fitted response if all predictors are at their reference level (and numerical predictors are zero). As such, there is no parameter coefficient for the reference level. – Stephan Kolassa Jul 23 '22 at 11:56
  • See https://stats.stackexchange.com/questions/582754/interpreting-categorical-variables-with-reference-level-in-linear-model/582887#582887 – kjetil b halvorsen Jul 23 '22 at 14:17
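A minimal R sketch of one way to get the missing rows, assuming the model is fitted with sum-to-zero contrasts (contr.sum), which the printed kind1/kind2/kind3 estimates are consistent with. The data below are simulated and the level names are borrowed for illustration only. The estimate for the omitted level is minus the sum of the other estimates, its variance is $w^\top V w$ with $w = (-1, \dots, -1)$ and $V$ the covariance matrix of those coefficients from vcov(), and the p-value uses a t distribution with the residual degrees of freedom.

```r
# Hypothetical data: one four-level attribute, simulated ratings (illustration only)
set.seed(1)
kind <- factor(rep(c("milk", "walnut", "delicaties", "dark"), each = 10),
               levels = c("milk", "walnut", "delicaties", "dark"))
y <- c(7, 8, 8, 11)[as.integer(kind)] + rnorm(40)

# Effect (sum-to-zero) coding: lm reports kind1..kind3; the last level's effect
# is implied by the constraint that all four effects sum to zero
fit <- lm(y ~ kind, contrasts = list(kind = "contr.sum"))
b <- coef(fit)[-1]          # kind1, kind2, kind3
V <- vcov(fit)[-1, -1]      # their covariance matrix

# Omitted level: estimate w'b with w = (-1, -1, -1), variance w'Vw
w <- rep(-1, length(b))
est <- sum(w * b)
se <- sqrt(drop(t(w) %*% V %*% w))
tval <- est / se
pval <- 2 * pt(abs(tval), df = df.residual(fit), lower.tail = FALSE)

# Append the reconstructed row so every level appears in one table
full <- rbind(summary(fit)$coefficients, kind4 = c(est, se, tval, pval))
print(full)
```

With several factors, the same calculation is repeated per factor, using only that factor's rows and columns of vcov(). stats::dummy.coef(fit) also lists the estimates for all levels, but without standard errors; refitting with the levels reordered so that the wanted level is no longer last gives the same estimate, standard error and p-value.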

1 Answer


It is unclear from your question what your model looks like exactly (are the independent variables factors?). However, no coefficients are missing from your output.

Start from the observation that a regression with an intercept only simply yields the mean of the response as its coefficient. You can then add a factor variable (a "dummy") to the model to estimate the group effects for the groups that the factor defines.

df = data.frame(v1=c(0,0,0,0,0, 1,1,1,1,1), y=c(1,2,3,4,5, 10,11,12,13,14))

Check that the intercept-only model returns the mean of y:

summary(lm(y ~ 1, df))
mean(df$y)

In the example above, the overall mean is 7.5. If we now introduce a dummy v1 that distinguishes the first five values of $y$ from the last five, we get:

summary(lm(y~.,df))

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   3.0000     0.7071   4.243  0.00283 ** 
v1            9.0000     1.0000   9.000 1.85e-05 ***

The individual group means, however, are 3 and 12:

mean(c(1,2,3,4,5))
mean(c(10,11,12,13,14))

In the regression model, the first group's mean is the intercept (3), and the second group's mean is the intercept plus the coefficient for v1: 3 + 9 = 12.
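The same reading of the coefficients can be checked in R. A minimal sketch using the answer's toy data: the intercept reproduces the mean of the v1 = 0 group, and the intercept plus the v1 coefficient reproduces the mean of the v1 = 1 group, matching the means computed directly with tapply().

```r
# The answer's toy data
df <- data.frame(v1 = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1),
                 y  = c(1, 2, 3, 4, 5, 10, 11, 12, 13, 14))
fit <- lm(y ~ v1, data = df)
b <- coef(fit)

# Group v1 = 0: the intercept alone
mean_group0 <- unname(b["(Intercept)"])
# Group v1 = 1: intercept plus the v1 coefficient
mean_group1 <- unname(b["(Intercept)"] + b["v1"])

# Group means computed directly from the data, for comparison
direct <- tapply(df$y, df$v1, mean)
print(c(mean_group0, mean_group1))
print(direct)
```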

Why?

The model is (with $\beta_0$ the intercept, $\beta_1$ the coefficient of v1, and $x$ the value of v1):

$$\hat{y}=\beta_0 + \beta_1 x,$$ $$\hat{y}= 3+9x.$$

So when $x$ is "on" (= 1), we have: $$\hat{y} = 3 + 9 \cdot 1 = 12.$$

When $x$ is "off" (= 0), we have: $$\hat{y} = 3 + 9 \cdot 0 = 3.$$

Peter