
I performed a two-way ANOVA and I am trying to understand how the mean of each group can be recovered from the model coefficients. As I understand it, the baseline group is partner_status=high, fcategory=high.

import statsmodels.api as sm
from statsmodels.formula.api import ols
moore = sm.datasets.get_rdataset("Moore", "carData", cache=True) # load
data = moore.data
data = data.rename(columns={"partner.status": "partner_status"})
moore_lm = ols('conformity ~ fcategory+partner_status+fcategory*partner_status', data=data).fit()
print(moore_lm.summary())

The Intercept gives the mean of the group partner_status=high, fcategory=high, which is 11.85. However, for the term fcategory[T.low]:partner_status[T.low], the coefficient is -9.26, while the mean of that group is 8.9 (from the groupby results). How do we get the mean of that group from the coefficients?

                                              coef    std err          t      P>|t|      [0.025      0.975]
Intercept                                    11.8571      1.731      6.851      0.000       8.356      15.358
fcategory[T.low]                              5.5429      2.681      2.067      0.045       0.120      10.966
fcategory[T.medium]                           2.4156      2.214      1.091      0.282      -2.063       6.894
partner_status[T.low]                         0.7679      2.370      0.324      0.748      -4.026       5.561
fcategory[T.low]:partner_status[T.low]       -9.2679      3.451     -2.686      0.011     -16.247      -2.288
fcategory[T.medium]:partner_status[T.low]    -7.7906      3.573     -2.181      0.035     -15.017      -0.564

Below are the group means for each combination of categories.

                          conformity
partner_status fcategory
high           high        11.857143
               low         17.400000
               medium      14.272727
low            high        12.625000
               low          8.900000
               medium       7.250000

1 Answer


Your interpretation of the model Intercept is correct, but it is misleading to call the other coefficients "intercepts." With this type of coding (treatment contrasts), each main-effect coefficient is a difference from the Intercept, and each interaction coefficient is a further difference from what you would predict by adding the lower-level coefficients to the Intercept.

So the prediction for the cell fcategory=low, partner_status=low is Intercept + fcategory[T.low] + partner_status[T.low] + fcategory[T.low]:partner_status[T.low], or 11.8571 + 5.5429 + 0.7679 - 9.2679, which equals 8.90.
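The same sum can be checked for every cell with a short Python sketch. The coefficient values are copied from the summary table in the question; the helper name `cell_mean` and the dictionary layout are my own illustration, not part of statsmodels.

```python
# Coefficients copied from the summary table in the question.
coef = {
    "Intercept": 11.8571,
    "fcategory[T.low]": 5.5429,
    "fcategory[T.medium]": 2.4156,
    "partner_status[T.low]": 0.7679,
    "fcategory[T.low]:partner_status[T.low]": -9.2679,
    "fcategory[T.medium]:partner_status[T.low]": -7.7906,
}


def cell_mean(fcategory, partner_status):
    """Predicted mean for one cell (hypothetical helper; reference cell is high/high)."""
    f_term = f"fcategory[T.{fcategory}]"
    p_term = f"partner_status[T.{partner_status}]"
    total = coef["Intercept"]
    # Main effects exist only for non-reference levels, so a missing key adds 0.
    total += coef.get(f_term, 0.0)
    total += coef.get(p_term, 0.0)
    # The interaction applies only when both factors are off their reference level.
    total += coef.get(f"{f_term}:{p_term}", 0.0)
    return round(total, 4)


# Keys follow the (partner_status, fcategory) order of the groupby table.
cell_means = {
    (p, f): cell_mean(f, p)
    for p in ("high", "low")
    for f in ("high", "low", "medium")
}

for cell, value in cell_means.items():
    print(cell, value)
```

Up to rounding in the printed coefficients, each value should agree with the corresponding row of the groupby table in the question.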

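Be warned that predictions built from modeled coefficients are only guaranteed to equal the observed cell means when the model includes all interaction terms, as this one does. Without the interaction terms the predictions generally differ from the cell means, particularly when there are unequal numbers of observations in each cell, which is often the case in observational studies.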

EdM
  • Although not part of the question: if the p-value is below 0.05, can the coefficients be used to say whether the variable will have a positive or negative effect on conformity, and to what extent? – pranav nerurkar Jan 27 '23 at 14:43
  • @pranavnerurkar with interactions there is no single "effect" of a predictor variable. Its association with outcome depends on the values of the variables with which it is interacting. Changing the reference level of a categorical variable, or centering/scaling a continuous variable, can change the magnitude and "significance" of the coefficients for the other variables it interacts with. See this page for an example. With interactions it's best to display model predictions (with error estimates) for representative combinations of predictors. – EdM Jan 27 '23 at 14:48
  • Thanks for the info. Can I say from the results given in the question that if a data sample has fcategory=medium and partner_status=low, conformity will be reduced by 7.7906? – pranav nerurkar Jan 27 '23 at 14:54
  • @pranavnerurkar that's the extra reduction below what you get from the sum of the Intercept with the individual fcategory[T.medium] and partner_status[T.low] coefficients. – EdM Jan 27 '23 at 15:51
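
The point in the comments about reference levels can be illustrated with a rough Python sketch. It assumes the full-interaction model from the question, whose fitted cell values match the groupby means shown there, so the treatment-coded coefficients can be rebuilt directly from those means. The function `treatment_coefs` is a hypothetical helper, and the term names only imitate the statsmodels labels.

```python
# Cell means copied from the groupby table in the question,
# keyed by (partner_status, fcategory).
means = {
    ("high", "high"): 11.857143,
    ("high", "low"): 17.400000,
    ("high", "medium"): 14.272727,
    ("low", "high"): 12.625000,
    ("low", "low"): 8.900000,
    ("low", "medium"): 7.250000,
}


def treatment_coefs(ref_partner, ref_fcat):
    """Treatment-coded coefficients of the full-interaction model (hypothetical helper)."""
    base = means[(ref_partner, ref_fcat)]
    partners = sorted({p for p, _ in means} - {ref_partner})
    fcats = sorted({f for _, f in means} - {ref_fcat})
    out = {"Intercept": base}
    for f in fcats:
        # Difference from the reference cell, with partner_status at its reference.
        out[f"fcategory[T.{f}]"] = means[(ref_partner, f)] - base
    for p in partners:
        # Difference from the reference cell, with fcategory at its reference.
        out[f"partner_status[T.{p}]"] = means[(p, ref_fcat)] - base
    for f in fcats:
        for p in partners:
            # What is left after the Intercept and both main effects are accounted for.
            out[f"fcategory[T.{f}]:partner_status[T.{p}]"] = (
                means[(p, f)] - means[(p, ref_fcat)] - means[(ref_partner, f)] + base
            )
    return {name: round(value, 4) for name, value in out.items()}


original = treatment_coefs("high", "high")   # same reference cell as in the question
releveled = treatment_coefs("high", "low")   # fcategory reference switched to "low"

for name in ("Intercept", "partner_status[T.low]"):
    print(name, original[name], releveled[name])
```

With fcategory=high as the reference, partner_status[T.low] is about 0.77; with fcategory=low as the reference it becomes 8.9 - 17.4 = -8.5, even though the data and the predicted cell means are unchanged. In an actual refit, the reference level could be changed in the formula with patsy's `C(fcategory, Treatment(reference="low"))`, in which case the term labels in the summary would look different from the ones imitated here.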