
I was wondering what would be the best way to report this model?

### This model is from Winter (2019: 139)
# Center SER: mutate(SER_c = SER - mean(SER, na.rm = TRUE))
# Model:

lm(Iconicity ~ SER_c * POS)

term           estimate
(Intercept)    0.66
SER_c          0.11
POSVerb        0.72
SER_c:POSVerb  0.50

Note: SER_c is a centered continuous predictor and POS is a categorical predictor with 2 levels (Noun and Verb).

  • 1. I know that I cannot talk about main effects because B1 and B2 are simple effects, so would it be something like:

"a regression model predicts an effect of SER (b = 0.11, p < 0.05) and POS (POS's beta is a combination of B1 + B3, so I don't know what I should report here?) as well as a significant interaction between the variables (b3 = 0.50, p < 0.05). Iconicity (the Y variable) is 0.66 for nouns with average SER and 1.38 for verbs with average SER, and the positive interaction indicates that verbs are more iconic than nouns among words with average SER"

  • 2. Also, I can talk about the interaction effect as a moderator/control, can't I? Like this:

"we have controlled for SER in our model"

  • This is the paper in which Winter (the book's author) reports these results, but the authors didn't include the interaction in the final version, so I'm a bit confused about how to report it.

It is my first time reporting a centered lm with a significant interaction between a centered continuous variable and a categorical variable with two levels, and I want to get it right. I'd really appreciate some advice on that! Thanks in advance!

This post was helpful, but I couldn't adapt it to my example.

1 Answer


Centering (more generally, standardizing) predictors can make the regression coefficients harder to interpret. This seems to happen in your example because nouns and verbs have different mean SER.

I didn't know what SER is (and I still don't), so I was looking at Winter et al. (2017) to figure it out. In a great example of reproducibility, the data and the code for that analysis are available on GitHub.

This means I was able to reproduce your regression.

#>   (Intercept)          SER_c        POSVerb  SER_c:POSVerb  
#>        0.6642         0.1182         0.7237         0.5084

But first I wanted to look at the distribution of SER for nouns and verbs.

You can see from the histogram that verbs tend to have lower SER than nouns. We can also compute the average SER by part of speech (POS).

#> # A tibble: 2 × 3
#>   POS       n `mean(SER)`
#>   <chr> <int>       <dbl>
#> 1 Noun   1106        3.42
#> 2 Verb    373        2.96
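As an aside, this kind of count-and-mean summary is a one-liner in most data-frame libraries. Here is a minimal pandas sketch with stand-in data (NOT Winter's dataset; the column names POS and SER are just taken from the model formula):

```python
import pandas as pd

# Minimal stand-in data (NOT Winter's dataset); columns named after the
# variables in the model formula: POS (Noun/Verb) and SER.
df = pd.DataFrame({
    "POS": ["Noun", "Noun", "Noun", "Verb", "Verb"],
    "SER": [3.0, 3.5, 4.0, 2.5, 3.0],
})

# Count and mean SER per part of speech, mirroring the tibble above.
summary = df.groupby("POS")["SER"].agg(n="size", mean_SER="mean").reset_index()
print(summary)
```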

So, while SER = 0 might be arbitrary, so may be the average SER of verbs and nouns (in this dataset). The (unweighted) sample mean SER is a 3:1 combination of the mean noun SER (3.42) and the mean verb SER (2.96):

(1106 * 3.42 + 373 * 2.96) / (1106 + 373) ≈ 3.3
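The arithmetic can be checked directly (a quick sketch; the counts and group means are the ones from the table above):

```python
# Group sizes and mean SER by part of speech (values from the table above).
n_noun, n_verb = 1106, 373
mean_noun, mean_verb = 3.42, 2.96

# The overall sample mean is the count-weighted combination of the group
# means -- roughly a 3:1 mix, since there are about three nouns per verb.
overall = (n_noun * mean_noun + n_verb * mean_verb) / (n_noun + n_verb)
print(round(overall, 2))  # → 3.3
```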

This average is hard to interpret. It follows that the concepts of "NOUNS w/ AVERAGE SER" and "VERBS w/ AVERAGE SER" are also hard to interpret.

Instead I would report the model as two regression lines, one for nouns and one for verbs. Here is the model overlaid on top of the data.

Appendix

Since POS is categorical, the model Iconicity ~ SER * POS is equivalent to fitting a separate regression line for each category. (The errors about the two regression lines are assumed to have the same variance.) The interaction term is a specific parametrization: its coefficient measures how much the regression slope for verbs differs from the slope for nouns.

$$ \begin{aligned} Y &= \beta_0 + \beta_1\text{SER} + \beta_2\operatorname{Is}\left\{\text{Verb}\right\} + \beta_3\text{SER} \times \operatorname{Is}\left\{\text{Verb}\right\} \\ &= \operatorname{Is}\left\{\text{Noun}\right\} \color{red}{\Big[\beta_0 + \beta_1\text{SER}\Big]} + \operatorname{Is}\left\{\text{Verb}\right\} \color{blue}{\Big[(\beta_0+\beta_2) + (\beta_1+\beta_3)\text{SER}\Big]} \\ &= \operatorname{Is}\left\{\text{Noun}\right\} \color{red}{\Big[\beta_0 + \beta_1\text{SER}\Big]} + \operatorname{Is}\left\{\text{Verb}\right\} \color{blue}{\Big[\tilde{\beta}_2 + \tilde{\beta}_3\text{SER}\Big]} \\ \end{aligned} $$
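This equivalence can be checked numerically. A minimal sketch in Python (simulated data, not Winter's; NumPy least squares stands in for R's lm): fitting the interaction parametrization on the pooled data recovers exactly the same two lines as fitting nouns and verbs separately.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated stand-in data (NOT Winter's dataset): an SER-like predictor
# and a binary part-of-speech indicator.
n = 200
ser = rng.uniform(1, 5, size=n)
is_verb = rng.integers(0, 2, size=n)  # 0 = noun, 1 = verb
y = 0.7 + 0.1 * ser + 0.7 * is_verb + 0.5 * ser * is_verb + rng.normal(0, 0.1, n)

# Interaction parametrization: y ~ b0 + b1*SER + b2*Verb + b3*SER:Verb
X = np.column_stack([np.ones(n), ser, is_verb, ser * is_verb])
b0, b1, b2, b3 = np.linalg.lstsq(X, y, rcond=None)[0]

# Separate regressions, one per category.
def fit_line(x, y):
    return np.polyfit(x, y, 1)  # returns (slope, intercept)

noun_slope, noun_int = fit_line(ser[is_verb == 0], y[is_verb == 0])
verb_slope, verb_int = fit_line(ser[is_verb == 1], y[is_verb == 1])

# The interaction model reproduces the per-group lines:
# noun line: b0 + b1*SER; verb line: (b0+b2) + (b1+b3)*SER
assert np.isclose(b1, noun_slope) and np.isclose(b0, noun_int)
assert np.isclose(b1 + b3, verb_slope) and np.isclose(b0 + b2, verb_int)
```

This is also why reporting the two slopes (b1 for nouns, b1 + b3 for verbs) loses no information relative to reporting the coefficients themselves.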

B. Winter, M. Perlman, L. K. Perry, and G. Lupyan. Which words are most iconic? Iconicity in English sensory words. Interaction Studies, 18(3):443–464, 2017.

dipetkov
  • thank you very much (once more!). Would you help me with reporting it in a paragraph? I mean, this is the way I usually see it done in my field, so I'd have to do that (as I tried to do in the post). My difficulty was with how to report the slope for POS, since it is a combination of B1 and B3, and with how to report the interaction @dipetkov – Larissa Cury Sep 08 '22 at 19:15
  • Why don't you report the two slopes, one for nouns and one for verbs? The intercepts are less meaningful (if SER = 0 is arbitrary). – dipetkov Sep 08 '22 at 19:18
  • but if the slopes depend on the interaction, then how do I account for that? Let me give you an example. I usually see it like this (without an interaction): "continuous predictor X increases Y by 0.11 (b = 0.11, p < 0.05) when the category is category A and by nvalue when Y is category B (b = nvalue, p < 0.05)". But since X2's slope is B1 + B3, I don't know how to report that in a similar way? – Larissa Cury Sep 08 '22 at 19:22
  • Is it really interesting to report that the slope for verbs, which is 0.627, is statistically different from 0? – dipetkov Sep 08 '22 at 19:49
  • thank you very much! I'm using this book as a guideline for the model I'm fitting myself on real data (which also has a categorical × continuous interaction). So what I'd like to be able to report is the effect size of each predictor, but I've never reported an interaction like this before, that's why I'm a bit confused. Concerning the p-value, I guess that's just a convention to report it. Unfortunately, in my field, there's still a lack of people using lm and lmers, so I don't have many examples; this is the way I've seen some good materials in my field report it, so I'm trying to follow that – Larissa Cury Sep 08 '22 at 19:55
  • The interaction is the difference between the two slopes. In this particular example, one doesn't really need a p-value to see the slopes are different. It can often be more intuitive to look at contrasts rather than regression coefficients. The coefficients depend on the parametrization: the same model can have many equivalent parametrizations. So I think it's harder to interpret the coefficients. – dipetkov Sep 08 '22 at 20:02
  • Here is how to use the contrast package to express the slope for verbs as a contrast (and to get its confidence interval and p-value): contrast::contrast(fit, list(SER = 1, POS = "Verb"), list(SER = 0, POS = "Verb")). Spoiler alert: the p-value is numerically 0. – dipetkov Sep 08 '22 at 20:04
  • thank you very much (once more!!)! This is indeed the model that I've been trying to fit (https://stats.stackexchange.com/questions/588224/lmer-violating-residuals-normality-assumption-what-should-i-do-when-enough-d) with this interaction, and which I'm hoping to report as soon as I get past the normality assumption – Larissa Cury Sep 09 '22 at 01:29