
Currently cross-posted at https://stackoverflow.com/questions/63492814/interaction-not-significant-but-one-simple-effect-significant-linear-mixed-mod because I wasn't sure which site was more appropriate; StackOverflow tends to get more traffic and responses. I'm open to suggestions on where best to post, in the hope of getting useful feedback.


Background: I have fit a linear mixed model using lmer() (lme4 package) in R with two binary categorical predictors as dummy variables. One (Intervention) is within-subjects, while the other (Sex) is between-subjects. The model accounts for two levels of correlation with random effects (data structure and model code are described below). The outcome is a proportion, but the values are very well-behaved: the mean is around 0.5, the range is about 0.2 to 0.9, and they are approximately normally distributed. Accordingly, the residuals indicate that the assumptions (normality, equal variance) are met, so I don't think what I'm observing is due to violating the assumptions of a linear (mixed) model.

Issue: The following is true no matter what random effects structure I use (the candidates are listed below): in every case, the test statistic for the interaction between the two binary categorical predictors is about 1.7 in magnitude, while that for one of the binary predictors is always about 2.8 (the test statistic for the other is ~1.3). Although there is debate about how to accurately calculate p-values for these types of models (and whether or not we even should - I'm aware of this discussion point), it is clear that no matter the degrees of freedom used, the interaction term would not be considered statistically significant (with, say, $\alpha$ = 0.05), while the one predictor would. Note that the estimate for the individual predictor is a simple effect, since it is binary and dummy-coded. I used emmeans() to look at all four possible simple effects, and only one is statistically significant (the one with the test statistic of about 2.8).

I cannot figure out how the interaction could lack significance while one of the four possible simple effects is significant. I could see it if the test statistics/p-values were "borderline," making it a potential issue of power. However, here the ballpark p-value for the interaction term (test stat ~1.7) is about 0.09, while a rough p-value for the simple effect (test stat ~2.8) is about 0.007. It seems problematic to me that they could differ by an order of magnitude, and it makes me concerned that I am inherently modeling the data incorrectly, although if so, I can't see where the error is.

Data structure: Each subject has an observed proportion for each of six different images (out of 12 possible images they could have been randomly assigned): three images were viewed pre-intervention, and three were viewed post-intervention. Thus, there is potential correlation due to both subject and image, so these are treated as random effects. Lastly, Intervention is within-subjects, while Sex is between-subjects.

Here is a small dummy dataset (not the actual data; in the actual data the number of unique subjects is 59, with 29 of one sex and 30 of the other):

structure(list(Subject = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 
5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L), Image = c("B", "A", 
"G", "E", "C", "I", "C", "G", "L", "A", "D", "F", "E", "A", "K", 
"B", "C", "I", "D", "F", "H", "J", "L", "B", "D", "F", "A", "L", 
"C", "E", "J", "K", "F", "B", "A", "D"), Intervention = c("Pre", "Pre", "Pre", "Post", 
"Post", "Post", "Pre", "Pre", "Pre", "Post", "Post", "Post", "Pre", 
"Pre", "Pre", "Post", "Post", "Post", "Pre", "Pre", "Pre", 
"Post", "Post", "Post", "Pre", "Pre", "Pre", "Post", "Post", "Post", 
"Pre", "Pre", "Pre", "Post", "Post", "Post"), Sex = c("Female", 
"Female", "Female", "Female", "Female", "Female", "Female", "Female", 
"Female", "Female", "Female", "Female", "Female", "Female", "Female", 
"Female", "Female", "Female", "Male", "Male", "Male", "Male", 
"Male", "Male", "Male", "Male", "Male", "Male", "Male", "Male", 
"Male", "Male", "Male", "Male", "Male", "Male"), Prop = c(0.488277, 
0.236734, 0.41036, 0.745403, 0.464705, 0.625076, 0.5602122, 0.590909, 0.333266, 0.365954, 0.374941, 0.662141, 0.64877, 0.434947, 0.721343, 0.5288113, 0.782714, 
0.603777, 0.4480342, 0.629813, 0.347684, 0.41906, 0.553854, 0.639324, 0.389804, 0.49155, 0.355763, 0.695487, 0.537433, 0.650022, 0.54022, 0.58907, 0.666208, 
0.713883, 0.625882, 0.434924)), class = "data.frame", row.names = c(NA, -36L))
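
(In the models below I call factor() inside the formula, so R's default treatment (dummy) contrasts apply, with the alphabetically first level as the reference. For anyone reproducing this with the dummy data, and assuming the dput() output above has been assigned to data01, the equivalent explicit setup would be:)

# Assuming the dput() output above has been assigned to data01.
# With default treatment contrasts the alphabetically first level is the reference,
# so the coefficients below are relative to Sex = "Female" and Intervention = "Post".
data01$Subject      <- factor(data01$Subject)
data01$Image        <- factor(data01$Image)
data01$Sex          <- factor(data01$Sex)           # levels: Female (reference), Male
data01$Intervention <- factor(data01$Intervention)  # levels: Post (reference), Pre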

Candidate models considered, each with a different random-effects structure:

Model 1 (gave a convergence warning). Note the output shown is from my actual data (not the dummy dataset given above):

largest_lmer <- lmer(Prop ~ factor(Sex)*factor(Intervention) +
                            (1 | Image) +
                            (1 + Intervention | Subject), 
                     data = data01)

coef(summary(largest_lmer))

                                            Estimate Std. Error   t value
(Intercept)                               0.51415277 0.03503742 14.674389
factor(Sex)Male                           0.04019813 0.03006458  1.337059
factor(Intervention)Pre                   0.05123982 0.01830275  2.799569
factor(Sex)Male:factor(Intervention)Pre  -0.04238911 0.02509809 -1.688938

install.packages("emmeans")
library(emmeans)

largest_lmer_emm_Int <- emmeans(largest_lmer, ~ factor(Sex) | factor(Intervention))
pairs(largest_lmer_emm_Int)

Intervention = Post:
 contrast      estimate     SE   df t.ratio p.value
 Female - Male -0.04020 0.0301 57.3  -1.336  0.1867

Intervention = Pre:
 contrast      estimate     SE   df t.ratio p.value
 Female - Male  0.00219 0.0307 57.2   0.071  0.9434

Degrees-of-freedom method: kenward-roger

largest_lmer_emm_Sex <- emmeans(largest_lmer, ~ factor(Intervention) | factor(Sex))
pairs(largest_lmer_emm_Sex)

Sex = Female:
 contrast   estimate     SE   df t.ratio p.value
 Post - Pre -0.05124 0.0184 56.5  -2.789  0.0072    <- This is the significant simple effect

Sex = Male:
 contrast   estimate     SE   df t.ratio p.value
 Post - Pre -0.00885 0.0172 55.0  -0.515  0.6084

Degrees-of-freedom method: kenward-roger
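
Regarding the convergence warning on Model 1: one check I know of (just a sketch; I'm not claiming it resolves the warning) is to refit with all optimizers available to lme4 via allFit() and compare the fixed-effect estimates across optimizers:

# Sketch: probe the Model 1 convergence warning by refitting with every available
# optimizer; if the fixed effects barely change, the warning is likely benign.
library(lme4)
all_fits <- allFit(largest_lmer)
summary(all_fits)$fixef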

Model 2: All output similar to that from Model 1 (not repeated here):

medium_lmer <- lmer(Prop ~ factor(Sex)*factor(Intervention) + 
                           (1 | Image) +
                           (1 | Subject) +
                           (1 | Intervention:Subject), 
                    data = data01)

Model 3: All output similar to that from Model 1 (not repeated here):

smallest_lmer <- lmer(Prop ~ factor(Sex)*factor(Intervention) + 
                             (1 | Image) +
                             (1 | Subject), 
                      data = data01)
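
In case it's useful, the three random-effects structures can also be compared directly (a sketch; anova() refits the REML models with ML before computing the likelihood-ratio tests, and tests of variance components on the boundary are conservative):

# Sketch: likelihood-ratio / information-criterion comparison of the three candidates.
anova(smallest_lmer, medium_lmer, largest_lmer)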

As I mentioned, all of these candidate models gave roughly the test statistics noted above - they did not vary depending on the random effects included. Assumptions of the model (normality, equal variance) were met. Is there something else I'm missing? Or is it mathematically possible to have a non-significant interaction but a significant simple effect, when the two differ as much as these do in their test statistics/p-values?
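
(For completeness, the kind of residual checks I mean are along these lines, shown here for Model 1:)

# Sketch of the residual checks referred to above, for Model 1.
plot(largest_lmer)                                         # fitted values vs. residuals (equal variance)
qqnorm(resid(largest_lmer)); qqline(resid(largest_lmer))   # normality of residuals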

Meg
  • As for where to post... I would say it is like looking for lost keys: look near where they should be, not where there is more light or foot traffic. Here you have lots of statisticians albeit less traffic. I suspect statisticians will know better how to handle your question. – kurtosis Aug 20 '20 at 15:58
  • Thanks, @kurtosis. I see a lot of statisticians on StackOverflow too, as the intersection between stats and coding is very blurred. – Meg Aug 20 '20 at 16:13
  • "the intersection between stats and coding is very blurred" Will have to agree to disagree. Guess it depends on how we define "statistician." :-) – kurtosis Aug 20 '20 at 16:30
  • Since even theoretical statistical papers generally require a simulation and/or application to a dataset, they are indeed blurred in my eyes. And in my job, I do plenty of theory and application, each requiring the other, so I cannot separate the two. – Meg Aug 20 '20 at 16:38

2 Answers


I think there are a few potential issues here.

Your results tend to be the same under different random-effects setups. That is not so surprising: Liang and Zeger discuss how approximate random-effects models are often sufficient to get close to the truth and produce useful standard errors. The fixed effects should not change much, if at all, between the three models, since they are specified identically in all three. This is the good part.

The troubling part is that you seem to insist that the interaction should be significant. Do you have some theoretical reason for that belief, or is it just a prior not based on theory? You don't want to be the analyst who tortures the data until it falsely confesses, so it really sounds like you need to be willing to accept that the interaction is insignificant. That should not be surprising: interactions are often less significant than the main effects.

Another possible issue is heteroskedasticity. Proportions tend to be more variable when they are near 0.5 than when they are near 0 or 1. A typical correction is to transform the response to $\tilde{Y} = \sin^{-1}(\sqrt{Y})$ to stabilize the variance. That is a bit of a pain because you need to back-transform your predictions and the model coefficients are less intuitive, but the results will likely be cleaner. Weisberg's Applied Linear Regression, 2nd Ed. discusses this in Chapter 8.
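
A sketch of what that refit could look like, reusing the model and data names from the question (data01, Prop):

# Sketch: Model 1 from the question, refit on the arcsine-square-root scale.
library(lme4)
asin_lmer <- lmer(asin(sqrt(Prop)) ~ factor(Sex) * factor(Intervention) +
                    (1 | Image) +
                    (1 + Intervention | Subject),
                  data = data01)
coef(summary(asin_lmer))
# Back-transform predictions with sin(pred)^2 when reporting on the proportion scale.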

Finally, you ask "is it mathematically possible to have an insignificant interaction, but a significant simple effect that differ as much as these two do with regard to their test statistic/$p$-value?" Absolutely. Suppose we gather school children from Smallville and Littletown, show some of them videos on word roots and guessing at spelling, and then give them all spelling tests. We might see that town is almost significant (say Smallville has better schools), the treatment is very significant, but that the interaction of town and treatment is not at all significant (i.e. both towns' kids learn equally well from the video, so the interaction is immaterial). That would not even be unusual: I probably saw a hundred datasets like that in graduate school.
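
To make that concrete, here is a toy simulation of the spelling-test scenario (made-up effect sizes, plain lm() for simplicity): the treatment effect is real, the town effect is small, and there is no interaction in the data-generating process, so the fitted interaction will typically be nowhere near significant even though treatment clearly is:

# Toy simulation: real treatment (video) effect, small town effect, no interaction.
set.seed(1)
n     <- 50                                                  # children per town x treatment cell
town  <- factor(rep(c("Smallville", "Littletown"), each = 2 * n))
video <- factor(rep(rep(c("Video", "NoVideo"), each = n), times = 2))
score <- 70 + 3.5 * (town == "Smallville") + 8 * (video == "Video") + rnorm(4 * n, sd = 10)
summary(lm(score ~ town * video))
# Typically: the video term is clearly significant, the town term is borderline at best,
# and the town:video interaction is nowhere near significant.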

To summarize: I would be content with your random-effects modeling, transform your response, and be open to your interaction term not being significant. Don't torture the data; those confessions are rarely true. Good luck; hope it goes well!

kurtosis
  • 1/4 Thanks, @kurtosis. Although in theory the random effects hopefully don’t change our results much, this is not always true. In a logistic mixed model of these data, the results are sensitive to which random effects are/are not included. There could be other issues with the model leading to this, but one cannot assume that any old random effects will do. As a matter of fact, there are many conflicting opinions in the literature on how to best choose random effects so as to balance the type I error rate and power. – Meg Aug 20 '20 at 16:09
  • 2/4 I am not insisting the interaction be significant. I originally stumbled upon the significant simple effect in light of the insignificant interaction because I have also considered a similar logistic model (using 0/1 data instead of the proportions), and wanted estimates of the odds ratios between all groups as measures of effect size to report anyway, despite lack of statistical significance. That is, until I saw one of the simple effects was significant. That’s what led to me questioning if/how/when this could happen. – Meg Aug 20 '20 at 16:10
  • 3/4 As I mentioned in my post, I had no evidence of heteroskedasticity in the residuals after I fit the model. For completeness, however, I had already also tried an empirical logit transformation on the proportions, and the results are equivalent. But, as I said, I don’t think a transformation is necessary here because the proportions are already so well-behaved (as in my post). – Meg Aug 20 '20 at 16:10
  • 4/4 I also think you’re misunderstanding main vs. simple effects. I understand that you can readily have a significant main effect without a significant interaction (and, indeed, the Sex main effect is significant here). What I’m asking about are the simple effects: All four combinations of Sex and Intervention (change from male to female when holding intervention at “pre,” e.g.). It makes less (no?) sense to me how one of these simple effects can be significant if the interaction is not. – Meg Aug 20 '20 at 16:10
  • Good to hear that a logit transform performed similarly; and, yes, the random effects sometimes make a difference. One of your simple effects can be significant if the others are incredibly noisy. Also, presumably one of those simple effects is aliased with your baseline, so that is likely an issue with looking at the simple effects. – kurtosis Aug 20 '20 at 16:26
  • Would I expect aliasing in a 2x2 ANOVA (fit as a linear mixed model, so random effects could be incorporated)? – Meg Aug 20 '20 at 16:46
  • Typo two comments above: "and, indeed, the Sex main effect is significant here" should be "and, indeed, the Intervention main effect is significant here". – Meg Aug 20 '20 at 16:58
  • Aliasing is a direct result of identifiability. It happens but how it happens depends on how your factors are coded, not on the size of groups and treatments. So yes, you should expect this in a 2x2 setup. Most discussions of contrasts in $R$ discuss how this happens. – kurtosis Aug 20 '20 at 17:04
  • My data are coded as dummy variables (0/1), which is the natural way to then estimate simple effects, and I have not seen an example where aliasing has caused an issue with doing so. Indeed, I have seen the 2x2 ANOVA case as a straightforward example for the sake of illustrating simple effects, so it's unclear why it would be an issue here. (Note I have only seen aliasing discussed with effects coding (say, -1/1).) I have spent some time now searching various terms, and have not found an indication that aliasing would be at play here. Do you have a source for the 2x2, dummy-coded case? – Meg Aug 20 '20 at 17:24
  • See here: https://bbolker.github.io/stat4c03/notes/contrasts.pdf. Note that aliasing happens in your case; that is why you do not see an effect estimated for each of the simple effects. So Sex0:Intervention0, Sex1:Intervention0, and Sex0:Intervention1 would all not be reported, and the model summary would instead report the intercept, the Sex effect, and the Intervention effect. Only Sex1:Intervention1 would not be aliased and would be reported as Sex1:Intervention1. – kurtosis Aug 20 '20 at 17:48
  • Oh, yes, I know that not all simple effects will be output, but they can subsequently be calculated (with algebra using 0s and 1s, or using something like emmeans). p-values can be obtained by refitting the model with different baselines, or, again, using emmeans. I thought by aliasing you meant some things may never be estimable, but all simple effects in this 2x2 situation should be estimable, just by using algebra/changing baseline/using emmeans. It is unclear to me if/how this is related to the lack of a significant interaction despite a significant simple effect. – Meg Aug 20 '20 at 18:00
  • Ah, no. Everything you have is identifiable. It's just what gets reported. That is how the significance gets affected: how something is coded often implies a basis for comparison. So coding as $\pm$1 would be comparing to 0, while the usual treatment contrast compares to the baseline. I suspect your significant simple effect is (most of, but not all of) what is driving the significant InterventionPre effect. – kurtosis Aug 20 '20 at 18:53
  • Thanks for clarifying everything is identifiable. And yes, I want to compare back to baseline so I can get simple effects here. And I agree - the one significant simple effect is probably driving the significant Intervention main effect. What I can't get over is this: I would have missed this simple effect altogether had I never "stumbled upon" it (again, when estimating ORs when treating the outcome as binary), because an insignificant interaction should generally indicate no need to go on to look for significant simple effects. So this goes back to my question: How is this happening? – Meg Aug 20 '20 at 19:10
  • I'm not sure why you think an insignificant interaction means no significant simple effects. It just means one of your simple effects (Sex1:Intervention1) is insignificant. Does that help? – kurtosis Aug 20 '20 at 19:24
  • 1/2 In theory, no significant interaction tells you that there is no need to differentiate between the four groupings - the main effect for Sex and/or Intervention (here, just Intervention) is "enough.” So, now you collapse (get rid of the interaction), and conclude that pre- and post-interventions differ, but not that – specifically - females post-intervention differ from females pre-intervention. – Meg Aug 20 '20 at 19:35
  • 2/2 You’ve lost this level of detail (i.e., the simple effect). The only way to get this detail is with the interaction in the model. However, when an interaction is not significant, it is generally argued there is no need to leave it in the model, and now your chance to capture that simple effect is gone. Is there something I’m missing? – Meg Aug 20 '20 at 19:42