t-test and regression with categorical predictors NOT matching

Question

As far as I understand, the results of a t-test should agree with those of a regression model with a two-level categorical predictor. Why isn't this happening here?

Here is the model:

library(lmerTest)  # lmer() with Satterthwaite df and p-values, as in the output below
mod1 <- lmer(CONT_Y ~ YEAR * MY_GROUP + (1|PARTICIPANTS), data = data)
Fixed effects:
                    Estimate Std. Error      df t value Pr(>|t|)
(Intercept)          17.6114     0.4026 75.9163  43.745   <2e-16 ***
YEARB                 1.1438     0.5299 60.0000   2.159   0.0349 *
MY_GROUP2             0.9148     0.5299 60.0000   1.726   0.0894 .  ### THIS IS WHAT I'M LOOKING AT | NOT SIGNIFICANT (p > 0.05)
YEARB:GROUPL2        -0.6024     0.7493 60.0000  -0.804   0.4246

And this is the t-test:

library(dplyr)
df <- data %>%
  filter(YEAR %in% "A") ### ISOLATING DIFFERENCES FOR YEAR "A" (the intercept above)
t.test(CONT_Y ~ MY_GROUP, data = df, paired = TRUE)
data:  CONT_Y by GROUP
t = -2.2409, df = 20, p-value = 0.03654 ###################### SIGNIFICANT (p < 0.05)
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.76628432 -0.06323949
sample estimates:
mean of the differences 
             -0.9147619 ################# MY BETA (AS EXPECTED)

Question: Shouldn't the results be the same? I mean, shouldn't both be either significant or non-significant?

Edit: additive model:

mod2 <- lmer(CONT_Y ~ YEAR + MY_GROUP + (1|PARTICIPANTS), data = data, REML = FALSE)
Fixed effects:
            Estimate Std. Error      df t value Pr(>|t|)
(Intercept)  17.7620     0.3488 69.8820  50.923   <2e-16 ***
YEARB         0.8426     0.3676 63.0000   2.292   0.0252 *
GROUP2        0.6136     0.3676 63.0000   1.669   0.1001    # Still different from the t-test

Here is my data:

data <- structure(list(PARTICIPANTS = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
  3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L,
  7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 10L, 10L, 10L,
  10L, 11L, 11L, 11L, 11L, 12L, 12L, 12L, 12L, 13L, 13L, 13L, 13L,
  14L, 14L, 14L, 14L, 15L, 15L, 15L, 15L, 16L, 16L, 16L, 16L, 17L,
  17L, 17L, 17L, 18L, 18L, 18L, 18L, 19L, 19L, 19L, 19L, 20L, 20L,
  20L, 20L, 21L, 21L, 21L, 21L), CONT_Y = c(19.44, 20.07, 19.21,
  16.35, 11.37, 12.82, 19.42, 18.94, 19.59, 20.01, 19.7, 17.92,
  18.78, 19.21, 19.27, 18.46, 19.52, 20.02, 16.19, 19.97, 13.83,
  15.93, 14.79, 21.55, 18.8, 19.42, 19.27, 19.37, 17.14, 14.45,
  17.63, 20.01, 20.28, 17.93, 19.36, 20.15, 16.06, 17.04, 19.16,
  20.1, 16.44, 18.39, 18.01, 19.05, 18.04, 19.69, 19.61, 16.88,
  19.02, 20.42, 18.27, 18.43, 18.08, 17.1, 19.98, 19.43, 19.71,
  19.93, 20.11, 18.41, 20.31, 20.1, 20.38, 20.29, 13.6, 18.92,
  19.05, 19.13, 17.75, 19.15, 20.19, 18.3, 19.43, 19.8, 19.83,
  19.53, 16.14, 21.14, 17.37, 18.73, 16.51, 17.51, 17.06, 19.42
  ), CATEGORIES = structure(c(1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L,
  1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L,
  1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L,
  1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L,
  1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L,
  1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L), .Label = c("A",
  "B"), class = "factor"), MY_GROUP = structure(c(1L, 2L, 1L, 2L,
  1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L,
  1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L,
  1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L,
  1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L,
  1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L
  ), .Label = c("G1", "G2"), class = "factor")), row.names = c(NA,
  -84L), class = c("tbl_df", "tbl", "data.frame"))
Rename the column:
data <- data %>% rename(YEAR = CATEGORIES)

Why are you doing this comparison? The LMM and the t-test make different assumptions (and you use different subsets of the data for each), so they don't need to agree about the significance. This suggests some misunderstanding about what significance tests are designed to tell us. — dipetkov, Jan 29 '23 at 13:23
@dipetkov, hi. We're running the t-test as a sort of post-hoc test for the model, since we need to explore the relationships among all the variables in the model, not only in comparison to the intercept. That is, I have CONT_Y YEAR A ~ MY_GROUP | CONT_Y YEAR B ~ MY_GROUP | MY_GROUP G1 ~ YEAR | MY_GROUP G2 ~ YEAR – Larissa Cury, Jan 29 '23 at 13:27
Have you considered looking at marginal effects? In R you can do these with the ggeffects package. One vignette is called Practical example: Logistic Mixed Effects Model with Interaction Term which sounds relevant here. — dipetkov, Jan 29 '23 at 13:36
I suggest abandoning the t-tests and studying how to make comparisons between (sub)groups of interest based on the fitted regression. See also emmeans and its vignettes. (PS. Some of the functionality of ggeffects is built on emmeans.) – dipetkov, Jan 29 '23 at 13:42
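
To make the emmeans suggestion concrete, here is a minimal sketch of model-based comparisons. It runs on simulated data (the object `sim` and its effect sizes are hypothetical, not the question's values) that mimics the balanced layout in the question: 21 participants, each measured once per YEAR by MY_GROUP cell. It assumes the lme4 and emmeans packages are installed; which degrees-of-freedom method emmeans uses depends on which helper packages are available.

```r
library(lme4)
library(emmeans)

# Simulated, hypothetical data with the same layout as the question's data
set.seed(1)
n <- 21
sim <- expand.grid(MY_GROUP = c("G1", "G2"), YEAR = c("A", "B"),
                   PARTICIPANTS = 1:n)
subj <- rnorm(n, sd = 2)  # participant-level random intercepts
sim$CONT_Y <- 17.5 + subj[sim$PARTICIPANTS] +
  1.1 * (sim$YEAR == "B") + 0.9 * (sim$MY_GROUP == "G2") -
  0.6 * (sim$YEAR == "B" & sim$MY_GROUP == "G2") +
  rnorm(nrow(sim), sd = 1)

fit <- lmer(CONT_Y ~ YEAR * MY_GROUP + (1 | PARTICIPANTS), data = sim)

# Group comparison within each year, then year comparison within each group,
# all based on the single fitted model rather than on separate t-tests
group_by_year <- pairs(emmeans(fit, ~ MY_GROUP | YEAR))
year_by_group <- pairs(emmeans(fit, ~ YEAR | MY_GROUP))

group_by_year
year_by_group
```

All four comparisons listed in the comment above come out of one model this way, and each uses the error variance estimated from the full data set.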

score 0 · Accepted Answer · answered Jan 29 '23 at 12:43

In short, the test statistics differ because the LMM estimates the standard error from the entire data set, not just the YEAR = A subgroup. If you restrict the LMM to the YEAR = A subgroup, you get the same result as the paired t-test.
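
A minimal sketch of that claim, run on simulated data rather than the question's data (the object `sim` and its effect sizes are hypothetical). It assumes lme4 is installed and that the estimated participant variance is above zero, which the simulation is set up to make very likely. With one observation per participant per group, the random-intercept model fitted to YEAR = A alone should give the same t statistic as the paired t-test.

```r
library(lme4)

# Simulated, hypothetical data: 21 participants, one value per YEAR x MY_GROUP cell
set.seed(1)
n <- 21
sim <- expand.grid(MY_GROUP = c("G1", "G2"), YEAR = c("A", "B"),
                   PARTICIPANTS = 1:n)
subj <- rnorm(n, sd = 2)  # participant-level random intercepts
sim$CONT_Y <- 18 + subj[sim$PARTICIPANTS] +
  0.9 * (sim$MY_GROUP == "G2") + rnorm(nrow(sim), sd = 1)

# Keep only YEAR = A and line the two groups up by participant
year_a <- subset(sim, YEAR == "A")
year_a <- year_a[order(year_a$MY_GROUP, year_a$PARTICIPANTS), ]
g1 <- year_a$CONT_Y[year_a$MY_GROUP == "G1"]
g2 <- year_a$CONT_Y[year_a$MY_GROUP == "G2"]

# Paired t-test on G2 - G1, matching the sign of the MY_GROUPG2 coefficient
tt <- t.test(g2, g1, paired = TRUE)

# Random-intercept model fitted to the same subset (REML, the lmer default)
fit_a <- lmer(CONT_Y ~ MY_GROUP + (1 | PARTICIPANTS), data = year_a)
lmm_t <- coef(summary(fit_a))["MY_GROUPG2", "t value"]

c(paired_t = unname(tt$statistic), lmm_t = lmm_t)
```

The two values should agree up to optimizer tolerance. Fitting the same formula to all of `sim` changes the standard error, because the residual variance is then estimated from both years.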

JWalker


I thought about that, so I then fitted an additive model. As far as I understand, without the interaction the result for MY_GROUP2 relative to the intercept (B0 = MY_GROUP 1) should be the same, right? Both are relative to YEAR = A, but I still got different results. I'll add this edit to the post asap. The funny thing now, though, is that the beta no longer matches the difference in means (0.91) – Larissa Cury Jan 29 '23 at 13:06
The coefficient for MY_GROUP from the factorial lmer model will equal the difference in means from the paired t-test if 1) YEAR A is the reference level, 2) there is only one measurement per participant per group:year combination, and 3) there are no missing values. This is the case here. This will not be true for the additive model, because its coefficient estimates a group difference pooled across both years. – JWalker Feb 02 '23 at 14:58
The test statistics for the factorial model will not be the same as those for the t-test because of what I said above -- the error variance is computed from all four group:year combinations in the lmer model but only from the group:yearA combinations in the t-test. – JWalker Feb 02 '23 at 14:58
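
The two comments above can be checked with a short sketch on simulated data (the object `sim` and its effect sizes are hypothetical; lme4 is assumed to be installed). Under the balanced design described, the factorial coefficient should equal the YEAR A difference in means, and the additive coefficient should equal the average of the YEAR A and YEAR B differences. The question's own output is consistent with this: 0.9148 for YEAR A, 0.9148 - 0.6024 = 0.3124 for YEAR B, and their average is 0.6136, the additive estimate.

```r
library(lme4)

# Simulated, hypothetical data: 21 participants, one value per YEAR x MY_GROUP cell
set.seed(1)
n <- 21
sim <- expand.grid(MY_GROUP = c("G1", "G2"), YEAR = c("A", "B"),
                   PARTICIPANTS = 1:n)
subj <- rnorm(n, sd = 2)  # participant-level random intercepts
sim$CONT_Y <- 17.5 + subj[sim$PARTICIPANTS] +
  1.1 * (sim$YEAR == "B") + 0.9 * (sim$MY_GROUP == "G2") -
  0.6 * (sim$YEAR == "B" & sim$MY_GROUP == "G2") +
  rnorm(nrow(sim), sd = 1)

# Raw G2 - G1 differences in cell means, per year
cell <- with(sim, tapply(CONT_Y, list(MY_GROUP, YEAR), mean))
diff_a <- cell["G2", "A"] - cell["G1", "A"]
diff_b <- cell["G2", "B"] - cell["G1", "B"]

fit_int <- lmer(CONT_Y ~ YEAR * MY_GROUP + (1 | PARTICIPANTS), data = sim)
fit_add <- lmer(CONT_Y ~ YEAR + MY_GROUP + (1 | PARTICIPANTS), data = sim)

# Coefficients: factorial vs YEAR A difference, additive vs pooled difference
c(factorial = unname(fixef(fit_int)["MY_GROUPG2"]), year_a_diff = diff_a)
c(additive = unname(fixef(fit_add)["MY_GROUPG2"]),
  pooled_diff = mean(c(diff_a, diff_b)))

# Standard errors: the model uses all four cells, the t-test only YEAR A
ya <- subset(sim, YEAR == "A")
ya <- ya[order(ya$PARTICIPANTS, ya$MY_GROUP), ]
d_a <- ya$CONT_Y[ya$MY_GROUP == "G2"] - ya$CONT_Y[ya$MY_GROUP == "G1"]
c(se_lmm = coef(summary(fit_int))["MY_GROUPG2", "Std. Error"],
  se_paired = sd(d_a) / sqrt(n))
```

The point estimates match while the standard errors generally do not, which is why the t statistics and p-values can fall on opposite sides of 0.05.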
Thank you. So can I say that it differs because of b0 = b1 + b2.1 + ERROR? The difference is due solely to the error term, then? – Larissa Cury Feb 06 '23 at 10:57
