
I currently have a dataset with two factors: Gene and Timepoint. Both factors are categorical: Gene has 2 levels (control vs. disease), and Timepoint has 4 levels (Months 1, 2, 4, and 6). The justification for treating time as categorical rather than continuous is that the month can act as a proxy for disease state. Each subject has a specific number of technical replicates, which can vary across subjects. The question I want to answer is whether the response is affected by Time and Gene: $H_0: \beta_{interaction} = 0$. I have therefore built a linear mixed-effects model using fitlme in MATLAB, where the formula is:

$y_i = \beta_0 + \beta_1 \cdot Time + \beta_2 \cdot Gene + \beta_3 \cdot (Gene \times Time) + Z_i + \epsilon_i$, where $Z_i$ is the intra-subject random effect (Subject is the random factor) and $\beta_{interaction} = \beta_3$. In reality the factors are dummy encoded, so there are more than 3 betas: up to 7 beyond the intercept, one for each non-reference level. The aforementioned formula is a summary of what was intended to be asked mathematically.

What MATLAB (and, I assume, many other packages) does is take one class and assign it to be the reference level, i.e. $\beta_0$ becomes the estimate for that particular class. All other class estimates are relative to the intercept. A typical output in MATLAB would look like:
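As a sketch of what that reference-level (dummy) coding looks like (Python here purely for illustration; fitlme does this internally, and the level names are taken from the output below, where Month6 is evidently the reference):

```python
# Illustration of reference-level (treatment/dummy) coding.
# Assumes Month6 is the reference level, as in the output below;
# the reference level maps to all-zero indicators, so its estimate
# is absorbed into the intercept.
levels = ["Month6", "Month4", "Month1", "Month2"]  # first = reference

def dummy_code(timepoint):
    """Return indicator columns for the non-reference levels."""
    return [1 if timepoint == lvl else 0 for lvl in levels[1:]]

print(dummy_code("Month6"))  # [0, 0, 0] -> represented by the intercept alone
print(dummy_code("Month2"))  # [0, 0, 1] -> intercept + Month2 coefficient
```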

(Example output using the formula described above; MATLAB abbreviates the fixed-effect terms in the formula display as Timepoint*Genotype when an interaction term is coded.)

The output in text form is as follows:

Linear mixed-effects model fit by ML

Model information:

    Number of observations           736
    Fixed effects coefficients         8
    Random effects coefficients      207
    Covariance parameters              2

Formula:

Strength ~ 1 + Timepoint*Genotype + (1 | Complete_ID)

Model fit statistics:

    AIC       BIC       LogLikelihood    Deviance
    2417.9    2463.9    -1198.9          2397.9

Fixed effects coefficients (95% CIs):

    Name                            Estimate    SE         tStat     DF     pValue         Lower       Upper
    (Intercept)                      5.6218     0.21269    26.431    728    1.8785e-108     5.2042     6.0393
    Timepoint_Month4                 0.36013    0.28287     1.2731   728    0.20337        -0.1952     0.91547
    Timepoint_Month1                 0.82175    0.28695     2.8638   728    0.0043066       0.25841    1.3851
    Timepoint_Month2                 0.63884    0.2847      2.2439   728    0.025138        0.079909   1.1978
    Genotype_WT                      0.63632    0.28542     2.2294   728    0.02609         0.075977   1.1967
    Timepoint_Month4:Genotype_WT    -0.55392    0.3972     -1.3946   728    0.16358        -1.3337     0.22588
    Timepoint_Month1:Genotype_WT    -0.38871    0.40222    -0.9664   728    0.33417        -1.1784     0.40095
    Timepoint_Month2:Genotype_WT    -0.54137    0.39186    -1.3815   728    0.16754        -1.3107     0.22794

Random effects covariance parameters (95% CIs):

    Group: Complete_ID (207 Levels)
    Name1          Name2          Type    Estimate    Lower      Upper
    (Intercept)    (Intercept)    std     0.81521     0.70306    0.94525

    Group: Error
    Name       Estimate    Lower     Upper
    Res Std    1.0533      0.9916    1.1188

As one can see, there are a number of estimates, and my understanding is that these describe each class's effect on the response. For example, the class Timepoint_Month2 has an estimated response of 5.6218 + 0.63884, since 0.63884 is the quantified 'effect' of that class on the response when Timepoint_Month2 applies, and 5.6218 is the intercept (the estimate when all $\beta_n = 0$ for $n > 0$).

I have taken those estimates, adding each class's estimate to the intercept, as the "representative" value of the response for each group, and I have interpreted them in the same manner as one would interpret the arithmetic mean for each class. So the questions are as follows:

  1. Is the interpretation of the intercept and the coefficient estimates for each class correct (i.e., adding the estimates for each respective class to the intercept produces the "representative" group value)?
  2. Can I plot those values on a bar chart if they can be interpreted as valid "representative" values?
  3. Would it be considered misleading if I were to plot the arithmetic mean and variance from the mentioned calculation instead of the LME coefficients and their associated SEs? The concern here is that I am running the statistical test on the estimated coefficients from the LME, not on the values quantified by the arithmetic mean followed by an ANOVA.
  4. A somewhat tangential question: my understanding is that the p-values in the output give the Type I error probability for the test that an estimated coefficient is 0 (so in this example, the Type I error probability when concluding that a class differs from the intercept). How would I go about doing a post-hoc test assuming the interaction term was found to be significant (done by using the anova function on the model in MATLAB)?
Syuma
  • It's very hard to read the summary table, even with the full link to the image. Could you please edit the question to include that information in legible text format? Surrounding text with backticks should do the trick; see the instructions for "Code" input. I think there might be an issue with how Matlab is choosing the reference classes, but I can't quite make it out from the image. – EdM Feb 28 '24 at 15:19
  • @EdM Thanks for the suggestion, I have done so, please let me know what you think! – Syuma Feb 28 '24 at 18:20

1 Answer


You seem to have started on the right path, but there are some potential missteps to avoid. I'll assume that you have found that the assumptions underlying the model have been met adequately.

With respect to the first question

First, the Intercept is the estimated outcome when all predictors are at their reference levels (evidently Month6 for Timepoint and the non-WT Genotype). So adding the Timepoint_Month2 coefficient to the intercept only gives you the estimate for the reference (non-WT) level of Genotype at Month2.

Second, you have to continue that coefficient-addition process carefully as you proceed to predictions that involve the interaction coefficients. For example, to get the Timepoint_Month2 estimate for the non-reference level of Genotype (WT), you would start with the above sum to get the reference-genotype value at Month2, then add the coefficient for the non-reference level (Genotype_WT, its difference at the reference Month6), and finally add the Timepoint_Month2:Genotype_WT interaction term, which estimates the further difference from what would be predicted based solely on the lower-level coefficients.
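Numerically, with the coefficients from the posted output, that addition works out as follows (a sketch; the values are copied from the fixed-effects table above):

```python
# Coefficients copied from the posted fixed-effects table.
intercept   = 5.6218    # reference: non-WT genotype at Month6
tp_month2   = 0.63884   # Timepoint_Month2
genotype_wt = 0.63632   # Genotype_WT
tp2_x_wt    = -0.54137  # Timepoint_Month2:Genotype_WT interaction

# Non-WT (reference genotype) at Month2: intercept + time coefficient.
ref_month2 = intercept + tp_month2
# WT at Month2: also add the genotype main effect and the interaction.
wt_month2 = intercept + tp_month2 + genotype_wt + tp2_x_wt

print(round(ref_month2, 4))  # 6.2606
print(round(wt_month2, 4))   # 6.3556
```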

Third, any such estimate should be accompanied by a corresponding standard error or confidence interval. For that you need access to the coefficient covariance matrix, which is presumably produced as part of the model fit but isn't shown in your model summary. Then use the formula for the variance of a weighted sum of correlated variables to get the variance of whatever linear combination of coefficients that you wish.
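That variance formula is $\mathrm{Var}(c^\top \hat\beta) = c^\top V c$, where $V$ is the coefficient covariance matrix and $c$ holds the weights of the linear combination. A minimal sketch with a made-up 2x2 covariance matrix (not values from the fit above; in MATLAB the real matrix comes from the fitted model object):

```python
import numpy as np

# Hypothetical 2x2 covariance matrix of two coefficient estimates;
# the real one comes from the fitted model, not from this example.
V = np.array([[0.04, -0.01],
              [-0.01, 0.09]])
c = np.array([1.0, 1.0])  # weights: a simple sum of the two coefficients

var_sum = c @ V @ c            # c' V c = Var(c1*b1 + c2*b2)
se_sum = float(np.sqrt(var_sum))

print(round(float(var_sum), 4))  # 0.11 = 0.04 + 0.09 + 2*(-0.01)
```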

With respect to the fourth question

There are a few potential problems when you say that you want to do post-hoc tests "assuming the interaction term was found to be significant."

First, when there are interactions the apparent "significance" of any coefficient depends on how its interacting predictors are coded. See this page, for example.

Second, you don't have a single interaction coefficient. You have 3. I assume that Matlab's anova function can give you an overall estimate of the "significance" of all the interaction terms together, but be warned that some ANOVA methods can give misleading results when data are imbalanced. See this page, for example.

Third, there's no need to limit yourself to a situation with a "significant" interaction for performing comparisons. If you have a particular comparison of interest, do it while including any interaction coefficient that's involved. Use the coefficient covariance matrix and the formula for the variance of a weighted sum of correlated variables, as above. That's the best description of your data, as lack of statistical significance does not necessarily mean lack of practical significance.

If you do several such comparisons, however, you need to take precautions against the multiple comparisons problem.
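For a handful of planned comparisons, a Holm-Bonferroni step-down adjustment of the resulting p-values is one common choice (a sketch; the p-values here are made up, not from the model above):

```python
# Holm-Bonferroni step-down adjustment of a small set of p-values.
def holm_adjust(pvals):
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # ranks, smallest first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        adj = min(1.0, (m - rank) * pvals[i])
        running_max = max(running_max, adj)  # keep adjustments monotone
        adjusted[i] = running_max
    return adjusted

print([round(p, 4) for p in holm_adjust([0.01, 0.04, 0.03])])  # [0.03, 0.06, 0.06]
```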

With respect to the second and third questions

Bar charts aren't often very useful. For this type of data, plots of estimated outcomes and their confidence intervals over time for each Genotype (at a random-intercept value of 0, presumably what the fixed-effect coefficients represent) would be a good choice. As the linear model pools information across all time points, genotypes, and individuals to get the underlying error estimates, display the point estimates and confidence intervals from the modeled values instead of from the individual Timepoint:Genotype combinations, which would be complicated here by the random-effect components.

In response to comment. The point estimates and standard errors for each combination of Timepoint and Genotype, based on the raw, unmodeled data, aren't the best description of your results if the assumptions underlying your model hold.

A linear regression model assumes that all observations share a common random error distribution around the modeled values, which is estimated after taking into account all the combinations of fixed and random effects. The reliability of each model estimate (which is presumably what you want to display) is based on that pooled information about the random error distribution, not on the vagaries of which particular sample of that error distribution was represented in any particular Timepoint and Genotype combination.

With a mixed model even the average values at each Timepoint and Genotype combination can be misleading. The random intercepts in your model are estimated by forcing a Gaussian distribution on the set of individual intercept estimates among the Complete_ID levels, each of which is like an extrapolation back to the reference situation (non-WT and Month6 here). Again, all the data are used to estimate the random effect. Using the raw averages thus doesn't represent your best estimate of the underlying process, as they depend on the vagaries of which individuals (and their random intercepts) happened to be included in each Timepoint and Genotype combination.

Final thoughts

I'm not sure whether there are tools available in Matlab to simplify the types of calculations above. R has a wide range of tools that simplify modeling and post-model calculations. For example, the emmeans package can take a mixed model (e.g., from the lmer() function of the lme4 package) and get the point estimates and confidence intervals for each of the Timepoint:Genotype combinations, with appropriate correction for multiple comparisons. The DHARMa package is particularly helpful for evaluating the quality of a mixed model fit, which can be trickier than evaluating an ordinary least squares model.

EdM
  • Thank you for the comprehensive answer! This was extremely helpful. I'm glad that my intuition about adding the coefficients was correct as confirmed by your response (I have added the interaction term coefficient where necessary with the corresponding WT_Month6 estimate). However, I am still a little confused about the answer to my third question. You mentioned it would be better to plot the estimates and CIs from the modeled values instead of from individual combinations, but don't those estimates and CIs come from the individual combinations? I was wondering if I could get more clarification! – Syuma Feb 29 '24 at 00:21
  • @Syuma I added a couple of paragraphs about why the modeled values are generally considered a better description of your results, if the assumptions underlying the model hold. In brief, the model smooths out the effects of random sampling involved in getting the data for each Genotype and Timepoint combination, using all the data to estimate the residual error that helps determine the reliability of the point estimates. – EdM Feb 29 '24 at 10:51