I currently have a dataset with two factors: Gene and Timepoint. Both of these factors are categorical in nature where Gene has 2 levels defined as control vs disease, and timepoint has 4 levels: Month 1, 2, 4, and 6. The justification for having categorical time instead of a continious variable is that the month can act as a proxy for disease state. For every subject, they have a specific number of technical replicates that can vary across subjects as well. The question I want to answer is whethe response is affected by Time and Gene: $\ H_0: \beta_{interaction} = 0$ Thus I have built a Linear Mixed Effect Model using fitlme in Matlab where the formula is:
$\ y_i \approx \beta_0 + \beta_1 \cdot Time + \beta_2 \cdot Gene + \beta_3 \cdot Gene * Time + \epsilon + Z_i$ where $Z_i$ is the intra subject variance (as subject is the random factor) and $\beta_{interaction} = \beta_3$. The reality of the equation is that they are dummy encoded, so there are more than 3 Betas, there's up to 7; one for each class. The beforementioned formula acts as a summary for what was intended to be asked mathematically.
How Matlab (and I'm assuming a bunch of other languages do) does is that it takes one class and assigns it to be the reference values, i.e. $\beta_0$ becomes the summary value for that particular class. All other class values are in reference to the intercept. A classic output in Matlab would look like:
The output in text form is as follows:
Linear mixed-effects model fit by ML
Model information:
Number of observations 736
Fixed effects coefficients 8
Random effects coefficients 207
Covariance parameters 2
Formula:
Strength ~ 1 + Timepoint*Genotype + (1 | Complete_ID)
Model fit statistics:
AIC BIC LogLikelihood Deviance
2417.9 2463.9 -1198.9 2397.9
Fixed effects coefficients (95% CIs):
Name Estimate SE tStat DF pValue Lower Upper
{'(Intercept)' } 5.6218 0.21269 26.431 728 1.8785e-108 5.2042 6.0393
{'Timepoint_Month4' } 0.36013 0.28287 1.2731 728 0.20337 -0.1952 0.91547
{'Timepoint_Month1' } 0.82175 0.28695 2.8638 728 0.0043066 0.25841 1.3851
{'Timepoint_Month2' } 0.63884 0.2847 2.2439 728 0.025138 0.079909 1.1978
{'Genotype_WT' } 0.63632 0.28542 2.2294 728 0.02609 0.075977 1.1967
{'Timepoint_Month4:Genotype_WT'} -0.55392 0.3972 -1.3946 728 0.16358 -1.3337 0.22588
{'Timepoint_Month1:Genotype_WT'} -0.38871 0.40222 -0.9664 728 0.33417 -1.1784 0.40095
{'Timepoint_Month2:Genotype_WT'} -0.54137 0.39186 -1.3815 728 0.16754 -1.3107 0.22794
Random effects covariance parameters (95% CIs):
Group: Complete_ID (207 Levels)
Name1 Name2 Type Estimate Lower Upper
{'(Intercept)'} {'(Intercept)'} {'std'} 0.81521 0.70306 0.94525
Group: Error
Name Estimate Lower Upper
{'Res Std'} 1.0533 0.9916 1.1188
As one can see, there are a bunch of estimates, and my understanding is these are the estimates that describe the classes' effect on the response. For example, the class Timepoint_month2 has a 5.6218 + 0.6334 response value since 0.6334 is the quantified 'effect' of the class on the response if Timepoint_month2 was true, and 5.6218 is the intercept (when all $\beta_n = 0, n > 0$).
I have taken those estimates and the corresponding addition of each classes estimates to the intercept as the "representative" value for the response for each group. I have interpreted them in the same manner as how one would interpret the arithemtic mean for each class. So the questions are as follows:
- Is the intepretation of the intercept and the estimates of coefficients for each class correct (addition of the estimates for each respective class to the intercept would produce the "representative" group value)?
- Can I plot those values onto a barchart if they can be interpreted as valid "representative" values?
- Would it be considered misleading if I were to plot the arithmetic mean and variance from the mentioned calculation instead of the coefficients and the SE associated with it calculated from the LME? The concern here is that I am running a statistical test on the estimated coefficients from the LME, not from the values quantified by arithemtic mean and then doing an ANOVA on those values.
- A somewhat tangential question but my understanding of the pValues in the image designate the likelihood of type 1 Error of whether the estimated coefficients are not 0 (so in this example, the type 1 error probability that a class is different from intercept). How would I go about doing a post-hoc test assuming the interaction term was found to be significant (done by using the anova function on the model in Matlab)?
