What's the drawback of using interaction terms to analyze the pre-post control data?

Question

I am trying to analyze the data with the pre-post-control design in the context of RNA-seq analysis.

I have read Best practice when analyzing pre-post treatment-control designs, but I am still confused because the method recommended by this manual(page 34) is not mentioned in the above post.

To briefly describe my data, the response is gene expression measurement and explanatory variables contain 3 factors(treatment, patient, and time): there are 2 levels(drug & control) in treatment, total 10 patients, and two levels(pre and post) in time.

The goal is to figure out if the drug has any effect on the gene expression while controlling for heterogeneity among patients.

The software manual suggests using interaction terms to control for individual effects, and use the following model:

$$expression \sim treatment + treatment:patient + treatment :time$$

and the code snippet below describes the idea:

require(dplyr)
dat <- structure(list(treatment = c("control", "control", "control", 
"control", "control", "control", "control", "control", "control", 
"control", "drug", "drug", "drug", "drug", "drug", "drug", "drug", 
"drug", "drug", "drug"), time = c("pre", "post", "pre", "post", 
"pre", "post", "pre", "post", "pre", "post", "pre", "post", "pre", 
"post", "pre", "post", "pre", "post", "pre", "post"), patient = c(1, 
1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -20L), .Names = c("treatment", 
"time", "patient"))

# prepare for the design matrix 
dat <- dat %>% group_by(treatment) %>% 
              mutate(patient=paste0("p",rep(c(1:5),each=2))) %>%
              ungroup() %>%  
              mutate(time=factor(time, levels = c("pre","post")))

   treatment time patient
1    control  pre      p1
2    control post      p1
3    control  pre      p2
4    control post      p2
5    control  pre      p3
6    control post      p3
7    control  pre      p4
8    control post      p4
9    control  pre      p5
10   control post      p5
11      drug  pre      p1
12      drug post      p1
13      drug  pre      p2
14      drug post      p2
15      drug  pre      p3
16      drug post      p3
17      drug  pre      p4
18      drug post      p4
19      drug  pre      p5
20      drug post      p5

# construct a model matrix
mmat <- model.matrix(~treatment + treatment:patient + treatment:time,data = dat)

# terms that will be in my linear model
colnames(mmat)
 [1] "(Intercept)"                "treatmentdrug"             
 [3] "treatmentcontrol:patientp2" "treatmentdrug:patientp2"   
 [5] "treatmentcontrol:patientp3" "treatmentdrug:patientp3"   
 [7] "treatmentcontrol:patientp4" "treatmentdrug:patientp4"   
 [9] "treatmentcontrol:patientp5" "treatmentdrug:patientp5"   
[11] "treatmentcontrol:timepost"  "treatmentdrug:timepost"

To see the effect of the drug, the manual suggests check if the coefficient "treatmentdrug:timepost" is significant by computing log likelihood ratio between the model without that term and the model with that term and see if the nested model is sufficiently different from the full model.

Now my question is :

what's weakness of this approach of using interaction terms to control for patient effects? In other words, what assumptions am I breaking(if any)?
It seems that random effects models are used in this context, but what are the benefits?
If you were me, how would you have tackled this problem?

What's the drawback of using interaction terms to analyze the pre-post control data?

0 Answers0