Let's say I have an RNA-Seq experiment where I'm interested in the differentially expressed genes between pre-treatment and post-treatment conditions ("rep" = biological replicate):
- Sample PreA (3 reps **all** in 2017) vs Sample PostA (3 reps **all** in 2018)
- Sample PreB (3 reps **all** in 2017) vs Sample PostB (3 reps **all** in 2018)
- Sample PreC (3 reps **all** in 2017) vs Sample PostC (3 reps **all** in 2018)

I also have one more sample:

- Sample PreD (3 reps in 2017) and Sample PreD (3 more reps in 2018)
It is safe to assume that the samples sequenced in 2017 were processed differently from those sequenced in 2018.
Q1: Is there a problem with batch effects in my study? When I run a test, there shouldn't be any way to separate variation due to batch from biological variation, right?
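To see why Q1 is a real problem, one can build the design matrix for a single comparison (PreA vs PostA) in base R and check its rank. This is a minimal sketch: the sample sheet below is hypothetical and only mirrors the `Sample` and `Date` columns mentioned in the EDIT. Because every PreA replicate is from 2017 and every PostA replicate is from 2018, the `Date` indicator is an exact copy of the `Sample` indicator, so the two effects cannot be separated.

```r
# Hypothetical sample sheet for the PreA vs PostA comparison:
# 3 biological replicates per condition, each condition sequenced in one year only.
coldata <- data.frame(
  Sample = factor(rep(c("PreA", "PostA"), each = 3), levels = c("PreA", "PostA")),
  Date   = factor(rep(c("2017", "2018"), each = 3))
)

# Design with both the batch (Date) and the condition (Sample) terms.
mm <- model.matrix(~ Date + Sample, data = coldata)
print(mm)

# Date2018 and SamplePostA are identical columns, so the matrix has
# 3 columns but only rank 2: batch and condition are fully confounded.
rank_mm <- qr(mm)$rank
cat("columns:", ncol(mm), " rank:", rank_mm, "\n")
```

DESeq2 checks whether the model matrix is full rank before fitting, which is presumably why adding `Date` to the two-sample design fails in the EDIT.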
Q2: If so, how should I reduce batch effects?
- Can I add a batch factor to my model in DESeq2/edgeR?
- Should I use my PreD control samples to derive normalisation factors, as in RUV? (assume PreD is a negative control)
- Should I use the `sva` package?
EDIT
Devon suggested adding a batch variable, but how? This is what I'm currently doing (ignoring batch effects):
```r
dds <- DESeqDataSetFromTximport(txi, data, ~Sample)
```
`data` is a data frame with a `Sample` column; in the first test, `Sample` takes the values "PreA" and "PostA". Very simple.
I couldn't add `Date` to it like this:
```r
dds <- DESeqDataSetFromTximport(txi, data, ~Date + Sample)
```
because the code failed with an error about the 2017/2018 confounding discussed above. I'm also not sure how to add PreD to the design.
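One way to include `Date` without the rank error is to fit all samples in a single model rather than one Pre/Post pair at a time: because PreD was sequenced in both years, it links the two batches and makes the `Date` effect estimable. The sketch below only checks the rank of `~ Date + Sample` in base R on a hypothetical sample sheet (names like `coldata` and `no_d` are illustrative). It assumes the batch effect is the same for every sample (no batch-by-condition interaction), because that effect can only be estimated from the PreD replicates.

```r
# Hypothetical sample sheet: 8 sample/year groups x 3 biological replicates.
groups <- data.frame(
  Sample = c("PreA", "PostA", "PreB", "PostB", "PreC", "PostC", "PreD", "PreD"),
  Date   = c("2017", "2018", "2017", "2018", "2017", "2018", "2017", "2018")
)
coldata <- groups[rep(seq_len(nrow(groups)), each = 3), ]
coldata$Sample <- factor(coldata$Sample)
coldata$Date   <- factor(coldata$Date)
rownames(coldata) <- NULL

# 1 intercept + 1 Date column + 6 Sample columns = 8 columns.
# PreD appears in both years, so Date is estimable and the matrix is full rank.
mm <- model.matrix(~ Date + Sample, data = coldata)
cat("with PreD -> columns:", ncol(mm), " rank:", qr(mm)$rank, "\n")

# Same design without the PreD replicates: Date is again confounded with Sample.
no_d <- droplevels(coldata[coldata$Sample != "PreD", ])
mm_no_d <- model.matrix(~ Date + Sample, data = no_d)
cat("without PreD -> columns:", ncol(mm_no_d), " rank:", qr(mm_no_d)$rank, "\n")

# With DESeq2 (not run here), the same design would be used roughly as:
#   dds <- DESeqDataSetFromTximport(txi, coldata, ~ Date + Sample)
#   dds <- DESeq(dds)
#   res <- results(dds, contrast = c("Sample", "PostA", "PreA"))
```

The PostA vs PreA contrast then compares two `Sample` coefficients after adjusting for `Date`, but that adjustment rests entirely on the PreD replicates, so it is only as reliable as the assumption that PreD's 2017-to-2018 shift is representative of the other samples.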
… as.factor(c("2017", "2018")). – SmallChess Nov 28 '18 at 09:28

… A:Post interaction. If you phrase it and think of it that way, the model matrix is easier to construct (see my update to get you started). – Devon Ryan Nov 28 '18 at 09:32