I am analyzing data from a quantitative polymerase chain reaction (qPCR) using R. After cleaning the raw data, it looks something like this:
> dput(x)
structure(list(Reporter = c("FAM", "FAM", "FAM", "FAM", "FAM",
"FAM", "FAM", "FAM", "FAM", "FAM", "FAM", "FAM", "VIC", "VIC",
"VIC", "VIC", "VIC", "VIC", "VIC", "VIC", "VIC", "VIC", "VIC",
"VIC"), Number = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L,
12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L), A = c(22.19,
22.24, 22.5, 22.54, 22.6, 22.59, 23.07, 23.46, 22.43, 22.74,
24.09, 23.91, 24.52, 25.03, 25.25, 25.82, 25.13, 24.71, 25.34,
25.85, 25.25, 26.15, 25.81, 25.29), B = c(21.72, 21.78, 22.86,
22.73, 19.88, 20.07, 21.06, 21.06, 20.96, 21.11, 19.46, 19.43,
24.75, 24.69, 25.64, 25.19, 23.76, 23.69, 24.35, 25.05, 24.1,
23.81, 22.81, 23.13), C = c(21.37, 21.56, 20.07, 20.01, 21.17,
21.08, 20.54, 20.36, 33, NA, NA, NA, 23.91, 24.31, 23.61, 23.88,
24.33, 24.31, 23.37, 23.53, 33, NA, NA, NA), E = c(26.26, 27.33,
25.93, 26.56, 25.76, 23.03, 24.72, 25.27, 24.43, 24.31, 26.98,
23.33, 24.04, 25.02, 25.1, 25.1, 24.68, 25.48, 25.87, 26.22,
25.35, 25.36, 25.11, 25.98), F = c(25.81, 26.9, 25.58, 26.61,
25.06, 21.85, 23.59, 24.04, 23.19, 23.19, 25.17, 20.8, 24.12,
24.26, 25.32, 25.25, 24, 23.78, 24.7, 24.48, 23.52, 23.87, 23.05,
23.05), G = c(26.12, 27.02, 24.08, 25.15, 25.99, 23.18, 24.2,
24.05, 33, NA, NA, NA, 23.47, 23.45, 23.7, 23.74, 24.46, 24.19,
23.56, 23.53, 33, NA, NA, NA)), .Names = c("Reporter", "Number",
"A", "B", "C", "E", "F", "G"), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, 24L))
I think that each well measures two genes: target which is 12 different genes (color FAM), and the reference or housekeeping gene, GAPDH (color VIC). Also, control is triplicate A, B, C (3 x 12 wells), and treatment is E, F, G (3 x 12 wells).
I have created a function ddCt_ for analyzing the data with the ddCt algorithm (Livak & Schmittgen, 2001). It takes one argument x which is a data.frame of the form exemplified above.
ddCt_ <- function(x) {
# Subset x by control/treatment and target/reference
# Then, calculate Ct averages for each triplicate
TC <- x %>%
select(Reporter, Number, A, B, C) %>%
filter(Reporter == "FAM") %>%
rowwise() %>%
mutate(Ct_Avg = mean(c(A, B, C), na.rm = TRUE))
RC <- x %>%
select(Reporter, Number, A, B, C) %>%
filter(Reporter == "VIC") %>%
rowwise() %>%
mutate(Ct_Avg = mean(c(A, B, C), na.rm = TRUE))
TT <- x %>%
select(Reporter, Number, E, F, G) %>%
filter(Reporter == "FAM") %>%
rowwise() %>%
mutate(Ct_Avg = mean(c(E, F, G), na.rm = TRUE))
RT <- x %>%
select(Reporter, Number, E, F, G) %>%
filter(Reporter == "VIC") %>%
rowwise() %>%
mutate(Ct_Avg = mean(c(E, F, G), na.rm = TRUE))
# Normalize Ct of the target gene to the Ct of the reference gene
dCt_control <- TC$Ct_Avg - RC$Ct_Avg
dCt_treatment <- TT$Ct_Avg - RT$Ct_Avg
# Normalize dCt of the treatment group to the dCt of the control group
ddCt <- dCt_treatment - dCt_control
# Calculate fold change
fc <- 2^(-ddCt)
# Calculate avg and sd
dCt_control_avg <- mean(dCt_control)
dCt_control_sd <- sd(dCt_control)
dCt_treatment_avg <- mean(dCt_treatment)
dCt_treatment_sd <- sd(dCt_treatment)
# Create output
df <- data_frame(
Sample = 1:12, "dCt Control" = dCt_control, "dCt Treatment" = dCt_treatment,
"ddCt" = ddCt, "Fold Change" = fc, "dCt Control Avg" = dCt_control_avg,
"dCt Control SD" = dCt_control_sd, "dCt Treatment Avg" = dCt_treatment_avg,
"dCt Treatment SD" = dCt_treatment_sd
) %>% round(2)
write.csv(x = df, file = "result.csv")
saveRDS(object = df, file = "result.rds")
df
}
However, I am not a biochemist or molecular biologist, so I am unsure of the basic concepts involved and the overall approach. So, my questions are:
- Is the function correct?
- Why is it important to calculate fold change and standard deviation after normalization?