Within each run, the experiment is set up as below:
- Genotypes: WT (wild type, shown in blue), PKO (partial knockout, in green), FKO (full knockout, in red).
- Biological triplicates means the same genotype grown in 3 different flasks, for example WT grown in one flask, WT grown in a second flask, and WT grown in a third flask.
- Technical replicates means a single flask measured 3 times.
- 3 runs were performed to ensure reproducibility.
Run 1: 3 genotypes, biological triplicates each sampled once i.e. no technical replicates. In total, 9 datapoints (black dots).
Run 2: 3 genotypes, biological triplicates each sampled thrice i.e. technical triplicates for each biological replicate. In total, 27 datapoints (black dots).
Run 3: 3 genotypes, biological triplicates each sampled thrice i.e. technical triplicates for each biological replicate. In total, 27 datapoints (black dots).
Preprocessing of the dataset part 1: Finding the concentration of each compound in each sample via spiked deuterated standards (picograms/mg of protein).
Concentration = (Area of compound / Area of deuterated standard) x amount of deuterated standard / total amount of protein.
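To make the formula above concrete, here is a minimal sketch in Python. The function name, the argument names, and the numbers are hypothetical and chosen only for illustration; it assumes the standard amount is in picograms and the protein amount is in mg, so the result is in pg/mg of protein.

```python
def concentration_pg_per_mg(area_compound, area_standard, standard_amount_pg, protein_mg):
    """Concentration = (area ratio) x amount of deuterated standard / total protein."""
    if area_standard <= 0 or protein_mg <= 0:
        raise ValueError("standard area and protein amount must be positive")
    # Ratio of compound peak area to the spiked deuterated standard's peak area
    area_ratio = area_compound / area_standard
    # Scale by the spiked amount, then normalize to the protein content of the sample
    return area_ratio * standard_amount_pg / protein_mg


# Hypothetical values, not taken from the dataset
example = concentration_pg_per_mg(
    area_compound=5000.0,
    area_standard=10000.0,
    standard_amount_pg=200.0,
    protein_mg=0.5,
)
print(example)
```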
Preprocessing of the dataset part 2: This was done in MetaboAnalyst 5.0. Autoscaling was used, and in the right panel titled “Normalized Conc” the 3 runs have similar medians and means (yellow diamond).
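For reference, autoscaling is generally understood as mean-centering each variable and dividing by its standard deviation. The sketch below shows that calculation for one compound; the function name and the values are hypothetical, and using the sample standard deviation is an assumption rather than something confirmed for this analysis.

```python
from statistics import mean, stdev


def autoscale(values):
    """Mean-center the values and divide by their sample standard deviation."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1), an assumption
    if s == 0:
        raise ValueError("cannot autoscale a constant variable")
    return [(v - m) / s for v in values]


# Hypothetical concentrations of one compound across the samples being scaled
raw_values = [10.0, 12.0, 14.0, 30.0, 32.0, 34.0]
scaled = autoscale(raw_values)
print(scaled)
```

After this step the scaled values have mean 0 and standard deviation 1, which is why sets of values scaled separately end up with similar centers regardless of their original scale.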
Problem: Because the deuterated standards used for normalization are not authentic standards, the original concentrations vary greatly between runs. But it is clear from the normalized concentrations that there is a biological difference between the WT and both the PKO and FKO. The PKO and FKO are expected to be similar because PKO has 80% of the gene knocked out while FKO has 100%.
Question 1: Is this a batch effect? If yes, what can I do to remove the batch effects?
Question 2: How can I simplify the presentation and combine the 3 runs so that only the biological difference is shown? My plan to redo the visualization is as follows, but I am not sure if it is correct:
#1: The whole point is to compare biological differences, so I plan to combine the technical replicates, as they belong to the same sample.
#2: I am thinking of working with the “Normalized Conc.” because it seems the most obvious (easiest) choice to me, but I am not sure what kind of statistics I can use to combine the normalized concentrations across the 3 runs.
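A minimal sketch of step #1, assuming that "combine" means taking the mean of the technical replicates for each flask. That choice, the tuple layout, and the numbers are illustrative assumptions only.

```python
from collections import defaultdict
from statistics import mean

# Each row: (run, genotype, flask, normalized concentration); values are hypothetical
measurements = [
    (2, "WT", 1, 1.0), (2, "WT", 1, 1.2), (2, "WT", 1, 1.1),
    (2, "PKO", 1, -0.5), (2, "PKO", 1, -0.7), (2, "PKO", 1, -0.6),
]


def collapse_technical_replicates(rows):
    """Average repeated measurements of the same flask within the same run."""
    grouped = defaultdict(list)
    for run, genotype, flask, value in rows:
        grouped[(run, genotype, flask)].append(value)
    # One value per biological replicate (run, genotype, flask)
    return {key: mean(values) for key, values in grouped.items()}


per_flask = collapse_technical_replicates(measurements)
for key, value in sorted(per_flask.items()):
    print(key, value)
```

With the technical replicates collapsed this way, each run would contribute 9 values (3 genotypes x 3 flasks), so Run 1, which has no technical replicates, would already be in the same shape as Runs 2 and 3.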
I feel like I am missing some fundamentals in statistics needed to process this, so any advice will be appreciated.




