PCA plot shows big difference but not many differentially expressed genes are found

Question

I have a PCA plot from a bulk RNA-seq experiment that looks like this:

It was generated by the following code:

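library(DESeq2)   # provides plotPCA() for rlog-transformed objects
library(ggplot2)

# rld_sva: rlog-transformed data after sva batch correction
pcaData <- plotPCA(rld_sva, intgroup=c("Group"), returnData=TRUE)
# Percent variance per component, rounded to whole numbers
percentVar <- round(100 * attr(pcaData, "percentVar"))
ggplot(pcaData, aes(PC1, PC2, color=Group)) +
  geom_point(size=3) +
  xlab(paste0("PC1: ",percentVar[1],"% variance")) +
  ylab(paste0("PC2: ",percentVar[2],"% variance")) +
  coord_fixed()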

sva correction was run first to correct for batch effects, and the rlog-transformed values were then passed to the plotPCA function.

The first thing that catches the eye is that 100% of the variance is explained by just 2 dimensions. I am not sure what one can say about the data in this case. The second issue is that I get only around 20 differentially expressed genes from the DESeq2 analysis (log2FoldChange > 1, p_adj < 0.05). I know that a large difference on a PCA plot does not directly imply plenty of differentially expressed genes, but why is that not the case? Simple logic tells me that PCA shows the difference between the samples in their gene expression, so I would expect to see plenty of differentially expressed genes.

Is each point here a sample or a cell? As you mention, explaining 100% of the variance in two dimensions would mean that an error happened somewhere. What have you done to find the root of this problem? And what is your question: whether this is possible, whether it is normal for something like this to happen, or something else? — llrs, Dec 18 '18 at 22:18
Sample. The question is why such a large difference in the PCA does not lead to many differentially expressed genes. A second, minor question is why the data is 2D and what the possible reasons for that are. Sorry, I made a mistake, it is bulk RNA-seq — Nikita Vlasenko, Dec 19 '18 at 00:10
Your two IgG replicates are separated a lot in the PCA plot, compared with the two other replicates, which likely means that there is too much disagreement within the IgG group in order for DESeq to arrive at low enough P-values for most of the differentially expressed genes. Adding one more IgG replicate would probably help a lot. — Peter Menzel, Dec 19 '18 at 10:27

score 7 · Accepted Answer · answered Dec 19 '18 at 03:58

You only have 4 samples total. I think it would be difficult to not have the PCA show big differences between the groups with so few points.
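
To illustrate the point about so few samples, here is a minimal sketch in base R using simulated pure noise (the matrix `expr` and its dimensions are made up for illustration, not taken from the question). Four samples can span at most three principal components after centering, so the first two always carry at least two thirds of the total variance, even when there is no group effect at all.

```r
# Hypothetical data: 500 genes x 4 samples of pure noise, no group effect.
set.seed(1)
n_genes <- 500
n_samples <- 4
expr <- matrix(rnorm(n_genes * n_samples), nrow = n_genes, ncol = n_samples)

# prcomp() expects samples in rows, so transpose the gene x sample matrix.
pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)

# Share of the total variance carried by each component.
percent_var <- 100 * pca$sdev^2 / sum(pca$sdev^2)
print(round(percent_var, 1))

# The fourth component is numerically zero: centering removes one dimension.
print(percent_var[4] < 1e-8)
```

If one more dimension is removed from the data before the PCA (batch correction is one candidate, though that is an assumption about the question's pipeline), two components would be enough to explain everything.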

On the other hand, for differential expression, it is hard to get something to be statistically significant with only 2 replicates.
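
A rough way to see the effect of replicate number is a small simulation. The sketch below is not DESeq2: it uses a plain per-gene t-test on simulated log2-scale values as a stand-in, and every number in it (`n_de`, `effect`, `noise_sd`) is an assumption chosen for illustration. DESeq2 shares dispersion information across genes and should do better than this with 2 replicates, but the direction of the comparison is the same.

```r
set.seed(42)
n_genes  <- 1000
n_de     <- 100   # assumed number of truly changed genes
effect   <- 1     # assumed true log2 fold change
noise_sd <- 0.3   # assumed per-gene standard deviation on the log2 scale

# Count genes passing BH-adjusted p < 0.05 for a given number of replicates.
count_hits <- function(n_rep) {
  shift   <- c(rep(effect, n_de), rep(0, n_genes - n_de))
  control <- matrix(rnorm(n_genes * n_rep, sd = noise_sd), nrow = n_genes)
  # shift is recycled down the columns, so row i is shifted by shift[i]
  treated <- matrix(rnorm(n_genes * n_rep, sd = noise_sd), nrow = n_genes) + shift
  pvals <- vapply(seq_len(n_genes), function(i) {
    t.test(treated[i, ], control[i, ], var.equal = TRUE)$p.value
  }, numeric(1))
  sum(p.adjust(pvals, method = "BH") < 0.05)
}

hits_n2 <- count_hits(2)
hits_n5 <- count_hits(5)
print(c(two_replicates = hits_n2, five_replicates = hits_n5))
```

The same 100 genes are truly changed in both runs; only the number of replicates differs, and with it the number of genes that survive multiple-testing correction.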
