
After the alignment step I checked the RNA-seq metrics for all samples. Of 40 samples, three show a high percentage of reads mapped to intronic regions. What could be the reason?

Samples  Exonic           Intronic          Intergenic
Sample1  545479 (12.8%)   2512309 (58.8%)   1217201 (28.5%)
Sample2  372836 (8.3%)    2556934 (56.8%)   1573032 (34%)
Sample3  529618 (7.3%)    3934615 (54.5%)   2750720 (38.1%)

I also see that 35-40k of the 58k genes have zero read counts. Is this due to contamination? What could be the reason? What do the proportions of reads mapping to exonic, intronic, and intergenic regions tell us?

Update: I used HISAT2 for alignment, with the human genome index "grch38_snp_tran" from the HISAT2 website. Libraries were generated using a ribosomal RNA depletion kit.

stack_learner

2 Answers


As @DevonRyan mentioned, it's very likely that those samples were degraded, which is a good justification for excluding them from subsequent analysis.

Daniel Standage

I would expect at least 30% of reads from a total-cell, ribo-depleted RNA-seq library to be exonic. Less than that suggests something went wrong.

As well as degradation, another explanation would be contamination with genomic DNA.

RNA obtained from the nuclear or cytoplasmic fraction of the cell might have an exonic content lower than 30%, while poly-A selected RNA would be expected to have a higher one, more like 66%.
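To make the rule of thumb above concrete, here is a minimal Python sketch that recomputes the per-region fractions from the counts posted in the question and flags samples whose exonic fraction falls below 30%. The function names and the `MIN_EXONIC_FRACTION` constant are illustrative, not part of any tool, and the 30% cutoff is only the rough expectation given in this answer for total-cell, ribo-depleted libraries, not a universal standard.

```python
# Rough lower bound from this answer for total-cell, ribo-depleted libraries.
# This is a heuristic assumption, not a fixed standard.
MIN_EXONIC_FRACTION = 0.30

# Read counts (exonic, intronic, intergenic) copied from the question.
samples = {
    "Sample1": (545479, 2512309, 1217201),
    "Sample2": (372836, 2556934, 1573032),
    "Sample3": (529618, 3934615, 2750720),
}


def region_fractions(exonic, intronic, intergenic):
    """Return the (exonic, intronic, intergenic) fractions of assigned reads."""
    total = exonic + intronic + intergenic
    return exonic / total, intronic / total, intergenic / total


def flag_low_exonic(sample_counts, threshold=MIN_EXONIC_FRACTION):
    """Return the names of samples whose exonic fraction is below threshold."""
    flagged = []
    for name, counts in sample_counts.items():
        exonic_fraction, _, _ = region_fractions(*counts)
        if exonic_fraction < threshold:
            flagged.append(name)
    return flagged


for name, counts in samples.items():
    exonic_fraction, intronic_fraction, intergenic_fraction = region_fractions(*counts)
    print(
        f"{name}: exonic={exonic_fraction:.1%} "
        f"intronic={intronic_fraction:.1%} intergenic={intergenic_fraction:.1%}"
    )

print("Below threshold:", flag_low_exonic(samples))
```

Running the same check across all 40 samples would show whether these three are clear outliers or whether the whole batch sits low, which helps tell a per-sample problem (degradation, genomic DNA carry-over) from a protocol-wide one.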

Ian Sudbery