FASTQC overrepresented sequences after trimming

Question

I have a set of RNA-seq samples from different experiments (single- and paired-end, depending on the experiment). I ran FASTQC on all the samples and found overrepresented adapter sequences:

I removed the adapters (TruSeq adapters) using Cutadapt (in addition, I trimmed low-quality and N bases from the 3' end of the reads). After that, I ran FASTQC again, and the output is the following (representative example):

Does anyone know what is happening? Now FASTQC reports an overrepresented sequence, but the sequence itself is blank. What does this mean?

Looks like you trimmed your sequences to nothing. Do you have empty fastq entries in the file? — Bioathlete, Aug 04 '17 at 16:39
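To check the comment's suggestion directly, here is a minimal Python sketch that counts empty records in a trimmed file and lists the most frequent sequences. It assumes plain, uncompressed FASTQ with exactly four lines per record; the function names and toy reads are illustrative and not part of FASTQC or cutadapt. Every read trimmed to nothing has the same (empty) sequence, so all of them collapse into one entry, which is why FASTQC can flag a blank row as overrepresented.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple


def fastq_records(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ stream."""
    it = (line.rstrip("\n") for line in lines)
    # zip() over the same iterator four times groups lines into records;
    # a truncated final record is silently dropped.
    for header, seq, _plus, qual in zip(it, it, it, it):
        yield header, seq, qual


def sequence_summary(
    lines: Iterable[str], top: int = 3
) -> Tuple[int, int, List[Tuple[str, int]]]:
    """Return (total reads, empty reads, most common sequences)."""
    counts = Counter(seq for _header, seq, _qual in fastq_records(lines))
    total = sum(counts.values())
    return total, counts[""], counts.most_common(top)


if __name__ == "__main__":
    example = [
        "@r1", "ACGTACGTAC", "+", "IIIIIIIIII",
        "@r2", "", "+", "",          # read trimmed to zero length
        "@r3", "", "+", "",          # another one
        "@r4", "TTGCA", "+", "IIIII",
    ]
    total, empty, top = sequence_summary(example)
    print(f"{total} reads, {empty} empty ({100 * empty / total:.1f}%)")
    print(top)  # the empty string "" comes out on top
```

For a real file, pass an open handle instead of the toy list, e.g. `gzip.open(path, "rt")` for a `.fastq.gz` file (the path is a placeholder).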

score 6 · Accepted Answer · answered Aug 04 '17 at 17:36

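As @AaronBerlin mentioned, you didn't remove reads that were trimmed completely (down to zero length). FASTQC treats all of those empty reads as the same sequence, so together they show up as a single overrepresented entry with nothing to display. Next time, use the --minimum-length option and set it to something reasonable, like 20. Alternatively, use "Trim Galore!", a wrapper around cutadapt with more reasonable defaults.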
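In practice you would simply add `-m 20` (`--minimum-length 20`) to the cutadapt call, for example `cutadapt -a AGATCGGAAGAGC -q 20 --trim-n -m 20 -o trimmed.fastq.gz input.fastq.gz`, with `-A AGATCGGAAGAGC -p trimmed_R2.fastq.gz` added for paired-end data; the quality cutoff and file names here are placeholders. For paired-end input, cutadapt by default (`--pair-filter=any`) discards the whole pair when either mate fails. The Python sketch below only illustrates what that filter does conceptually and is not a replacement for cutadapt. It assumes plain 4-line FASTQ, and the function names are hypothetical.

```python
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str, str, str]  # header, sequence, '+' line, quality


def _records(lines: Iterable[str]) -> Iterator[Record]:
    """Group a plain 4-line-per-record FASTQ stream into records."""
    it = (line.rstrip("\n") for line in lines)
    return zip(it, it, it, it)


def filter_min_length(lines: Iterable[str], min_length: int = 20) -> Iterator[str]:
    """Single-end: yield the lines of records with at least min_length bases."""
    for record in _records(lines):
        if len(record[1]) >= min_length:
            yield from record


def filter_pairs_min_length(
    r1_lines: Iterable[str], r2_lines: Iterable[str], min_length: int = 20
) -> Iterator[Tuple[Record, Record]]:
    """Paired-end: keep a pair only if BOTH mates pass, so R1/R2 stay in sync."""
    for rec1, rec2 in zip(_records(r1_lines), _records(r2_lines)):
        if len(rec1[1]) >= min_length and len(rec2[1]) >= min_length:
            yield rec1, rec2


if __name__ == "__main__":
    reads = ["@a", "A" * 25, "+", "I" * 25,   # long enough: kept
             "@b", "", "+", "",               # trimmed to nothing: dropped
             "@c", "C" * 19, "+", "I" * 19]   # one base too short: dropped
    print("\n".join(filter_min_length(reads, min_length=20)))
```

Dropping both mates together matters because aligners expect the R1 and R2 files to list mates in the same order; filtering each file independently would break that pairing.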

Devon Ryan


Indeed, this was the problem. I set the --minimum-length parameter to 20 and everything is fine now. I didn't try Trim Galore! because I am working on a cluster where it is not installed at the moment, but I will try it in the future, as it seems more intuitive. – plat Aug 07 '17 at 11:08
