I have a set of RNA-seq samples from different experiments (single-end or paired-end, depending on the experiment). I ran FastQC on all the samples and found overrepresented adapter sequences: [image]

I removed the adapters (TruSeq adapters) using Cutadapt, and also trimmed low-quality and N bases from the 3' end of the reads. After that, I ran FastQC again, and the output is the following (representative example): [image]

Does anyone know what is happening? Now I have an overrepresented sequence for which no sequence is provided. What does this mean?

plat

1 Answer

As @AaronBerlin mentioned, you didn't remove reads that were completely trimmed, so zero-length reads are still in the file and FastQC reports that empty sequence as overrepresented. Next time, use the --minimum-length option and set it to something reasonable, like 20. Alternatively, use "Trim Galore!", a wrapper around cutadapt that has more reasonable defaults.
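To make the effect of a minimum-length filter concrete, here is a minimal Python sketch of the idea. It is not how cutadapt implements --minimum-length; the helper names `read_fastq` and `filter_by_length` and the toy reads are made up for illustration, and it assumes plain four-line FASTQ records from a single-end file.

```python
import io

def read_fastq(handle):
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:  # empty string means end of file
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header.rstrip("\n"), seq, qual

def filter_by_length(records, min_length=20):
    """Keep reads with at least min_length bases; count the ones dropped."""
    kept = []
    dropped = 0
    for header, seq, qual in records:
        if len(seq) >= min_length:
            kept.append((header, seq, qual))
        else:
            dropped += 1
    return kept, dropped

# Toy "trimmed" data (hypothetical): read2 was trimmed down to nothing,
# which is the kind of record that shows up as an empty sequence.
example = (
    "@read1\n" + "ACGT" * 6 + "\n+\n" + "I" * 24 + "\n"
    "@read2\n\n+\n\n"
    "@read3\nACGT\n+\nIIII\n"
)

kept, dropped = filter_by_length(read_fastq(io.StringIO(example)), min_length=20)
print(len(kept), "kept,", dropped, "dropped")  # prints "1 kept, 2 dropped"
```

The sketch only covers single-end data. For paired-end files the two mates have to be filtered together so the files stay in sync, which is one more reason to let the trimming tool do the filtering rather than post-processing by hand.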

Devon Ryan
    Indeed, this was the problem. I set the --minimum-length parameter to 20 and everything is fine. I didn't try Trim Galore! because I am working on a cluster where it is not installed at the moment, but I will try it in the future, as it seems more intuitive. – plat Aug 07 '17 at 11:08