After reading some of the forum posts on Biostar and SeqAnswers, I find it very confusing whether or not to filter duplicate reads out of aligned files. As far as I understand, it is very difficult to tell duplicate reads apart from the many identical reads that highly expressed genes naturally produce, so we may lose important information during filtering.
So, is it really necessary to remove the duplicates in differential expression analysis using RNA-seq data?
I would probably run your analysis both with and without duplicate removal and then manually inspect some of the results found by only one approach to see which you find more trustworthy. Pay special attention to whether the coverage is more uniform or seems to be dominated by loads of reads from single positions.
– Kristoffer Vitting-Seerup Aug 17 '17 at 08:42
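To make the "reads from single positions" check in the comment above a bit more concrete, here is a minimal sketch in Python. It assumes you have already pulled the read start positions for one gene out of each alignment file (that step is not shown), and the function name `start_position_summary` and the toy data are hypothetical, made up purely for illustration. It only reports how concentrated the reads are; deciding what counts as "too concentrated" is still a judgment call.

```python
from collections import Counter


def start_position_summary(starts):
    """Summarise how concentrated reads are on single start positions.

    `starts` is a list of read start coordinates for one gene
    (hypothetical input; extracting it from a BAM file is not shown).
    """
    total = len(starts)
    if total == 0:
        return {"reads": 0, "distinct_starts": 0, "top_fraction": 0.0}
    counts = Counter(starts)
    # Number of reads sharing the single most common start position.
    top = counts.most_common(1)[0][1]
    return {
        "reads": total,
        "distinct_starts": len(counts),
        "top_fraction": top / total,
    }


# Toy data: 100 reads spread evenly vs. 100 reads mostly stacked on one position.
uniform = list(range(100, 200))
piled = [150] * 90 + list(range(100, 110))

print(start_position_summary(uniform))
print(start_position_summary(piled))
```

Running this per gene on the files with and without duplicate removal, and looking at the genes where `top_fraction` changes the most, would be one way to pick candidates for manual inspection.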