I am interested in identifying indels in whole-genome bisulfite sequencing data (76 bp paired-end reads). Currently, I do this by setting the --rfg and --rdg affine gap penalties for Bowtie2 to values more permissive than the default 5+3N, and mapping with Bismark, the bisulfite sequencing alignment wrapper.
My question is: what values of --rfg and --rdg will let me identify the longest possible indels without sacrificing alignment quality? Is it better to set the extension penalty to zero with a high penalty for opening a gap (e.g. 8+0N)? Or is it better to keep the opening penalty low and use a nonzero extension penalty (e.g. 2+1N)?
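To see how the two candidate schemes trade off, here is a minimal Python sketch that tabulates the gap penalty as open + N × extend, which is how the Bowtie2 manual describes --rdg and --rfg (a gap of length N costs open + N × extend). The helper name `gap_penalty` is hypothetical and only for illustration.

```python
def gap_penalty(open_pen: int, extend_pen: int, length: int) -> int:
    """Affine penalty for a gap of `length` bases: open + length * extend."""
    return open_pen + extend_pen * length


# The default plus the two schemes from the question, as (open, extend) pairs.
schemes = {"default 5+3N": (5, 3), "8+0N": (8, 0), "2+1N": (2, 1)}

# Print one row per gap length so the crossover point is easy to spot.
print("N   " + "  ".join(f"{name:>12}" for name in schemes))
for n in range(1, 11):
    row = "  ".join(f"{gap_penalty(o, e, n):>12}" for o, e in schemes.values())
    print(f"{n:<3} {row}")
```

With these numbers, 2+1N penalizes indels shorter than 6 bp less than 8+0N does, the two tie at 6 bp, and 8+0N is cheaper for anything longer. A zero extension penalty therefore favors long indels, but it also makes every long gap cost the same as a 6 bp one, which may let spurious long gaps through.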
--score-min, --rdg, and --rfg settings, it appears that the maximum indel size identifiable with my read length is 3 bp. I changed both --rdg and --rfg to a gap open/extension of (1,1), and the maximum length I can detect is 14 bp. I will try modifying only --score-min. It's a fine line to walk, detecting the longest possible insertions while keeping the alignments clean! – Ben D. Aug 04 '17 at 03:38
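The 3 bp and 14 bp limits are consistent with a simple score-budget calculation. This assumes Bismark's documented default minimum-score function `--score_min L,0,-0.2` (threshold = 0 + -0.2 × read length, i.e. -15.2 for a 76 bp read) and an otherwise perfect end-to-end alignment with no mismatch penalties. The sketch below uses a hypothetical helper, `max_indel_length`, to find the largest N for which open + N × extend still fits within that budget. The result is an upper bound from scoring alone. Seed finding and Bowtie2's ban on gaps near the read ends (`--gbar`) can lower what is actually found.

```python
import math
from typing import Optional


def max_indel_length(open_pen: float, extend_pen: float, read_len: int,
                     const: float = 0.0, coeff: float = -0.2,
                     other_penalties: float = 0.0) -> Optional[int]:
    """Longest single gap whose penalty fits within the minimum-score budget.

    Assumes a linear score-min function L,const,coeff (threshold =
    const + coeff * read_len) and end-to-end mode, where a perfect alignment
    scores 0. `other_penalties` (hypothetical) stands in for mismatch costs.
    Returns None when the extension penalty is 0 and the open penalty fits,
    i.e. the score alone does not limit gap length.
    """
    # Total penalty the alignment can absorb before falling below the threshold.
    budget = -(const + coeff * read_len) - other_penalties
    if open_pen > budget:
        return 0  # even opening a gap pushes the score below the threshold
    if extend_pen == 0:
        return None  # every gap length costs the same, so no score-based limit
    return math.floor((budget - open_pen) / extend_pen)


for scheme in [(5, 3), (1, 1), (2, 1), (8, 0)]:
    print(scheme, max_indel_length(*scheme, read_len=76))
```

For a 76 bp read, this gives 3 for the default 5,3 and 14 for 1,1, matching the observations above. It gives 13 for 2+1N and no score-imposed limit for 8+0N. It also shows why loosening --score-min alone raises the ceiling: a more negative threshold enlarges the budget for every scheme.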