7

I'd like to simulate 10% sequencing error using art_illumina. The simulator doesn't have a parameter that I can just give the 10%, but it has this:

-qs  --qShift   the amount to shift every first-read quality score by
-qs2 --qShift2  the amount to shift every second-read quality score by
                NOTE: For -qs/-qs2 option, a positive number will shift up 
                quality scores (the max is 93) that reduce substitution sequencing 
                errors and a negative number will shift down quality scores that 
                increase sequencing errors. If shifting scores by x, the error
                rate will be 1/(10^(x/10)) of the default profile.

Heng Li's wgsim has an option for base error rate. Can I do the same for art_illumina?

Q: Does the "error rate" in the description mean sequencing error? If I set both qShift and qShift2 to 10, 1/(10^(x/10)) == 0.1, where x is 10, does that mean 10 is the value I should give to the simulator?

terdon
  • 10,071
  • 5
  • 22
  • 48
SmallChess
  • 2,699
  • 3
  • 19
  • 35
  • "a positive number will shift up quality scores that reduce substitution sequencing errors" -- I suspect you want to do the opposite: "a negative number will shift down quality scores that increase sequencing errors" – gringer Jul 05 '17 at 08:44

1 Answers1

4

You need to specify an error profile to do this. However, you can shift quality scores of an existing profiles to get 10% error, which is easier than defining a complete profile.

For example, Illumina HiSeq 1000 has the error rate of 0.1% and you want 100 time more errors, so you should shift down the quality scores by 20 by setting both qShift and qShift2 to -20. The following code will simulate paired end reads with 10% error.

art_illumina -ss HS10 -i reference.fa -qs -20 -qs2 -20 -sam -p -l 100 -f 20 -m 200 -s 10 -o output