4

I have a bam file that I want to run through GATK's SplitNCigarReads tool. Because of the way the bam file was generated, the program will often fail, with an error message stating:

##### ERROR MESSAGE: Bad input: Cannot split this read (might be an empty section between Ns, for example 1N1D1N): ABCDEFG

My question is: how do I write a bash script to:

  1. Start GATK on the input bam file.

  2. When (and if) the error message arises, take the bad read (e.g., ABCDEFG) and use samtools to remove it,

    samtools view -h input.bam | grep -v ABCDEFG | samtools view -bS -o intermediate1.bam
    
  3. Index the new bam file

  4. Run the same GATK command as #1, but with the newly generated bam file

  5. Keep performing steps 1-4 until the GATK tool runs to completion

I realize this may be quite complicated and/or involved, so I don't expect someone to write the entire script for me. Are there any online resources dedicated to writing a script like this?

P.S. This is my first StackExchange question, please let me know if I did anything incorrectly or inefficiently.

Konrad Rudolph
  • 4,845
  • 14
  • 45
kylep
  • 41
  • 2
  • Just wondering if you have an example BAM that produces the error that you could share with us? Alternatively, could you describe how you are generating your BAM files so that we can produce a file that will produce one or more errors when run? – Steve Sep 16 '22 at 04:35
  • 3
    Removing offending reads one by one and rerunning the tool every time will be very time-consuming, and strikes me as fundamentally the wrong approach: due to how BAM files are structured, removing individual reads is incredibly inefficient. Your time would be better spent devising a way to detect and remove the offending reads all at once. – Konrad Rudolph Sep 16 '22 at 13:46
  • may refer to page 217: https://books.google.com.hk/books?id=JUp3EAAAQBAJ&pg=PA214&lpg=PA214&dq=bash+script+%22splitncigarreads%22&source=bl&ots=qf2xcKQL7Q&sig=ACfU3U1EQhIsOiGEnBndigXHEjnBkFX4aQ&hl=en&sa=X&ved=2ahUKEwi5t6XJyp36AhWQZd4KHaDUBDEQ6AF6BAgfEAM#v=onepage&q=bash%20script%20%22splitncigarreads%22&f=false – envs_h_gang_5 Sep 18 '22 at 05:13

0 Answers0