5

I've been running bwa mem -a for alignment, using the -a flag---this will

output all alignments for SE or unpaired PE

I've noticed in the SAM that there are several alignments with * in the SEQ and QUAL fields. Based on the documentation:

  1. SEQ: segment SEQuence. This field can be a ‘*’ when the sequence is not stored. If not a ‘*’, the length of the sequence must equal the sum of lengths of M/I/S/=/X operations in CIGAR. An ‘=’ denotes the base is identical to the reference base. No assumptions can be made on the letter cases.
  2. QUAL: ASCII of base QUALity plus 33 (same as the quality string in the Sanger FASTQ format). A base quality is the phred-scaled base error probability which equals −10 log10 Pr{base is wrong}. This field can be a ‘*’ when quality is not stored. If not a ‘*’, SEQ must not be a ‘*’ and the length of the quality string ought to equal the length of SEQ.

it appears the sequence isn't stored.

I would strongly prefer to have the sequence in this case. Is there any to direct bwa to output these sequences?

terdon
  • 10,071
  • 5
  • 22
  • 48
EB2127
  • 1,413
  • 2
  • 10
  • 23
  • Are you aware that many of these records are secondary/supplementary alignments? Does your workfow/model require/benefit from these alignments and handle them correctly? – Daniel Standage Dec 05 '18 at 15:23
  • 2
    @DanielStandage "Are you aware that many of these records are secondary/supplementary alignments?" Yes. " Does your workfow/model require/benefit from these alignments and handle them correctly?". Benefit, yes. – EB2127 Dec 05 '18 at 15:59

1 Answers1

5

I don't believe that is an option. Internally we created a patch and submitted it to BWA however that was 4 years ago and it has not been accepted.

Bioathlete
  • 2,574
  • 12
  • 29
  • Thanks. Did @user172818 ever give a reason why this feature isn't included? – EB2127 Dec 05 '18 at 03:29
  • 1
    see also https://www.biostars.org/p/352930/ – Pierre Dec 05 '18 at 09:26
  • @Pierre This is a great tool Pierre! I think it runs successfully for me. The problem is, I get errors downstream when using samtools view, i.e.. [E::sam_parse1] CIGAR and query sequence are of different length Are there work arounds? – EB2127 Dec 05 '18 at 19:13
  • I didn't test it with all the possible cases. Can you please try to isolate a minimal sam file containing the faulty reads ( all the reads with the same name) and send a bug report ? https://github.com/lindenb/jvarkit/issues/ – Pierre Dec 05 '18 at 19:54
  • @Pierre I may send you an e-mail instead. Thanks Pierre! – EB2127 Dec 07 '18 at 05:03
  • plindenbaum reverse(oohay) fr but I'll need a minimal failing SAM file. – Pierre Dec 07 '18 at 08:29