
From the SAM Optional Fields Specification, the NM field is:

Edit distance to the reference, including ambiguous bases but excluding clipping

Assuming both the MD and CIGAR are present, is the edit distance simply the number of characters [A-Z] appearing in the MD field plus the number of bases inserted (xI, if any) from the CIGAR string? Are there any other complications?
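A minimal Python sketch of the counting rule described above. It assumes well-formed MD and CIGAR strings passed in as plain text, and the function name `edit_distance` is hypothetical (not from any library). Every letter in MD is either a mismatched reference base or, after a `^`, a deleted reference base. Counting letters therefore covers mismatches and deletions, while insertions are only visible in the CIGAR.

```python
import re

# CIGAR operations: a run length followed by one operation character.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def edit_distance(md: str, cigar: str) -> int:
    """Estimate NM as (letters in MD) + (inserted bases from CIGAR).

    Letters in MD are mismatched reference bases or, after '^', deleted
    reference bases. Inserted read bases appear only as 'I' in the CIGAR.
    Clipped bases (S, H) appear in neither count.
    """
    md_letters = sum(1 for ch in md if ch.isalpha())
    inserted = sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op == "I")
    return md_letters + inserted
```

For example, MD `10A5^AC6` with CIGAR `16M2I2D6M` gives 1 mismatch + 2 deleted + 2 inserted = 5. Soft- and hard-clipped bases never enter either count, which matches the spec's "excluding clipping".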

mattm

1 Answer


Assuming both the MD and CIGAR are present and correct, then yes, you can parse both to get the edit distance (the NM auxiliary tag). The big caveat is "correct": the samtools calmd command exists for a reason, since historically not all aligners have output correct MD strings. A wrong CIGAR string is rare and would be a far more catastrophic error on the aligner's part. For what it's worth, if the NM tag is absent on a given alignment but present on others produced by the same aligner, it's fair to assume NM:i:0 for that alignment by default, since many aligners only emit NM:i:XXX when the edit distance is at least 1.
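Because the count is only as good as the MD string, a cheap structural check before trusting it might look like the following sketch. It assumes plain-text MD and CIGAR strings and a hypothetical `tags` dict of auxiliary fields; the helper names are illustrative, not from any library. It checks that MD follows the spec's grammar and accounts for the same aligned and deleted reference bases as the CIGAR. It cannot confirm the mismatch positions themselves without the reference, which is what `samtools calmd` recomputes.

```python
import re

# CIGAR operations: a run length followed by one operation character.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
# Grammar of the MD tag as given in the SAM optional-fields specification.
MD_SPEC_RE = re.compile(r"[0-9]+(([A-Z]|\^[A-Z]+)[0-9]+)*")
# MD tokens: a run of matching bases, '^' plus deleted reference bases,
# or a single mismatched reference base.
MD_TOKEN_RE = re.compile(r"(\d+)|\^([A-Z]+)|([A-Z])")


def md_consistent_with_cigar(md: str, cigar: str) -> bool:
    """Return True if MD is well formed and spans the same reference bases as CIGAR.

    This catches structural disagreement only; it cannot verify which
    reference base was mismatched without the reference sequence.
    """
    if not MD_SPEC_RE.fullmatch(md):
        return False
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    # MD describes aligned (M, =, X) and deleted (D) reference bases;
    # skipped regions (N) and clipping (S, H) are not represented in it.
    cigar_aligned = sum(n for n, op in ops if op in "M=X")
    cigar_deleted = sum(n for n, op in ops if op == "D")

    md_aligned = md_deleted = 0
    for run, deleted, _mismatch in MD_TOKEN_RE.findall(md):
        if run:
            md_aligned += int(run)
        elif deleted:
            md_deleted += len(deleted)
        else:
            md_aligned += 1  # a mismatch still occupies one aligned position
    return md_aligned == cigar_aligned and md_deleted == cigar_deleted


def nm_or_default(tags: dict) -> int:
    """Read NM from a hypothetical dict of auxiliary tags, defaulting to 0.

    Follows the heuristic in the answer: some aligners omit NM when the
    edit distance is 0, so a missing NM is treated as NM:i:0.
    """
    return int(tags.get("NM", 0))
```

For instance, `md_consistent_with_cigar("10A5^AC6", "16M2I2D6M")` is True, but pairing the same MD with `16M2I6M` is False because that CIGAR contains no deletion.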

Devon Ryan