I saw in the SAM format specification that the SEQ field (10th column) can be "*" if the sequence is not stored, instead of holding the sequence of the mapped read. Under what circumstances is this expected to occur? I don't understand why the spec allows it.

For a concrete example: I have PacBio data mapped to a reference sequence, and some of the records in the SAM file have * as their sequence. I don't know why, or whether that means anything. The data is ostensibly present in the FASTQ file I used as input to the read mapper, so I don't see why these records exist. I also don't see what purpose they could serve, since it seems to me that variant-calling software couldn't interpret them. At best they could be used to infer coverage, but wouldn't that be coverage by sequences of an uncertain nature?
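One way to start narrowing this down is to check which kinds of records carry the "*". The FLAG field (2nd column) marks a record as unmapped (0x4), secondary (0x100) or supplementary (0x800), so tallying the "*" records by those bits shows whether they cluster in one category. This is a minimal Python sketch, assuming a plain-text SAM file (not BAM). The helper names `classify` and `tally_missing_seq` are hypothetical, and the sketch only counts; it does not explain why a given aligner wrote "*".

```python
import sys
from collections import Counter
from typing import Iterable

# FLAG bits as defined in the SAM specification
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800


def classify(flag: int) -> str:
    """Label an alignment record by its FLAG bits (hypothetical helper)."""
    if flag & FLAG_UNMAPPED:
        return "unmapped"
    if flag & FLAG_SECONDARY:
        return "secondary"
    if flag & FLAG_SUPPLEMENTARY:
        return "supplementary"
    return "primary"


def tally_missing_seq(lines: Iterable[str]) -> Counter:
    """Count records whose SEQ (10th column) is '*', keyed by record type."""
    counts: Counter = Counter()
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and @ header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # an alignment line has 11 mandatory fields
        flag, seq = int(fields[1]), fields[9]
        if seq == "*":
            counts[classify(flag)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python tally_seq.py alignments.sam
    with open(sys.argv[1]) as handle:
        for kind, n in sorted(tally_missing_seq(handle).items()):
            print(f"{kind}\t{n}")
```

If nearly all the "*" records turn out to be secondary or supplementary, the primary record for the same read name should still hold the full sequence. That is worth confirming against the mapper's documentation rather than assuming.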

Oakheart