3

What is the standard way to measure contig sequence lengths in a BAM?

My understanding is that the community would use samtools idxstats to compute this information from the corresponding index file.

Are there more precise/standard approaches?

EB2127

1 Answer

4

Standard approaches would be either samtools idxstats or samtools view -H, both of which will produce the exact same results. In fact, all methods will produce the exact same result, since contig length is set by the reference you align against.
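
To illustrate why the two commands agree, here is a minimal Python sketch that parses text in the shape each one prints and compares the resulting contig lengths. The function names and the sample text are hypothetical, made up for illustration; the sketch assumes the usual tab-separated `@SQ` header lines with `SN` and `LN` fields, and the four-column `samtools idxstats` layout (name, length, mapped, unmapped).

```python
def lengths_from_header(header_text):
    """Parse @SQ lines (as printed by `samtools view -H`) into {name: length}."""
    lengths = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Fields after the record type are tab-separated TAG:VALUE pairs.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:])
        lengths[fields["SN"]] = int(fields["LN"])
    return lengths


def lengths_from_idxstats(idxstats_text):
    """Parse `samtools idxstats` output: name, length, mapped, unmapped."""
    lengths = {}
    for line in idxstats_text.splitlines():
        if not line.strip():
            continue
        name, length = line.split("\t")[:2]
        if name != "*":  # the '*' row holds unplaced reads, not a contig
            lengths[name] = int(length)
    return lengths


# Hypothetical sample text, standing in for real command output.
header = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:248956422\n@SQ\tSN:chrM\tLN:16569\n"
idxstats = "chr1\t248956422\t1200\t3\nchrM\t16569\t45\t0\n*\t0\t0\t17\n"

print(lengths_from_header(header) == lengths_from_idxstats(idxstats))
```

In practice the text would come from running the two commands on the same BAM; both read the lengths stored in the file's header rather than measuring anything from the reads.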

Devon Ryan
  • Specifically, reference length is not "computed", it is stored. In theory an aligner could store anything in the contig length header records, but generally it's the length, in characters, of the FASTA entry for the reference from the genome the reads were aligned to. – Ian Sudbery Feb 06 '18 at 14:49
  • @IanSudbery I see your point. Finding the length of characters in the FASTA reference is somewhat of a "computation" though, right? Would "recorded" be a better way to describe it? – EB2127 Feb 06 '18 at 17:53