I am mapping kmers back to a few bacterial genomes using bwa fastmap:

bwa fastmap -l 9 ref.fasta kmers.fasta > out.fastmap
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[main] Version: 0.7.17-r1188
[main] CMD: bwa fastmap -l 9 ref.fasta kmers.fasta
[main] Real time: 0.168 sec; CPU: 0.157 sec

The output looks like this:

SQ      AAAAGTAGTAAGCAGGAAGACAACACGGTTG 31
EM      0       11      1       contig1:+2041368
EM      3       13      2       contig1:+1875252        contig1:-2474744
EM      4       15      1       contig1:+3779779
EM      5       16      1       contig1:-3253348
EM      6       17      2       contig1:+938710 contig1:+1066682
EM      7       20      1       contig1:-4912797
EM      10      21      4       contig1:+4803473        contig1:-2830252        contig1:+4495150        contig1:-4907283
EM      11      23      1       contig1:-3148132
EM      13      24      1       contig1:-4651172
EM      14      25      3       contig1:-2142223        contig1:+4994474        contig1:+873066
EM      16      26      4       contig1:+156775 contig1:+27749  contig1:+2207492        contig1:-1340811
EM      17      29      1       contig1:+3523989
EM      19      30      24      *
EM      20      31      3       contig1:-2533354        contig1:-208660 contig1:-1080177
//

My question is: what is the meaning of the * char in the row before last? Why is bwa fastmap not reporting all the hits as in the other lines? I could not find a page that explains this output anywhere.

mgalardini

2 Answers

It means the read is unmapped. No origin in the reference could be found that was sufficiently similar to the read to call it a proper alignment.

In the SAM specification, * is always used if the information for that field is not available. Therefore, if no mapping information is available (=unmapped) one sets *.

terdon

I'm just starting to work with bwa fastmap, and indeed the documentation is very sparse. This is what you get from running the program without arguments:

bwa fastmap

Usage:   bwa fastmap [options] <idxbase> <in.fq>

Options: -l INT    min SMEM length to output [17]
         -w INT    max interval size to find coordiantes [20]
         -i INT    min SMEM interval size [1]
         -L INT    max MEM length [2147483647]
         -I INT    stop if MEM is longer than -l with a size less than INT [0]

That is a bit cryptic as it stands.

The output does not really follow the SAM specification, and I'm pretty sure the '*' does not mean unmapped (in fact, an unmapped exact match doesn't make any sense, does it?).

In fact, the '*' means that there are too many matches to print the reference ID and position of each one. From a quick look at the source code, this is controlled by the -w argument.

It defaults to 20, which is why the positions of your 24 matches are not listed.
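
To find which matches were cut off this way, here is a minimal Python sketch that reads fastmap output and flags the EM lines ending in '*'. It assumes the columns mean what the output in the question suggests (start on the query, end on the query, number of hits, then the positions); the names parse_fastmap and Smem are made up for this example and are not part of bwa.

```python
from typing import Iterator, List, NamedTuple, Optional


class Smem(NamedTuple):
    query: Optional[str]  # sequence from the preceding SQ line
    start: int            # assumed: match start on the query (EM column 2)
    end: int              # assumed: match end on the query (EM column 3)
    n_hits: int           # number of occurrences reported (EM column 4)
    hits: List[str]       # position strings; empty when bwa printed '*'
    truncated: bool       # True when the positions were replaced by '*'


def parse_fastmap(lines) -> Iterator[Smem]:
    """Yield one Smem per EM line (hypothetical helper, not a bwa API)."""
    query = None
    for raw in lines:
        fields = raw.split()
        if not fields or fields[0] == "//":
            continue  # blank line or end-of-record marker
        if fields[0] == "SQ":
            query = fields[1]
        elif fields[0] == "EM":
            start, end, n_hits = map(int, fields[1:4])
            rest = fields[4:]
            truncated = rest == ["*"]
            yield Smem(query, start, end, n_hits,
                       [] if truncated else rest, truncated)


# A few lines copied from the output in the question.
example = """\
SQ      AAAAGTAGTAAGCAGGAAGACAACACGGTTG 31
EM      17      29      1       contig1:+3523989
EM      19      30      24      *
EM      20      31      3       contig1:-2533354        contig1:-208660 contig1:-1080177
//
"""

records = list(parse_fastmap(example.splitlines()))
for r in records:
    if r.truncated:
        print(f"{r.query}[{r.start}:{r.end}]: {r.n_hits} hits, positions not listed")
```

If the reading of the source above is right, rerunning with a larger threshold (for example adding -w 30 to the command in the question) should make bwa list all 24 positions instead of printing '*'.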