
The NCBI Genomes database shows dendrograms, presumably built from whole-genome comparisons, for certain species, e.g. Pseudomonas aeruginosa or Escherichia coli.

How were these comparisons done? Does anyone know the source or paper?

Peter Menzel


Frustrated by the lack of information, I asked NCBI directly how the dendrograms are made. As with Ensembl, I'm sure the folks at NCBI run the sequences through a standardized in-house pipeline to generate them, since the reply mentions no specific source or paper deserving attribution:

The tree is based on a pairwise, BLAST comparison of chromosome sequences (an assembly is used for genomes with 2 or more chromosomes). BLAST identity is normalized on sequence length. The distance map is generated by a Neighbor Joining algorithm, where the distance is the blast score.

Dr. Wayne Matten, NCBI
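NCBI's reply does not publish the pipeline, so the following is only a minimal sketch of the last step it names: textbook neighbor joining over a symmetric distance matrix. It assumes the pairwise, length-normalized BLAST identities have already been turned into distances (for example 1 - identity; that transform is an assumption, since the reply only says "the distance is the blast score"). The function name `neighbor_joining` and the Newick output are illustrative choices, not NCBI's code.

```python
def neighbor_joining(labels, matrix):
    """Build a tree (Newick string) from a symmetric distance matrix.

    Hypothetical helper: labels is a list of leaf names, matrix a list of
    rows with matrix[i][j] == matrix[j][i] and zeros on the diagonal.
    """
    # Copy inputs so the caller's data is not modified.
    nodes = list(labels)
    d = [list(row) for row in matrix]
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of each node from all others.
        r = [sum(row) for row in d]
        # Pick the pair (i, j) that minimises Q(i, j) = (n-2)d(i,j) - r_i - r_j.
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from i and j to the new internal node u.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        new_label = f"({nodes[i]}:{li:.4f},{nodes[j]}:{lj:.4f})"
        # Distances from u to every remaining node k.
        keep = [k for k in range(n) if k not in (i, j)]
        new_row = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        # Drop rows/columns i and j, then append u as the last row/column.
        d = [[d[a][b] for b in keep] for a in keep]
        for row, val in zip(d, new_row):
            row.append(val)
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [new_label]
    # Join the last two nodes, splitting the remaining edge at its midpoint.
    a, b = nodes
    half = d[0][1] / 2
    return f"({a}:{half:.4f},{b}:{half:.4f});"


if __name__ == "__main__":
    names = ["a", "b", "c", "d", "e"]
    dist = [[0, 5, 9, 9, 8],
            [5, 0, 10, 10, 9],
            [9, 10, 0, 8, 7],
            [9, 10, 8, 0, 3],
            [8, 9, 7, 3, 0]]
    # Prints ((c:4.0000,(a:2.0000,b:3.0000):3.0000):1.0000,(d:2.0000,e:1.0000):1.0000);
    print(neighbor_joining(names, dist))
```

Because NJ produces an unrooted tree, where the final edge is split is arbitrary; a dendrogram viewer would need its own rooting rule, which NCBI does not describe.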

Kohl Kinning
Thanks for inquiring. Too bad their response doesn't contain more details or a link to the pipeline. It seems they calculate the average PID of all alignments reported by blastn and use that as a distance measure. I wonder whether they also consider the parts of each genome pair that are not covered by alignments. – Peter Menzel Oct 25 '18 at 17:47
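To make the distinction raised in this comment concrete, here is a hypothetical helper (`identity_distances` is not an NCBI or BLAST+ function) that reads standard `blastn -outfmt 6` tabular lines for one genome pair, where column 3 is percent identity and column 4 is alignment length. It computes two candidate distances: one from the length-weighted mean identity of the reported alignments only, and one that divides identical bases by the full query length so that uncovered regions count against similarity. Which of these, if either, NCBI uses is unknown; overlapping HSPs are not de-duplicated here, so the coverage-aware identity is clamped to at most 1.

```python
def identity_distances(blast_tab_lines, query_length):
    """Return (hsp_only_distance, coverage_aware_distance) for one genome pair.

    Hypothetical helper. blast_tab_lines: blastn -outfmt 6 lines for a single
    query/subject pair; query_length: total query genome length in bp.
    """
    identical = 0.0  # estimated identical bases summed over all HSPs
    aligned = 0      # alignment columns summed over all HSPs
    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        pident = float(fields[2])  # column 3: percent identity
        length = int(fields[3])    # column 4: alignment length
        identical += pident * length / 100.0
        aligned += length
    # Length-weighted mean identity over aligned regions only.
    hsp_identity = identical / aligned if aligned else 0.0
    # Identity over the whole query; unaligned bases count as non-identical.
    # Overlapping HSPs are not merged, so clamp to 1.0.
    coverage_identity = min(1.0, identical / query_length)
    return 1.0 - hsp_identity, 1.0 - coverage_identity


if __name__ == "__main__":
    hits = [
        "g1\tg2\t90.00\t1000\t100\t0\t1\t1000\t1\t1000\t0.0\t1500",
        "g1\tg2\t80.00\t500\t100\t0\t2001\t2500\t2001\t2500\t0.0\t600",
    ]
    # About 0.133 from HSPs alone vs 0.675 once the 4 kb query length is used.
    print(identity_distances(hits, query_length=4000))
```

The gap between the two numbers in the toy example shows why the answer to this question matters: two genomes that share a few highly similar regions look close under the first measure and distant under the second.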