6

My work involves searching for marker genes/fragments in metagenomic databases (like the Sequence Read Archive). Once I find these sequences, I would like to know more about the neighboring genomic region.

Is there a way I could assemble only sequences that create a contig which contains my region of interest? Contigs which don't contain this region are not useful to me. My organism of interest might represent a minority of the metagenome, and assembling everything in the dataset would use a lot of computing power.

Laura
  • Good question! Some assemblers for mtDNA work on this principle: they search for reads mapping to a conserved mt gene and then extend the sequence. Maybe you could tweak one of those tools. – Kamil S Jaron Apr 09 '19 at 11:57
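
To make the seed-and-extend idea from the comment above concrete, here is a minimal sketch of iterative read recruitment by shared k-mers. It is only an illustration under simplifying assumptions (reads held in memory as plain strings, exact k-mer matches, no error correction), not how any particular tool works; the function names `recruit_reads`, `kmers` and `revcomp`, and the defaults `k=21` and `max_rounds=10`, are hypothetical.

```python
# Complement table for the four unambiguous bases (assumption: reads contain only A/C/G/T).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq, k):
    """Yield canonical k-mers (the lexicographically smaller of a k-mer and its reverse complement)."""
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        yield min(kmer, revcomp(kmer))


def recruit_reads(seed, reads, k=21, max_rounds=10):
    """Collect reads connected to the seed through chains of shared k-mers.

    Each round recruits reads sharing at least one k-mer with the bait set,
    then adds the k-mers of those reads to the bait set, so the recruited
    region grows outward from the seed by roughly one read length per round.
    """
    bait = set(kmers(seed, k))
    recruited = set()  # indices into `reads`
    for _ in range(max_rounds):
        new = [
            i for i, read in enumerate(reads)
            if i not in recruited and any(km in bait for km in kmers(read, k))
        ]
        if not new:
            break  # nothing left to extend with
        for i in new:
            recruited.add(i)
            bait.update(kmers(reads[i], k))
    return [reads[i] for i in sorted(recruited)]
```

The recruited subset could then be handed to whichever assembler you prefer. `max_rounds` bounds how far the neighborhood extends from the marker, and in a real metagenome repeats and conserved genes would pull in reads from other organisms, so some filtering would likely be needed.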

1 Answer

2

This is one of the primary use cases for which spacegraphcats (preprint and code) was designed. The "neighborhood queries" discussed in the paper sound particularly relevant. I don't have any personal experience with spacegraphcats, but the run guide provides some examples of how to index the complete data set and how to query sequences of interest.

Daniel Standage
  • Yes, we'd be happy to chat (I'm one of the spacegraphcats authors). Just post questions over on our GitHub repo! – Titus Brown Apr 09 '19 at 17:58
  • You might also be interested in citations 21 and 22 in the spacegraphcats paper - metacherchant and a paper that does something similar. – Titus Brown Apr 09 '19 at 17:58