
Low-coverage MinION reads should be useful for closing gaps and resolving repeats left by short-read assemblers. However, I haven't had any success with the software I know about. I'm aware of the following packages for scaffolding or closing gaps in short-read assemblies using long reads:

  • npScarf
  • LINKS
  • PBJelly
  • OPERA-LG
  • SSPACE-longread

I've tried npScarf and LINKS, and they ran successfully but didn't resolve the gaps in my assembly. I couldn't get PBJelly and OPERA-LG to run with the current versions of BLASR and samtools, and both packages seem not to be maintained. I have not tried SSPACE-longread because it's not open source.

What software can I use to fill gaps and resolve repeats in a short-read assembly with low-coverage Nanopore data?


More information

I'm finishing a mitochondrial genome. I have a 17 kb short-read assembly with one gap. This was made from 2x150 bp paired-end reads from a TruSeq PCR-free library with a 400 bp insert size. Molecular data suggest the genome may be around 32 kb.

After mapping short reads against the short read assembly, Pilon identified a tandem repeat of 156 bp in an AT-rich region, but wasn't able to close the gap.

I have 1.4 Gb of MinION rapid reads with an L50 of 8808 b. I mapped these reads against the short read assembly, and I can see reads that span the gap. I seem to have the information required to close the gap, but I don't know how to do it.
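To make "reads that span the gap" concrete, here is a minimal sketch of how such reads could be picked out once the alignments are reduced to start and end coordinates on the contig. The `Alignment` tuple, the `spanning_reads` function, the `min_flank` threshold, and the example coordinates are all hypothetical; in practice the coordinates would come from the BAM file produced by the mapping step.

```python
from typing import Iterable, List, NamedTuple


class Alignment(NamedTuple):
    """One long-read alignment on the short-read contig (hypothetical layout)."""
    read_id: str
    ref_start: int  # 0-based, inclusive
    ref_end: int    # 0-based, exclusive


def spanning_reads(alignments: Iterable[Alignment],
                   gap_start: int,
                   gap_end: int,
                   min_flank: int = 500) -> List[str]:
    """Return IDs of reads that cover the whole gap plus min_flank bases on each side."""
    return [
        aln.read_id
        for aln in alignments
        if aln.ref_start <= gap_start - min_flank
        and aln.ref_end >= gap_end + min_flank
    ]


# Made-up example: a gap at positions 9000-9100 of the contig.
example = [
    Alignment("read_a", 2000, 15000),  # spans the gap with long flanks
    Alignment("read_b", 8900, 12000),  # left flank is too short
    Alignment("read_c", 100, 9050),    # ends inside the gap
]
print(spanning_reads(example, gap_start=9000, gap_end=9100))
```

Requiring a flank on both sides is a design choice: reads that only just reach the gap edges anchor poorly in an AT-rich, repetitive region.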

The organism has a nuclear genome > 600 Mb, so long read coverage is low. Despite this, I tried a de novo Canu assembly of the Nanopore reads, and I pulled out a 39 kb contig containing mitochondrial genes. Most genes on this contig are duplicated and fragmented, and Pilon wasn't able to improve it.
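As a rough back-of-the-envelope check on "coverage is low", the expected mean depth is total sequenced bases divided by genome size. The sketch below uses the figures quoted above (1.4 Gb of reads, a nuclear genome of at least 600 Mb); the function name `mean_coverage` is just for illustration, and the result is an upper bound because the genome may be larger than 600 Mb.

```python
def mean_coverage(total_bases: float, genome_size: float) -> float:
    """Expected mean depth, assuming reads are spread evenly over the genome."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# 1.4 Gb of MinION reads over a nuclear genome of at least 600 Mb.
nuclear_depth = mean_coverage(1.4e9, 600e6)
print(f"Nuclear coverage is at most about {nuclear_depth:.1f}x")
```

That works out to about 2.3x at best for the nuclear genome. Mitochondrial depth is usually higher because of the organelle's copy number, but how much higher depends on the fraction of reads that are mitochondrial, which this estimate does not capture.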

Thanks for reading!

Tom Harrop

1 Answer


You might want to look into Unicycler (the accompanying manuscript has more information); even though it is intended for bacterial genomes only, it might work well with a small genome such as a mitochondrial one.
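As a rough sketch, a hybrid run could be set up along these lines. The file names, output directory, and thread count are placeholders, and the helper `unicycler_hybrid_command` is hypothetical; the options used (`-1`, `-2`, `-l`, `-o`, `-t`) are the ones I would expect for a hybrid assembly, but check `unicycler --help` for your installed version.

```python
import shlex
from typing import List


def unicycler_hybrid_command(short_1: str, short_2: str, long_reads: str,
                             out_dir: str, threads: int = 8) -> List[str]:
    """Build the argument list for a hybrid (short + long read) Unicycler run."""
    return [
        "unicycler",
        "-1", short_1,      # first-in-pair short reads
        "-2", short_2,      # second-in-pair short reads
        "-l", long_reads,   # Nanopore long reads
        "-o", out_dir,      # output directory
        "-t", str(threads), # number of threads
    ]


# Placeholder file names; replace them with your own.
cmd = unicycler_hybrid_command(
    "short_R1.fastq.gz", "short_R2.fastq.gz", "minion.fastq.gz", "unicycler_out"
)
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

With a large nuclear genome in the mix, it may help to first subset both read sets to those mapping to the mitochondrial contigs, so the assembler only sees organelle reads.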


Beware that if you happen to have very long reads, you might end up with an assembly containing multiple copies of the circular genome; in that case, you might want to look into Circlator.

terdon
mgalardini
  • Mitochondria are bacteria. Their long co-evolution as endosymbiotes may have had an influence in some of their genomic characteristics, though. – bli Jul 26 '17 at 11:36
  • Exactly; in fact, I'm not aware of any bacteria with a ~32kb genome :) – mgalardini Jul 26 '17 at 12:44
  • This worked well for me. I got a single 21 kb circular contig with all the expected mt genes. Thanks for alerting me to this software! – Tom Harrop Jul 31 '17 at 23:23