
I have assembled a poxvirus genome using Ray. The assembly is good. Out of the several thousand contigs I got, I was able to build one scaffold with the Contiguator tool, which covers about 90% of my genome. I have a well-annotated reference sequence. Is there any good tool for filling gaps and checking assembly quality?

Things I've tried so far:

  1. IMAGE from PAGIT: it asks for an ACE file, which I guess is produced by Newbler, so I am not able to use it.

  2. gapfill_py script from the Broad Institute: some gaps still remain unfilled.

I have heard that Consed is good, but I am on Windows, which Consed does not support.

I checked my BAM file with Tablet. There are reads that cover almost the entire gene. Are there any good tools or manual ways to finish my assembly so that I can get the full genome?

L R Joshi
  • Is gap filling for viruses in principle different than gap filling in pro/eukaryotes? I could make some suggestions but they are based on my experience in insect genomics. – Kamil S Jaron Apr 08 '19 at 10:11
  • I don't think it's too different. It's pretty straightforward, actually. The virus that I am sequencing does not contain any introns or overlapping ORFs. It has distinct coding regions flanked by promoter sequences in the intergenic regions. And I have a well-annotated reference sequence too. Your suggestions would be really helpful. – L R Joshi Apr 09 '19 at 23:44

1 Answer


I am assuming that gap filling is domain-independent and works the same for viruses as for bacteria or eukaryotes.

There are two real options I am aware of: Sealer, from the ABySS package (a successor of GapFiller), and GapCloser, from the SOAPdenovo2 package. According to the Sealer paper, Sealer is the better of the two, but I am not sure whether any independent benchmarking has confirmed that conclusion. On my insect data GapCloser worked better than GapFiller (back then I did not try to compare it to Sealer).
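Whichever tool you try, it helps to list the gaps (runs of N) in the scaffold before and after, so you can see which ones were actually closed. Below is a minimal sketch of that check in plain Python. It is not part of Sealer or GapCloser; the function names (`read_fasta`, `find_gaps`, `gap_report`) and the example scaffold are made up for illustration, and it assumes gaps are encoded as runs of `N` or `n`.

```python
import re
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            parts = line[1:].split()
            name = parts[0] if parts else ""
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def find_gaps(seq: str, min_len: int = 1) -> List[Tuple[int, int]]:
    """Return 0-based, half-open (start, end) coordinates of runs of N."""
    return [
        (m.start(), m.end())
        for m in re.finditer(r"[Nn]+", seq)
        if m.end() - m.start() >= min_len
    ]


def gap_report(lines: Iterable[str], min_len: int = 1) -> Dict[str, List[Tuple[int, int]]]:
    """Map each sequence name to the list of gaps found in it."""
    return {name: find_gaps(seq, min_len) for name, seq in read_fasta(lines)}


# Hypothetical two-line scaffold; the sequence is joined across lines first.
example = [">scaffold_1", "ACGTNNNNAC", "GTNNACGT"]
print(gap_report(example))  # {'scaffold_1': [(4, 8), (12, 14)]}
```

Since `gap_report` only iterates over lines, an open file handle for your scaffold FASTA can be passed in directly. Running it on the scaffold before and after gap filling gives a quick list of the gaps that are still open, which you can then inspect in Tablet.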

Kamil S Jaron
  • Can you please tell me what input files Sealer and GapFiller take? – L R Joshi Apr 10 '19 at 19:35
  • I believe that in both cases the input is a gapped assembly (contigs glued into scaffolds by runs of Ns) plus the sequencing reads, which are mapped onto the assembly to gradually reconstruct what is in the gaps. – Kamil S Jaron Apr 12 '19 at 07:59