We are considering de novo assembly of a species' transcriptome (i.e. without a reference genome) using the combined outputs of PacBio Iso-Seq and Illumina sequencing.

One example I saw (Li et al. 2017) used the standard PacBio tools to assemble a transcriptome, then corrected the assembled sequences with a tool called proovread, and then ran cd-hit-est followed by Cogent.

Would this be considered a reasonable pipeline, or is there an alternative recommendation?

M__
Ian Sudbery

1 Answer

My recommendation would be to run Trinity with the --long_reads option, which lets you provide a FASTA file of error-corrected long reads for anchoring the short reads to a transcript isoform. The reads are then clustered per isoform and assembled, in a similar fashion to what's done with genome-guided Trinity:

https://github.com/trinityrnaseq/trinityrnaseq/wiki/Running-Trinity

Trinity --seqType fq --max_memory 50G --long_reads corrected_pacbio_reads.fa --left reads_1.fq --right reads_2.fq --CPU 6
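
The command above passes a .fa file to --long_reads, and as far as I know that option expects FASTA. If your corrected long reads come out of the correction step as FASTQ, a conversion along these lines may be needed first. This is only a minimal sketch: the function name fastq_to_fasta and the file name corrected_pacbio_reads.fq are made up for illustration, and it assumes plain four-line FASTQ records (sequence not wrapped across lines).

```shell
# Hypothetical helper: convert a four-line-per-record FASTQ file to FASTA.
# Assumes each record is exactly: @header, sequence, +, qualities.
fastq_to_fasta() {
    awk '
        NR % 4 == 1 { print ">" substr($0, 2) }  # header: swap leading "@" for ">"
        NR % 4 == 2 { print }                    # sequence line, copied unchanged
    ' "$1"
}

# Example usage (file names are placeholders):
#   fastq_to_fasta corrected_pacbio_reads.fq > corrected_pacbio_reads.fa
```

Quality scores are simply dropped, so it is worth keeping the original FASTQ around in case you want it for another tool.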

Disclaimer: I contributed code to Trinity at some point in the distant past.

gringer