
I'm so new to this that I'm mostly trying to find out which tools, methods, and keywords I should look up myself.

I have a unique strain of a bacterium.

I was given RNAseq data for this unique strain.

I want to analyze the RNAseq data, but I need an annotated genome to do that.

As a resource, I do have the annotated genome of the common lab strain.

The common strain should be pretty similar to the unique strain, but I don't know how similar.

Can anyone tell me what tools/methods might be applicable so I can go look them up?

myflow

For a quick (but reliable) analysis, I'd recommend using Kallisto or Salmon to quantify isoform read counts against the transcriptome of the lab strain.
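
Kallisto and Salmon both index a FASTA of transcript sequences. Because bacterial genes are generally not spliced, one way to get such a FASTA from the lab strain's annotated genome is to cut each annotated feature out of the genome sequence. The sketch below assumes a GFF3 annotation plus a matching genome FASTA, and uses only CDS features, so UTRs, ncRNAs and operon structure are ignored. The helper names (`read_fasta`, `gene_sequences`) are hypothetical, not part of any tool.

```python
"""Sketch: build a minimal 'transcriptome' FASTA from a bacterial genome + GFF3 annotation."""

# Translation table for reverse-complementing minus-strand features.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(text):
    """Parse FASTA text into {record_id: sequence}; the ID is the first word of the header."""
    records, name, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def gene_sequences(genome, gff_text, feature_type="CDS"):
    """Yield (feature_id, sequence) for every GFF3 feature of the given type."""
    for line in gff_text.splitlines():
        if line.startswith("##FASTA"):
            break  # some GFF3 files (e.g. from Prokka) append the genome sequence after this marker
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        seqid, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        seq = genome[seqid][start - 1:end]  # GFF3 coordinates are 1-based and inclusive
        if strand == "-":
            seq = seq.translate(COMPLEMENT)[::-1]  # reverse complement
        yield attrs.get("ID", f"{seqid}:{start}-{end}"), seq


if __name__ == "__main__":
    # Toy data: one plus-strand and one minus-strand CDS on a 21 bp "chromosome".
    genome = read_fasta(">chr1 toy\nATGAAATAGCCC\nCTATTTCAT\n")
    gff = ("chr1\ttoy\tCDS\t1\t9\t.\t+\t0\tID=geneA\n"
           "chr1\ttoy\tCDS\t13\t21\t.\t-\t0\tID=geneB\n")
    for fid, seq in gene_sequences(genome, gff):
        print(f">{fid}\n{seq}")
```

Writing these records to a file gives something that `salmon index` or `kallisto index` could take as input; for real annotations, an established GFF utility such as gffread is likely to be more robust than this sketch.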

If you are concerned about transcripts that are in your sample but not in the lab strain, you can do two Trinity assemblies: one fully de novo, and one genome-guided. The assembled transcripts can then be used as a reference for Kallisto/Salmon, and the resulting counts compared with the lab-strain results. Note that the Trinity assemblies will only cover the particular conditions represented in your samples; if a transcript isn't expressed, it won't be in the assembly.

Any transcripts in the de novo assembly that are not in the genome-guided assembly are potentially novel to your strain. However, interpret this with care: if the sample contains any contamination, the contaminant transcripts will also be assembled.
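
As a crude first pass at "in the de novo assembly but not in the genome-guided one", you could flag de novo transcripts whose k-mers rarely appear anywhere in the genome-guided assembly. This is a k-mer-overlap proxy chosen for illustration (an alignment-based comparison, e.g. with BLAST, is the more usual approach). It assumes both assemblies are already loaded as dicts of ID to sequence; `k=25` and the 50% threshold are arbitrary assumptions, and `flag_potentially_novel` is a hypothetical helper. Canonical k-mers are used because transcripts from an unstranded assembly can come out in either orientation.

```python
"""Sketch: flag de novo transcripts that share few k-mers with a genome-guided assembly."""

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq, k):
    """Return the set of canonical k-mers (the smaller of each k-mer and its reverse complement)."""
    seq = seq.upper()
    out = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        rc = kmer.translate(COMPLEMENT)[::-1]
        out.add(min(kmer, rc))
    return out


def flag_potentially_novel(de_novo, genome_guided, k=25, max_shared=0.5):
    """Return IDs of de novo transcripts whose fraction of shared k-mers is <= max_shared."""
    reference = set()
    for seq in genome_guided.values():
        reference |= canonical_kmers(seq, k)
    flagged = []
    for tid, seq in de_novo.items():
        kms = canonical_kmers(seq, k)
        if not kms:
            continue  # shorter than k: too short to judge either way
        shared = len(kms & reference) / len(kms)
        if shared <= max_shared:
            flagged.append(tid)
    return flagged
```

Anything this flags is only a candidate: as noted above, contaminant transcripts would be flagged just as readily as genuinely strain-specific ones.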

I'm not too familiar with annotation pipelines, but based on a couple of Twitter posts it looks like Prokka might be a reasonable start. Torsten Seemann (lead developer of Prokka) has written a post about alternative annotation pipelines:

http://thegenomefactory.blogspot.co.nz/2013/03/bacterial-genome-annotation-systems.html

Update: Torsten Seemann now recommends Bakta as a "worthy successor to Prokka":

https://github.com/oschwengers/bakta

Bakta is a tool for the rapid & standardized annotation of bacterial genomes and plasmids from both isolates and MAGs. It provides dbxref-rich, sORF-including and taxon-independent annotations in machine-readable JSON & bioinformatics standard file formats for automated downstream analysis.
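
Both Prokka and Bakta can write their annotations as GFF3, so once you have an annotation for your strain, one quick sanity check is to compare annotated feature counts with the lab strain's annotation. A minimal sketch, assuming both GFF3 files have been read into strings; the function names are hypothetical, and matching counts say nothing about whether the individual genes actually correspond.

```python
"""Sketch: compare GFF3 feature-type counts between two bacterial annotations."""
from collections import Counter


def feature_type_counts(gff_text):
    """Count GFF3 features by type (column 3), e.g. CDS, tRNA, rRNA."""
    counts = Counter()
    for line in gff_text.splitlines():
        if line.startswith("##FASTA"):
            break  # annotation GFF3 files may embed the sequence after this line
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) == 9:
            counts[cols[2]] += 1
    return counts


def compare_feature_counts(reference_gff, query_gff):
    """Return {feature_type: (reference_count, query_count)} for every type seen in either file."""
    ref = feature_type_counts(reference_gff)
    qry = feature_type_counts(query_gff)
    # Counter returns 0 for missing keys, so types absent from one file show up as 0.
    return {t: (ref[t], qry[t]) for t in sorted(set(ref) | set(qry))}
```

Passing the lab strain's GFF3 text as `reference_gff` and your strain's as `query_gff` gives a per-type table; a large gap in, say, CDS counts would hint that the strains differ more than expected.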

gringer
    bacterial person here: I can confirm that Prokka is the de facto standard annotation pipeline for bacterial genomes, and should do a great job of annotating all gene features – mgalardini Sep 18 '17 at 07:41
    Fantastic! Thank you so much for the great and detailed post, gringer! That helps so much! And thanks also to mgalardini for confirming the info given! – myflow Sep 18 '17 at 07:48