
My annotation file is in .gff format. I would like to convert it to .gtf format, or to know whether the annotation file can be downloaded directly in .gtf format.

I am working on sequences from P. falciparum 3D7.

DavyCats
Diango

4 Answers


I would suggest using agat_convert_sp_gff2gtf.pl from AGAT, because you lose information with gffread.
For example, take this GFF input:

##gff-version 3
scaffold625 maker   gene    337818  343277  .   +   .   ID=CLUHARG00000005458;Name=TUBB3_2
scaffold625 maker   transcript  337818  343277  .   +   .   ID=CLUHART00000008717;Parent=CLUHARG00000005458
scaffold625 maker   CDS 337915  337971  .   +   0   ID=CLUHART00000008717:cds;Parent=CLUHART00000008717
scaffold625 maker   CDS 340733  340841  .   +   0   ID=CLUHART00000008717:cds;Parent=CLUHART00000008717
scaffold625 maker   CDS 341518  341628  .   +   2   ID=CLUHART00000008717:cds;Parent=CLUHART00000008717
scaffold625 maker   CDS 341964  343033  .   +   2   ID=CLUHART00000008717:cds;Parent=CLUHART00000008717
scaffold625 maker   exon    337818  337971  .   +   0   ID=CLUHART00000008717:exon1;Parent=CLUHART00000008717
scaffold625 maker   exon    340733  340841  .   +   0   ID=CLUHART00000008717:exon2;Parent=CLUHART00000008717
scaffold625 maker   exon    341518  341628  .   +   2   ID=CLUHART00000008717:exon3;Parent=CLUHART00000008717
scaffold625 maker   exon    341964  343277  .   +   2   ID=CLUHART00000008717:exon4;Parent=CLUHART00000008717
scaffold625 maker   five_prime_UTR  337818  337914  .   +   .   ID=CLUHART00000008717:five_prime_utr;Parent=CLUHART00000008717
scaffold625 maker   three_prime_UTR 343034  343277  .   +   .   ID=CLUHART00000008717:three_prime_utr;Parent=CLUHART00000008717

With gffread you get:

scaffold625 maker   transcript  337818  343277  .   +   .   transcript_id "CLUHART00000008717"; gene_id "CLUHARG00000005458";
scaffold625 maker   exon    337818  337971  .   +   .   transcript_id "CLUHART00000008717"; gene_id "CLUHARG00000005458";
scaffold625 maker   exon    340733  340841  .   +   .   transcript_id "CLUHART00000008717"; gene_id "CLUHARG00000005458";
scaffold625 maker   exon    341518  341628  .   +   .   transcript_id "CLUHART00000008717"; gene_id "CLUHARG00000005458";
scaffold625 maker   exon    341964  343277  .   +   .   transcript_id "CLUHART00000008717"; gene_id "CLUHARG00000005458";
scaffold625 maker   CDS 337915  337971  .   +   0   transcript_id "CLUHART00000008717"; gene_id "CLUHARG00000005458";
scaffold625 maker   CDS 340733  340841  .   +   0   transcript_id "CLUHART00000008717"; gene_id "CLUHARG00000005458";
scaffold625 maker   CDS 341518  341628  .   +   2   transcript_id "CLUHART00000008717"; gene_id "CLUHARG00000005458";
scaffold625 maker   CDS 341964  343033  .   +   2   transcript_id "CLUHART00000008717"; gene_id "CLUHARG00000005458";

whereas with AGAT you get:

scaffold625 maker   gene    337818  343277  .   +   .   ID CLUHARG00000005458 ; Name TUBB3_2 ; gene_id CLUHARG00000005458
scaffold625 maker   mRNA    337818  343277  .   +   .   ID CLUHART00000008717 ; Parent CLUHARG00000005458 ; gene_id CLUHARG00000005458 ; transcript_id CLUHART00000008717
scaffold625 maker   exon    337818  337971  .   +   0   ID "CLUHART00000008717:exon1"  ; Parent CLUHART00000008717 ; gene_id CLUHARG00000005458 ; transcript_id CLUHART00000008717
scaffold625 maker   exon    340733  340841  .   +   0   ID "CLUHART00000008717:exon2"  ; Parent CLUHART00000008717 ; gene_id CLUHARG00000005458 ; transcript_id CLUHART00000008717
scaffold625 maker   exon    341518  341628  .   +   2   ID "CLUHART00000008717:exon3"  ; Parent CLUHART00000008717 ; gene_id CLUHARG00000005458 ; transcript_id CLUHART00000008717
scaffold625 maker   exon    341964  343277  .   +   2   ID "CLUHART00000008717:exon4"  ; Parent CLUHART00000008717 ; gene_id CLUHARG00000005458 ; transcript_id CLUHART00000008717
scaffold625 maker   CDS 337915  337971  .   +   0   ID "CLUHART00000008717:cds"  ; Parent CLUHART00000008717 ; gene_id CLUHARG00000005458 ; transcript_id CLUHART00000008717
scaffold625 maker   CDS 340733  340841  .   +   0   ID "CLUHART00000008717:cds"  ; Parent CLUHART00000008717 ; gene_id CLUHARG00000005458 ; transcript_id CLUHART00000008717
scaffold625 maker   CDS 341518  341628  .   +   2   ID "CLUHART00000008717:cds"  ; Parent CLUHART00000008717 ; gene_id CLUHARG00000005458 ; transcript_id CLUHART00000008717
scaffold625 maker   CDS 341964  343033  .   +   2   ID "CLUHART00000008717:cds"  ; Parent CLUHART00000008717 ; gene_id CLUHARG00000005458 ; transcript_id CLUHART00000008717
scaffold625 maker   five_prime_UTR  337818  337914  .   +   .   ID "CLUHART00000008717:five_prime_utr"  ; Parent CLUHART00000008717 ; gene_id CLUHARG00000005458 ; transcript_id CLUHART00000008717
scaffold625 maker   three_prime_UTR 343034  343277  .   +   .   ID "CLUHART00000008717:three_prime_utr"  ; Parent CLUHART00000008717 ; gene_id CLUHARG00000005458 ; transcript_id CLUHART00000008717

As you can see, the Name=TUBB3_2 attribute of the gene feature is lost with gffread; in this example the gene and UTR lines are also absent from its output.
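To check this kind of loss on your own files, the comparison above can be sketched in Python. This is only a minimal illustration, not part of AGAT or gffread: the helper names `gff3_attrs`, `gtf_attrs` and `lost_values` are hypothetical, and the parsing assumes simple attribute columns like the ones shown in this answer (no escaped semicolons or spaces inside values).

```python
def gff3_attrs(col9):
    """Parse a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for field in col9.strip().strip(";").split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def gtf_attrs(col9):
    """Parse a GTF attribute column (key "value"; key value;) into a dict."""
    attrs = {}
    for field in col9.strip().split(";"):
        field = field.strip()
        if not field:
            continue
        # The key is the first word; the rest is the (optionally quoted) value.
        key, _, value = field.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


def lost_values(gff3_col9, gtf_col9):
    """Return the GFF3 attributes whose values appear nowhere in the GTF column."""
    kept = set(gtf_attrs(gtf_col9).values())
    return {k: v for k, v in gff3_attrs(gff3_col9).items() if v not in kept}


# Attribute columns copied from the example above.
gene_gff3 = "ID=CLUHARG00000005458;Name=TUBB3_2"
gffread_gtf = 'transcript_id "CLUHART00000008717"; gene_id "CLUHARG00000005458";'
agat_gtf = "ID CLUHARG00000005458 ; Name TUBB3_2 ; gene_id CLUHARG00000005458"

print(lost_values(gene_gff3, gffread_gtf))  # {'Name': 'TUBB3_2'}
print(lost_values(gene_gff3, agat_gtf))     # {}
```

The comparison is by value rather than by key because the converters rename keys (for instance, the gene ID ends up under gene_id).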

juke34

You can use gffread to convert GFF to GTF2. The following is from the manual:

In order to see the GTF2 version of the same transcripts, the -T option should be added:

gffread -E annotation.gff -T -o- | more

The examples above also show that gffread can be used to convert a file between GTF2 and GFF3 file formats.
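If you want to run the same conversion from a script and write the result to a file instead of paging it, a small Python wrapper could look like the sketch below. It assumes gffread is installed and on your PATH, reuses the -E and -T options from the manual excerpt above, and passes an output path to -o; the function names `gffread_command` and `convert_with_gffread` are hypothetical.

```python
import shutil
import subprocess


def gffread_command(gff_path, gtf_path):
    """Build the gffread call: -T requests GTF output, -o names the output file."""
    return ["gffread", "-E", gff_path, "-T", "-o", gtf_path]


def convert_with_gffread(gff_path, gtf_path):
    """Run gffread; raises if the executable is missing or the conversion fails."""
    if shutil.which("gffread") is None:
        raise FileNotFoundError("gffread not found on PATH")
    subprocess.run(gffread_command(gff_path, gtf_path), check=True)
    return gtf_path
```

For example, `convert_with_gffread("annotation.gff", "annotation.gtf")` would write the GTF next to the input file.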

haci

I have started a mini-review of the tools that do this conversion.
You can find it here: https://github.com/NBISweden/GAAS/blob/master/annotation/knowledge/gff_to_gtf.md

As I mentioned earlier, they do not all behave the same way.

juke34

You can try bioinfokit (https://github.com/reneshbedre/bioinfokit) in Python.

from bioinfokit.analys import gff

gff.gff_to_gtf(file="yourfile.gff3")

The converted GTF file will be saved in the current working directory.


Ram RS
Liyong