For example, see this gene (nad1) in ENA: http://www.ebi.ac.uk/ena/data/view/ABI60879

If you look at the XML for that gene, you see the following location:

join(
             DQ984518.1: 324706 .. 325091 ,
  complement(DQ984518.1:  24417 ..  24498),
  complement(DQ984518.1:  22828 ..  23019),
             DQ984518.1:   3484 ..   3542 ,
  complement(DQ984518.1: 153702 .. 153960)
)

This shows 5 exons joined out of genomic order and from both strands (three of the five are on the complement strand). Is there a valid GTF representation of this?
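To make the problem concrete, here is a minimal Python sketch that parses the location string above into (seqid, start, end, strand) tuples and checks whether the exons are in ascending order. The names `LOCATION`, `SEGMENT` and `parse_join` are my own for illustration and are not part of any ENA tooling; the regular expression only covers the simple `join(...)`/`complement(...)` form shown here.

```python
import re

# The location string copied from the ENA record for nad1 (ABI60879).
LOCATION = """join(
             DQ984518.1: 324706 .. 325091 ,
  complement(DQ984518.1:  24417 ..  24498),
  complement(DQ984518.1:  22828 ..  23019),
             DQ984518.1:   3484 ..   3542 ,
  complement(DQ984518.1: 153702 .. 153960)
)"""

# One segment: optional "complement(", then "seqid: start .. end".
SEGMENT = re.compile(r"(complement\(\s*)?([\w.]+)\s*:\s*(\d+)\s*\.\.\s*(\d+)")


def parse_join(location):
    """Return a list of (seqid, start, end, strand) in join order."""
    exons = []
    for comp, seqid, start, end in SEGMENT.findall(location):
        strand = "-" if comp else "+"
        exons.append((seqid, int(start), int(end), strand))
    return exons


exons = parse_join(LOCATION)
starts = [start for _, start, _, _ in exons]
strands = {strand for _, _, _, strand in exons}

# True when the join order differs from ascending genomic order.
out_of_order = starts != sorted(starts)
# True when exons come from both strands.
mixed_strands = len(strands) > 1

for exon in exons:
    print(*exon, sep="\t")
```

Both `out_of_order` and `mixed_strands` come out true for this gene, which is exactly what a single-strand, ordered transcript model cannot express.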

How should a 'non-canonically spliced' gene like this be dumped into GTF, i.e. what is the recommendation?

Cross-posted on Biostars

Daniel Standage
Dan
  • Maybe you could explain more clearly what is particular about the splicing of this gene. It is not easy to see from the link you give. – bli Jun 21 '17 at 09:36
  • Yes, that's much clearer. I don't have an answer, though – bli Jun 21 '17 at 09:48

1 Answer

To my knowledge, there is no defined way to deal with that in GTF. GFF3 handles trans-splicing (you'll have to scroll down to "trans-spliced transcript") by giving an individual transcript multiple parents (e.g., ID=some_transspliced_gene;Parent=gene1,gene2). You could use the same approach with GTF files, but note that it will break most downstream programs.
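As a rough illustration of the multi-parent idea, the sketch below formats a single GFF3 feature line whose `Parent` attribute lists several comma-separated parents. The helper name `gff3_line`, the sequence name `seq1` and the coordinates are placeholders I made up for the example; only the attribute string mirrors the one quoted above.

```python
def gff3_line(seqid, ftype, start, end, strand, feature_id, parents=()):
    """Format one 9-column GFF3 line; multiple parents are comma-separated."""
    attrs = [f"ID={feature_id}"]
    if parents:
        attrs.append("Parent=" + ",".join(parents))
    # Columns: seqid, source, type, start, end, score, strand, phase, attributes.
    cols = [seqid, ".", ftype, str(start), str(end), ".", strand, ".", ";".join(attrs)]
    return "\t".join(cols)


# Placeholder sequence name and coordinates, for illustration only.
line = gff3_line(
    "seq1", "mRNA", 100, 200, "+",
    "some_transspliced_gene", parents=("gene1", "gene2"),
)
print(line)
```

GTF has no `Parent` attribute, so carrying this over would mean inventing a convention of your own, which is why downstream tools are unlikely to understand it.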

Devon Ryan