5

Can a FASTA file contain both nucleotide and protein sequences, or must it contain only one type? For example, in a FASTA file with two sequences, can the first encode amino acids while the second encodes bases?

Thank you

user1510

1 Answer

9

While there's nothing stopping anyone from doing that with the FASTA format (after all, it's just a text file with '>' defining header lines), I don't know of any software that would support such a file structure. At best, it would interpret the nucleotide sequences as protein sequences (A/C/G/T are all valid 1-letter protein codes).

A better question to ask would be "Does ultra-cool-bioinformatics-tool X support combined nucleotide and protein sequences in the same FASTA file?" In which case the answer would most likely be, "No."
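To illustrate why a tool cannot reliably tell the two apart, here is a minimal Python sketch. It is not taken from any existing tool: `parse_fasta`, `guess_type`, and the two alphabets are hypothetical names and simplifying assumptions (only the four unambiguous DNA letters and the 20 standard amino acid codes are considered).

```python
from typing import Iterator, Tuple

# Assumed, simplified alphabets: no IUPAC ambiguity codes, gaps, or stop symbols.
DNA_LETTERS = set("ACGT")
PROTEIN_LETTERS = set("ACDEFGHIKLMNPQRSTVWY")


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def guess_type(seq: str) -> str:
    """Guess the sequence type from its letters alone (a heuristic, not a standard)."""
    letters = set(seq.upper())
    if not letters:
        return "empty"
    if letters <= DNA_LETTERS:
        # A, C, G and T are also amino acid codes, so this cannot be decided.
        return "ambiguous"
    if letters <= PROTEIN_LETTERS:
        return "protein"
    return "unknown"


MIXED = ">prot1\nMKVLAAGIW\n>dna1\nACGTTGCA\nGGCA\n"

for header, seq in parse_fasta(MIXED):
    print(header, guess_type(seq))
```

The file parses without complaint, but the DNA record comes back as "ambiguous": from the letters alone it could equally be a short peptide of alanine, cysteine, glycine and threonine.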

gringer
  • 3
    Yep. The FASTA format is very forgiving. Tools that consume data in FASTA format are...less forgiving. – Daniel Standage Sep 21 '17 at 05:19
  • If, on the other hand, your software is dumb and tries to read amino acid sequences as nucleotides, "fun" things will ensue. Might I suggest using the NEXUS format in this case? – Maciej Sep 21 '17 at 08:09
  • 1
    The only obvious use case I can think of is simply using the FASTA file to store all of your sequences and then extracting whichever you want to process from it. Any tool that can extract specific sequences from a multi-FASTA file shouldn't care whether those sequences are protein or nucleotide. – terdon Sep 21 '17 at 08:57
  • @terdon technically yes, but it's very bad practice. You literally wouldn't be able to hand this file over to someone else without additional metadata. There is no standard to specify whether the sequence is DNA or protein, and therefore no software package should be expected to be able to tell the two apart. – Unknown artist Sep 25 '17 at 04:15
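
The extraction use case discussed in the comments can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `extract_records` is a hypothetical helper, and the record ID is assumed to be the first whitespace-delimited token after '>'. Note that it never inspects the sequence letters, which is why it works on a mixed file, and also why the output carries no indication of whether a record is DNA or protein.

```python
from typing import Iterable, List, Set


def extract_records(lines: Iterable[str], wanted_ids: Set[str]) -> List[str]:
    """Return the FASTA lines (header plus sequence) of the wanted records."""
    kept = []
    keep = False
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Assumption: the ID is the first token of the header line.
            fields = line[1:].split()
            keep = bool(fields) and fields[0] in wanted_ids
        if keep and line:
            kept.append(line)
    return kept


mixed = [
    ">prot1 some protein",
    "MKVL",
    ">dna1 a gene",
    "ACGT",
    "GGCA",
    ">prot2",
    "MSTN",
]

print("\n".join(extract_records(mixed, {"dna1"})))
```

Anyone receiving the extracted records would still need to be told, outside the file, which IDs are nucleotide and which are protein.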