
The DNA sequence sections of the three INSDC databases (i.e., DDBJ, ENA Sequence and GenBank) are synchronized periodically and strive to keep their stored data as widely accessible as possible. Except for idiosyncrasies in their data submission routes, there should be little, if any, reason for preferentially submitting sequence data to one database over another. Yet many researchers display very noticeable preferences!

When asked point blank, many colleagues emphasize that these preferences are not linked to the database-specific data submission or retrieval interfaces available. Yet they cannot provide a technical reason for their choice either.

Maybe researchers base their preference on geographic vicinity ("buy local"), but maybe there are genuine technical reasons for their preference (even if they are unaware of these reasons themselves).

Thus, my question is:

Can you think of any technical reason, however minute or seemingly insignificant, for preferentially submitting DNA sequences to NCBI's GenBank over ENA Sequence (or vice versa)? For example, there may be differences in data storage or accessibility that are relevant to you.

Put differently: Short of flipping a coin, why would you (as a bioinformatics-prone end user) select one database over the other for your data submission?

--

Edit 1: Many SE posters attempt to reassure me that I can submit my DNA sequences to either database and that "the data will be just fine there". I have submitted data to both GenBank and ENA Sequence for 10+ years myself, so this is not what this question is about. Instead, this question is about carving out genuine technical differences from a user perspective.

Edit 2: Since no answer has been posted yet despite a bounty, I figure this is a hard question as specified. Thus, I will broaden the question and now also accept non-technical reasons for preferentially submitting DNA sequence data to either GenBank or ENA Sequence. (That being said, I still hope someone can list a technical reason.)

  • @David I am deliberately posting this question at SE Biology because I wish to query the biologists' opinion on this. The bioinformatics opinion I can provide myself. Can you think of any difference in data availability or retrieval, however small or insignificant? –  Nov 17 '18 at 22:06
  • The problem is that questions whose answers are a matter of opinion are off topic here. But, a comment. I last submitted sequences over 35 years ago and at the time I used EMBL (now presumably ENA) as it was European. However, as nowadays I and most others only use NCBI to look for sequences, I really see no point submitting anywhere else. I have recently had to submit RNAseq data to ENA (under a grant obligation) and I find the user interface for that truly dreadful. The Yanks have more money, so all along NCBI has had better user interfaces than EBI. Take advantage of it. – David Nov 17 '18 at 23:08
  • @David Thanks for your response. Actually, your reference to a recent submission to ENA due to "a grant obligation" is quite relevant to me. Having certain restrictions on the data submission imposed by a funding agency can be a very important, albeit non-biological, reason for submitting specifically to ENA. This is the type of info I am looking for. Can you provide a bit more details on this (e.g., is this restriction standard practice for some funding agencies)? –  Nov 18 '18 at 20:41
  • Probably best to take this off list. If you check my bio and personal website you can easily work out my University affiliation and get my email from their website. – David Nov 18 '18 at 21:00
  • From your comments, you seem to be asking about differences in how the data repositories are managed. Perhaps you could edit the question to change the title and refocus the question. – llrs Nov 22 '18 at 10:27
  • @llrs I refocused the question. – Michael Gruenstaeudl Nov 27 '18 at 10:39
  • @David and Michael, you can also take this to chat here, if you find that easier. Either in our site's main room or by creating a separate room for it. Just go to this site and click "create new room". – terdon Nov 28 '18 at 17:18
  • Note that NCBI has both GenBank and SRA (Sequence Read Archive). SRA is for next-gen sequencing, and is a different database and submission route than GenBank. EMBL similarly has ERA. They're also cross-referenced. Word on the street is that ERA is more user-friendly for data upload, and has less strict metadata requirements. – rrr Feb 12 '19 at 00:33

Opinion: I prefer to use EBI/ENA because the turnaround time for "customer service" on editing metadata about a submission is faster than on NCBI. In my experience, inquiries to change metadata on NCBI can go unanswered for up to six months.

Answer to your question: No, there are no technical reasons that make it preferable to submit data through one website or the other. All entries end up being hosted by both databases anyway. It is up to the user to choose whichever upload interface they prefer.

conchoecia