In SBOL, when should I annotate a DNA sequence vs. making sub-components?

Question

When representing a DNA sequence in SBOL as a ComponentDefinition, you can mark things like promoters and coding sequences in two different ways:

either as a SequenceAnnotation in the ComponentDefinition, or
as a Component linking to a ComponentDefinition for the sub-component

When should you choose one over the other?

@James McLaughlin. If there is an error in the question and answers about the class, I think it would be better to leave a comment, since it could be a misunderstanding (unless the class has changed and you are updating the question, which would be nice to know from the edit summary). — llrs, Jul 26 '19 at 13:55
@jakebeal, there is a pending edit for both your question and your answer (you might not have access to those links; if you don't, just click on the "edit" under each or the notification in your inbox) changing SequenceFeature to SequenceAnnotation. Please confirm which of those is the correct class name. — terdon, Jul 29 '19 at 09:09
@terdon Ah, James was right --- I typo'ed the names. Edit accepted. — jakebeal, Jul 29 '19 at 10:30

score 4 · Answer 1 · edited Jul 29 '19 at 10:29

Typically you want to use a ComponentDefinition when a sub-structure is something you might want to pull out and reuse in another genetic design, and a SequenceAnnotation when it is not.

For example:

Promoters, terminators, and protein coding sequences are typically best represented with their own ComponentDefinition, since these are often pulled out and re-combined in new genetic designs. Representing such an element with a ComponentDefinition makes it easy to find the commonalities between designs.
Assembly scars, binding sites, and cut sites are typically best represented with a SequenceAnnotation, because they are rarely of interest outside their surrounding genetic context. Representing such an element with a SequenceAnnotation keeps it tightly associated with that context.

Not every case is cut and dried, however. For example, ribosome entry sites are often treated as separable components, but their performance is very tightly tied to the specifics of the coding sequence that they modulate. Thus, in some cases it may make more sense to represent them separately as a ComponentDefinition and in others to represent them as a SequenceAnnotation on a joint RES/CDS ComponentDefinition.
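The distinction can be sketched with a toy data model. This is only an illustration in plain Python, not the API of pySBOL or any other SBOL library: the three classes borrow the SBOL 2 names, but their fields, the `build_design` and `shared_parts` helpers, and all part names and sequences are made up for the example. It shows why a separately defined part makes commonalities between designs easy to find, while an annotation stays tied to one parent sequence.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SequenceAnnotation:
    """A labelled region that exists only inside its parent definition."""
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    role: str


@dataclass
class ComponentDefinition:
    """A standalone design element that can be reused across designs."""
    name: str
    role: str
    sequence: str = ""
    components: List["Component"] = field(default_factory=list)
    annotations: List[SequenceAnnotation] = field(default_factory=list)


@dataclass
class Component:
    """An instance of another ComponentDefinition inside a parent."""
    name: str
    definition: ComponentDefinition


# Reusable parts get their own definitions (hypothetical names and sequences).
promoter = ComponentDefinition("pExample", "promoter", "ttgaca")
cds = ComponentDefinition("cdsExample", "CDS", "atgaaataa")


def build_design(name: str, scar: str) -> ComponentDefinition:
    """Join promoter + scar + CDS; the scar is only annotated, not a part."""
    design = ComponentDefinition(
        name, "engineered_region", promoter.sequence + scar + cds.sequence
    )
    # Sub-components point at the shared definitions.
    design.components = [
        Component(f"{name}_{part.name}", part) for part in (promoter, cds)
    ]
    # The scar is context-specific, so it lives on this design alone.
    scar_start = len(promoter.sequence) + 1
    design.annotations.append(
        SequenceAnnotation(
            "scar", scar_start, scar_start + len(scar) - 1, "assembly_scar"
        )
    )
    return design


def shared_parts(a: ComponentDefinition, b: ComponentDefinition) -> List[str]:
    """Names of definitions used as sub-components in both designs."""
    names_a = {c.definition.name for c in a.components}
    names_b = {c.definition.name for c in b.components}
    return sorted(names_a & names_b)


design_a = build_design("designA", "tactag")
design_b = build_design("designB", "ggatcc")

print(shared_parts(design_a, design_b))  # ['cdsExample', 'pExample']
```

Both designs reference the same promoter and CDS definitions, so comparing them is a simple set intersection. The two scar annotations are separate objects with coordinates that only make sense on their own parent sequence.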

Note that this answer was written for SBOL 2. For SBOL 3, change ComponentDefinition to Component and SequenceAnnotation to SequenceFeature. There are also shortcuts in SBOL 3 for representing things in other databases like UniProt and ChEBI — jakebeal, Mar 16 '21 at 21:55
