Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SEP 025 -- Merge ComponentDefinition and ModuleDefinition #58

Closed
jamesamcl opened this issue Jun 28, 2018 · 24 comments
Closed

SEP 025 -- Merge ComponentDefinition and ModuleDefinition #58

jamesamcl opened this issue Jun 28, 2018 · 24 comments

Comments

@jamesamcl
Copy link
Member

jamesamcl commented Jun 28, 2018

Full details in: https://github.com/SynBioDex/SEPs/blob/master/sep_025.md

@jamesamcl
Copy link
Member Author

Needs more examples -- @cjmyers ?

@cjmyers cjmyers changed the title SEP 026 -- Merge ComponentDefinition and ModuleDefinition SEP 024 -- Merge ComponentDefinition and ModuleDefinition Jun 28, 2018
@cjmyers cjmyers changed the title SEP 024 -- Merge ComponentDefinition and ModuleDefinition SEP 025 -- Merge ComponentDefinition and ModuleDefinition Jun 28, 2018
@NeilWipat
Copy link
Collaborator

Update as of COMBINE 2018

There has been discussion as to whether to make this a 2.x change or a 3 change. No consensus has yet been reached.

@jakebeal
Copy link
Contributor

How could this possibly be a 2.x change, if one of the two is going away?

@jamesamcl
Copy link
Member Author

jamesamcl commented Oct 11, 2018 via email

@cjmyers
Copy link
Contributor

cjmyers commented Oct 12, 2018 via email

@jamesamcl
Copy link
Member Author

This change, along with SEP 015, is partially implemented in my fork at http://github.com/udp/SBOL-specification

@bbartley
Copy link
Contributor

bbartley commented Nov 22, 2019

Playing devil's advocate, how would an author of a visualization tool approach the problem of rendering the module in Fig. 1, assuming that it is represented as ComponentDefinition? Currently, a visual tool knows which glyphs to render by interpreting ontology terms in the types and roles properties. However, in this case there is no explicit term telling the visual tool how to render this very special type of ComponentDefinition (i.e., it is not a simple glyph, but a box containing and bounding subcomponents). How does an author of a visualization tool distinguish between a module-type ComponentDefinition and an Unspecified glyph?

@jakebeal
Copy link
Contributor

@bbartley I think this can be addressed by a key question not yet well-enough addressed in this SEP, which is the handling of sequences. In SEP 035 I proposed that an object can only have a sequence or sequenceAnnotation if all of its elements can be linearized into a single sequence.

The criteria for determining that have not been fully worked out yet, but linearizability (however computed) does still provide a clear distinction:

  • Any linearizable ComponentDefinition should be rendered using backbone-rendering rules.
  • Any non-linearizable ComponentDefinition should be rendered using module rules, in which each linearizable subset can be rendered using backbone-rendering.

Thus, it's not directly about types, roles, or even interactions (which can exist within a purely linearizable object, e.g., a CDS indirectly stimulating a promoter). It's all about

@graik
Copy link
Contributor

graik commented Nov 23, 2019

Sorry for missing the call on Thursday. As said, I am in favor of the merge but I think these and other concerns would be best addressed by actual Component(Definition) subclasses. It should not be the job of a programmer to implement complex inference rules and query multiple fields of a component record just to figure out whether a given componentDef is a "structural" (aka molecule) description or something alltogether different. A sub-class each for DNA / Protein / RNA / small molecule would solve this without any fuzz.

@bbartley
Copy link
Contributor

@udp, as both the author of this SEP and an author of SBOL visualization tools, what are your thoughts regarding the issue I raised above (i.e., how will the data model encode the module in Fig. 1 such that a tool knows how to render it properly), and do you have a solution you would like to propose?

@cjmyers
Copy link
Contributor

cjmyers commented Nov 24, 2019

Having multiple component types does not solve the issue that Bryan is raising. The issue with Figure 1, is the components on the DNA strand are not contained within a single Component, so there would be no CD that says that this is a single strand of DNA. This is actually necessary to make it easy to connect the interactions between the CDS and the Protein in this genetic design. Even with the new ComponentReference idea, this is still necessary to support designs like this, since one can always flatten down into this "flat" ComponentDefinition.

Let's consider this example in more detail to see how this would be rendered.

ComponentDefinition MyDesign
Component Prom
Component RBS
Component CDS
Component Term
Component Prot
SequenceConstraint Prom precedes RBS
SequenceConstraint RBS precedes CDS
SequenceConstraint CDS precedes Term
Interaction CDS codes for Prot

The SequenceConstraints and Interactions should be sufficient to make it clear how to render this design. Namely, if components are in a precedes SequenceConstraint together, then they must be on the same sequence.

Here is an alternative without SequenceConstraints:

ComponentDefinition MyDesign
Component Prom Location 1..40 SeqRef dnaSeq
Component RBS Location 41..46 SeqRef dnaSeq
Component CDS Location 47..150 SeqRef dnaSeq
Component Term Location 150..160 SeqRef dnaSeq
Component Prot Location Generic SeqRef protSeq
Sequence dnaSeq
Sequence protSeq
Interaction CDS codes for Prot

In this example, I'm assuming that we have the new Location field on Component that Raik proposed and has been approved for SBOL3. We are also using the new Sequence reference on Location that was approved for the SBOL 2.3. In this case, we know how to render the DNA part from the fact that they all have Locations with references to the same sequence where they are found, and we order based on position.

To see why more object types alone do not help, consider a CD that includes two separate DNA designs. You need one of the two schemes above to determine if a Component is on DNA strand 1 or DNA strand 2.

@jakebeal
Copy link
Contributor

jakebeal commented Nov 25, 2019

I have come to the conclusion that there is a significant difference between a component that has a linear primary structure ("sequence") and a component that does not. This will usually be determined by type (sequence = DNA, RNA, protein; non-sequence = everything else, including multiple DNA, RNA, protein). A sequential component can have all the same properties as a non-sequential component, but also has a defined ordering of its elements.

I would thus propose that the Component base class have subComponents, interactions, constraints, and models, and that its subclass, SequentialComponent, be where sequences and sequenceFeatures are found (and also between which sequentialConstraints are valid). In effect, this is an inverse of @bbartley's proposal, recognizing that composition is generic but linear composition is a more special case.

Embracing this would also have implications for flattening and rendering. Flattening then ends up with a 1-layer structure for a SequentialComponent and a 2-layer structure for a general Component (the lower layer being all of its SequentialComponent elements). Rendering would then proceed directly from the flattened structure.

@udp: would you be amenable to this as a friendly amendment on this SEP?

@cjmyers
Copy link
Contributor

cjmyers commented Nov 25, 2019

This proposal essentially is a reversion to the status quo. This would mean that sequential components are the present CD and non-sequential ones are the current MD. There is no advantage that I can see to this SEP at that point. Remember, the goal of the CD/MD merge was to allow sequential elements like sequenceConstraints and behavioral elements like Interactions to co-exist at the same level of hierarchy. This is needed to simplify visualization software. Namely, we need a way to flatten to 1-layer in all cases. For these reasons, I'm not in favor of this amendment.

@jakebeal
Copy link
Contributor

@cjmyers It is not a reversion, as it puts ModuleDefinition on top and lets a SequentialComponent inherit all of its elements. This means that you do get to have sequenceConstraints and interactions at the same level of the hierarchy.

If you don't do something like this, though, how do you want to figure out which components are associated with which sequences?

@cjmyers
Copy link
Contributor

cjmyers commented Nov 25, 2019

Through the Sequence reference that we added to Location objects. Please see my example above. Being able to flatten out ALL hierarchy is the actual reason that we wanted to merge CD and MD in the first place. We want to be able to come up with a flattened component with no MapsTos (or ComponentReferences). The ability to flatten is critical for visualization tools.

@graik
Copy link
Contributor

graik commented Nov 27, 2019 via email

@cjmyers
Copy link
Contributor

cjmyers commented Nov 27, 2019

To clarify, this is not "some visualization task that Chris cares about", this is all tools that do visualization have cared about this. For example, James tool essentially is already merging into a single class as the only way to make his visualization tool work. Other tools which are not supporting this are stuck and unable to render these things due to the complexity of having these two types of hierarchical objects. This split hierarchy has been one of the most difficult things to explain to people new to SBOL and developing tools, and it has been hampering development of SBOL compliant tools. I hear your concerns, but we have now had several years of experience with having split hierarchy (i.e., ComponentDefinitions for structural objects and ModuleDefinitions for behavioral objects), and this has been a tremendous bottleneck to learning SBOL for developers. The complete merger of CD and MD is not something to solve my problems. Indeed, I know enough SBOL to work around this issue. This is for other developers who are less able to get their head around SBOL.

Bryan and Jake's suggestion to merge CD/MD via class hierarchy while on the surface sounds like a good compromise, it does not solve the issues that I've seen with real developers using SBOL, since we would still have two types of hierarchical objects and we would be continuously casting between them.

Let me give you an example of why this is so confusing. I'm working with a team developing a tool called SBOLCanvas (try it out here: https://sbolcanvas.org/canvas/ beware the initial load is a little slow). When they are adding DNA parts onto the canvas, they are building a structural object (i.e., CD today), then they decide to add a 2nd DNA strand to their diagram and then connect them through production and inhibition arrows. Once they do this, it is no longer a structural object, but instead it becomes a behavioral object (i.e., MD today). They would then need to dynamically change the type of the object they are building from CD to MD. This gets even more confusing when you create a hierarchical structural object. For example, you add a terminator onto a strand, then you dive inside to add two terminators (basically, you have a terminator that is actually a double terminator). This inner canvas is now a CD, while the outer one is an MD. In other words, the type of object that the canvas is becomes dependent on what you put into the canvas. It took a lot of time to explain this to the developers of this software. I think one of the four eventually understood the issue, but I'm not sure about the others. Keep in mind, I'm only telling you part of the story. Things get even more complex when we add in the challenges with making the interaction connections between the two strands at the top level, which involves a lot of MapsTos now and ComponentReferences in SBOL3. None of the developers understand this now, and the tool is not currently doing this correctly.

If we do a pure CD/MD merge, then all this complexity goes away. Our specification gets a lot shorter (ask James, he has actually taken a stab at cutting it down), and our barrier to entry becomes lower for new developers. Indeed, the merger would even allow neither MapsTos OR ComponentReferences to be needed for the types of diagrams I just explained.

So, my objections to sub-classing is not something that I take lightly for a specific use case. Instead, this is borne from experience with trying to train people to use SBOL. While the arguments for sub-classing might make some things theoretically cleaner, I don't see the added complexity for developers as being worth it.

@jakebeal
Copy link
Contributor

I think that I need to clarify again. My proposal does not suffer from these same issues, because it segregates only the sequential aspects to a subclass. Thus, you do not need to maintain two sets of hierarchical objects --- only one set, for which a subset have additional properties and constraints.

@graik
Copy link
Contributor

graik commented Nov 27, 2019 via email

@graik
Copy link
Contributor

graik commented Nov 27, 2019 via email

@bbartley
Copy link
Contributor

bbartley commented Nov 27, 2019

When they are adding DNA parts onto the canvas, they are building a structural object (i.e., CD today), then they decide to add a 2nd DNA strand to their diagram and then connect them through production and inhibition arrows. Once they do this, it is no longer a structural object, but instead it becomes a behavioral object (i.e., MD today). They would then need to dynamically change the type of the object they are building from CD to MD.

There's something about this workflow that doesn't make sense to me, and I don't understand why typecasting is necessary. But perhaps we can return to that discussion a little later.

At this point, I'm not sure my previous question, about how a visualization tool knows when to draw a module-type Component has been satisfactorily answered. Maybe it has been, and I just got lost in the discussion.

To me, there is a very important semantic distinction between Component-type objects versus Module-type objects. Components simply describe the structural and hierarchical composition of a system at a given point in time, whereas Modules describe processes or dynamics of a system (e.g. inhibition) with an implicit change in composition of the system (including inputs and outputs). Cramming all those semantics into a single Component class feels intuitively very wrong, and I trust my intuition, because there are often good reasons supporting that intuition that are not easily argued in a Github issue tracker.

@cjmyers
Copy link
Contributor

cjmyers commented Nov 27, 2019

I think that I need to clarify again. My proposal does not suffer from these same issues, because it segregates only the sequential aspects to a subclass. Thus, you do not need to maintain two sets of hierarchical objects --- only one set, for which a subset have additional properties and constraints.

@jakebeal I'm still not understanding this argument. Let me try again with my example of a design in SBOLCanvas. Let's assume that I've added to the canvas two promoters, two RBS, two CDSs, and two terminators. I then want to add SequenceConstraints and SequenceAnnotations that indicate that these are actually two separate transcriptional units, each with a separate linear sequence. Finally, I want to add that there is an Interaction from the CDS of one to the promoter of another to indicate that the product produced from one represses the other. In other words, this is what I want to express in SBOL3:

Component GeneticCircuit
SubComponent prom1
SubComponent prom2
SubComponent rbs1
SubComponent rbs2
SubComponent cds1
SubComponent cds2
SubComponent term1
SubComponent term2
SequenceConstraint prom1 precedes rbs1
SequenceConstraint rbs1 precedes cds1
SequenceConstraint cds1 precedes term1
SequenceConstraint prom2 precedes rbs2
SequenceConstraint rbs2 precedes cds2
SequenceConstraint cds2 precedes term2
Interaction cds1 inhibits prom2
SequenceAnnotation prom1 1..50 seqRef sequence1
...
SequenceAnnotation prom2 1..50 seqRef sequence2
...
Sequence sequence1
Sequence sequence2

Now, here is my problem. what type of object is GeneticCircuit? It has Sequences, it has SequenceConstraints, it has SequenceAnnotations. Therefore, it must be a SequentialComponent (or Molecule as @graik calls it), but this does not sound right. However, if you say that only SequentialComponents can have Sequences, then I'm stuck making this a SequentialComponent.

What am I missing here?

@jakebeal
Copy link
Contributor

@cjmyers: I would expect that SBOLCanvas is generally operating on a "master" Component that contains some number SubComponents, some of them SequentialComponents and some of them not. The "master" is not a SequentialComponent, because it can contain non-connected SequentialComponents and things like small molecules and media that are not SequentialComponents.

When you drop a DNA/RNA part on the canvas, it becomes a separate "solo" [Sequential]Component. When you link a "solo" SequentialComponent up with another "solo" SequentialComponent (e.g., by a SequenceConstraint), then they get combined into a "group" SequentialComponent and their SequenceAnnotation fields computed (or, if we finish accepting SEP 013, there are no SequencAannotations, just the location properties on the SubComponents). Hooking a "solo" and "group" together changes the "solo" from a SubComponent of the "master" Component to instead be a SubComponent of the "group" SequentialComponent (and the inverse when disconnected). Likewise, groups can be merged or split.

In your example, then, we end up with one "master" Component that contains two SubComponents, each a SequentialComponent. Those two in turn each contain four SubComponent SequentialComponents. The interaction lives in the master Component, since it bridges between two of its SubComponents, accessing the relevant Sub-SubComponents via a ComponentReference per SEP 037.

@cjmyers
Copy link
Contributor

cjmyers commented Nov 28, 2019

@jakebeal This is exactly what I'm trying to avoid. ComponentReferences should not be required when you flatten a design. This was the motivation that James and I had for merging MD/CD in the first place. The goal was to make it possible to create flattened designs which inlined all objects. This was why James proposed this SEP, and I supported it. If we make the change you propose, this proposal no longer meets the goal to reduce to one level of hierarchy and the net effect is that we rename ModuleDefinition to Component and ComponentDefinition to SequentialComponent. I think you are missing my point because ComponentReferences are not as cumbersome as MapsTos but they are still references, which means it should be possible to inline these.

If you read the SEP again, you will see what the goals were:

  1. MapsTos come later - this is no longer true, since we still need these, even if they are now called ComponentReferences and are simpler. They still are needed as soon as you add the first interaction.

  2. Flattening is now possible - as explained above, this would not be true with your proposed change.

  3. The spec becomes shorter - this would not happen, since as explained above, the only real change is the name of the entities and their inheritance hierarchy. We would still need to explain both Components and SequentialComponents and they are essentially the same as MDs and CDs in the specification. I don't think this will have much impact on spec size.

If you want to write a new SEP to propose this change, you can, but I cannot see how this is a simple amendment to this SEP, since it does not meet the goals set out for this change.

@cjmyers cjmyers assigned jamesamcl and unassigned jamesamcl Dec 12, 2019
@jakebeal jakebeal added Accepted and removed Draft labels Feb 6, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants