SEP 025 -- Merge ComponentDefinition and ModuleDefinition #58
Needs more examples -- @cjmyers ? |
Update as of COMBINE 2018: There has been discussion as to whether to make this a 2.x change or a 3 change. No consensus has yet been reached. |
How could this possibly be a 2.x change, if one of the two is going away? |
Chris suggested adding interactions to ComponentDefinition as an interim 2.x step. I do not think this is a good idea, however.
James
|
I agree that it is not a good idea for the specification, but it may be good to have this in the library sooner rather than later to enable flattening for visualization tools.
Chris
|
This change, along with SEP 015, is partially implemented in my fork at http://github.com/udp/SBOL-specification |
Playing devil's advocate, how would an author of a visualization tool approach the problem of rendering the module in Fig. 1, assuming that it is represented as |
@bbartley I think this can be addressed by a key question not yet well-enough addressed in this SEP, which is the handling of sequences. In SEP 035 I proposed that an object can only have a The criteria for determining that have not been fully worked out yet, but linearizability (however computed) does still provide a clear distinction:
Thus, it's not directly about types, roles, or even interactions (which can exist within a purely linearizable object, e.g., a CDS indirectly stimulating a promoter). It's all about |
Sorry for missing the call on Thursday. As said, I am in favor of the merge, but I think these and other concerns would be best addressed by actual Component(Definition) subclasses. It should not be the job of a programmer to implement complex inference rules and query multiple fields of a component record just to figure out whether a given componentDef is a "structural" (aka molecule) description or something altogether different. A sub-class each for DNA / Protein / RNA / small molecule would solve this without any fuss. |
@udp, as both the author of this SEP and an author of SBOL visualization tools, what are your thoughts regarding the issue I raised above (i.e., how will the data model encode the module in Fig. 1 such that a tool knows how to render it properly), and do you have a solution you would like to propose? |
Having multiple component types does not solve the issue that Bryan is raising. The issue with Figure 1 is that the components on the DNA strand are not contained within a single Component, so there would be no CD that says that this is a single strand of DNA. This is actually necessary to make it easy to connect the interactions between the CDS and the Protein in this genetic design. Even with the new ComponentReference idea, this is still necessary to support designs like this, since one can always flatten down into this "flat" ComponentDefinition. Let's consider this example in more detail to see how this would be rendered. ComponentDefinition MyDesign The SequenceConstraints and Interactions should be sufficient to make it clear how to render this design. Namely, if components are in a precedes SequenceConstraint together, then they must be on the same sequence. Here is an alternative without SequenceConstraints: ComponentDefinition MyDesign In this example, I'm assuming that we have the new Location field on Component that Raik proposed and that has been approved for SBOL3. We are also using the new Sequence reference on Location that was approved for SBOL 2.3. In this case, we know how to render the DNA parts from the fact that they all have Locations with references to the same sequence where they are found, and we order them based on position. To see why more object types alone do not help, consider a CD that includes two separate DNA designs. You need one of the two schemes above to determine if a Component is on DNA strand 1 or DNA strand 2. |
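For illustration only (this is not from the thread, and it does not use any SBOL library): a minimal Python sketch of the two schemes described above for deciding which strand a component belongs to. The function names, the tuple layouts, and the part names are all hypothetical. The first function treats components joined by a precedes constraint as lying on the same sequence; the second groups components by the sequence their Location refers to and orders them by start position.

```python
from collections import defaultdict


def strands_from_precedes(components, precedes):
    """Scheme 1: components linked by a `precedes` constraint share a strand.

    `components` is a list of names; `precedes` is a list of
    (before, after) pairs. Both layouts are assumptions for this sketch.
    """
    parent = {c: c for c in components}

    def find(c):
        # Union-find lookup with path halving.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for before, after in precedes:
        parent[find(before)] = find(after)

    groups = defaultdict(list)
    for c in components:
        groups[find(c)].append(c)
    # Components that appear in no constraint (e.g. a protein) end up alone.
    return sorted(sorted(g) for g in groups.values())


def strands_from_locations(locations):
    """Scheme 2: group by the sequence each Location refers to, order by start.

    `locations` is a list of (component, sequence_id, start) tuples.
    """
    by_sequence = defaultdict(list)
    for component, sequence_id, start in locations:
        by_sequence[sequence_id].append((start, component))
    return {seq: [c for _, c in sorted(items)] for seq, items in by_sequence.items()}


if __name__ == "__main__":
    parts = ["pA", "cdsA", "pB", "cdsB", "protA"]
    print(strands_from_precedes(parts, [("pA", "cdsA"), ("pB", "cdsB")]))
    print(strands_from_locations([("cdsA", "seq1", 50), ("pA", "seq1", 1),
                                  ("pB", "seq2", 1), ("cdsB", "seq2", 60)]))
```

Either grouping gives a renderer the same answer for a design with two separate DNA strands, which is the case the comment says component types alone cannot distinguish.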
I have come to the conclusion that there is a significant difference between a component that has a linear primary structure ("sequence") and a component that does not. This will usually be determined by type (sequence = DNA, RNA, protein; non-sequence = everything else, including multiple DNA, RNA, protein). A sequential component can have all the same properties as a non-sequential component, but also has a defined ordering of its elements. I would thus propose that the Component base class have subComponents, interactions, constraints, and models, and that its subclass, SequentialComponent, be where sequences and sequenceFeatures are found (and also between which sequentialConstraints are valid). In effect, this is an inverse of @bbartley's proposal, recognizing that composition is generic but linear composition is a more special case. Embracing this would also have implications for flattening and rendering. Flattening then ends up with a 1-layer structure for a SequentialComponent and a 2-layer structure for a general Component (the lower layer being all of its SequentialComponent elements). Rendering would then proceed directly from the flattened structure. @udp: would you be amenable to this as a friendly amendment on this SEP? |
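As a rough sketch of the class layout proposed in the comment above (plain Python dataclasses, not an SBOL library; the field types, the helper function, and the example names are assumptions made for illustration): the base class carries the generic composition fields, and the subclass adds only the sequence-related ones.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Component:
    """Generic composition: everything a non-sequential component may hold."""
    name: str
    sub_components: List["Component"] = field(default_factory=list)
    interactions: List[str] = field(default_factory=list)
    constraints: List[str] = field(default_factory=list)
    models: List[str] = field(default_factory=list)


@dataclass
class SequentialComponent(Component):
    """Adds the fields that only make sense for a linear primary structure."""
    sequences: List[str] = field(default_factory=list)
    sequence_features: List[str] = field(default_factory=list)


def sequential_elements(component: Component) -> List[SequentialComponent]:
    """Collect the outermost SequentialComponents beneath a general Component.

    This is the hypothetical "lower layer" of the 2-layer flattened form.
    """
    if isinstance(component, SequentialComponent):
        return [component]
    found: List[SequentialComponent] = []
    for sub in component.sub_components:
        found.extend(sequential_elements(sub))
    return found


if __name__ == "__main__":
    tu1 = SequentialComponent("tu1", sequences=["seq1"])
    tu2 = SequentialComponent("tu2", sequences=["seq2"])
    circuit = Component("circuit", sub_components=[tu1, Component("IPTG"), tu2])
    print([c.name for c in sequential_elements(circuit)])
```

Because `SequentialComponent` inherits from `Component`, it can still hold interactions and models in this sketch; only the sequence fields are restricted to the subclass.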
This proposal essentially is a reversion to the status quo. This would mean that sequential components are the present CD and non-sequential ones are the current MD. There is no advantage that I can see to this SEP at that point. Remember, the goal of the CD/MD merge was to allow sequential elements like sequenceConstraints and behavioral elements like Interactions to co-exist at the same level of hierarchy. This is needed to simplify visualization software. Namely, we need a way to flatten to 1-layer in all cases. For these reasons, I'm not in favor of this amendment. |
@cjmyers It is not a reversion, as it puts ModuleDefinition on top and lets a SequentialComponent inherit all of its elements. This means that you do get to have sequenceConstraints and interactions at the same level of the hierarchy. If you don't do something like this, though, how do you want to figure out which components are associated with which sequences? |
Through the Sequence reference that we added to Location objects. Please see my example above. Being able to flatten out ALL hierarchy is the actual reason that we wanted to merge CD and MD in the first place. We want to be able to come up with a flattened component with no MapsTos (or ComponentReferences). The ability to flatten is critical for visualization tools. |
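To make "flatten out ALL hierarchy" concrete, here is a hedged Python sketch over a made-up nested-dictionary representation (the keys `name`, `subs`, and `interactions`, and the path-style names, are assumptions for this example, not SBOL data-model terms). It inlines every nested sub-component into a single layer, so that an interaction names its participants directly rather than through a reference that has to be resolved.

```python
def flatten(component, prefix=""):
    """Inline all nested sub-components into one layer.

    Returns (leaves, interactions): leaf names are path-like strings, and
    each interaction is rewritten to point straight at those leaves.
    """
    leaves = []
    interactions = []
    for sub in component.get("subs", []):
        path = prefix + sub["name"]
        if sub.get("subs"):
            # Recurse into a composite and pull its contents up a level.
            sub_leaves, sub_interactions = flatten(sub, prefix=path + "/")
            leaves.extend(sub_leaves)
            interactions.extend(sub_interactions)
        else:
            leaves.append(path)
    for kind, source, target in component.get("interactions", []):
        interactions.append((kind, prefix + source, prefix + target))
    return leaves, interactions


if __name__ == "__main__":
    design = {
        "name": "circuit",
        "subs": [
            {"name": "tu1", "subs": [{"name": "promoter"}, {"name": "cds"}]},
            {"name": "tu2", "subs": [{"name": "promoter"}, {"name": "cds"}]},
        ],
        "interactions": [("repression", "tu1/cds", "tu2/promoter")],
    }
    print(flatten(design))
```

In the flattened result every participant of the interaction is a direct member of the single layer, which is the property the comment asks for.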
I can only iterate again and again that lumping everything into a single
Component class is a bad idea. It might help some visualization task that
Chris cares about right now but it definitely causes headaches for a lot of
other scenarios, including the very simple identification of what parts of
an SBOL record a software client can deal with or not.
Jake's proposal would address a lot of those issues. Basically, the parent
class Component is the generic case and the sub-class SequenceComponent
(could also be called Molecule), would add sequence-specific features. A
Sequence(tial)Component would always represent a single consecutive
molecule (gene, plasmid, genome, etc). I would go one step further and also
create specific sub-classes for Protein, DNA and RNA but that seems not to
get a majority. The SequentialComponent aka Molecule would address most of
the issues though. It's a good compromise, I think.
/Raik
--
___________________________________
Raik Grünberg
http://www.raiks.de/contact.html
___________________________________
|
To clarify, this is not "some visualization task that Chris cares about"; all tools that do visualization have cared about this. For example, James's tool essentially is already merging into a single class as the only way to make his visualization tool work. Other tools which are not supporting this are stuck and unable to render these things due to the complexity of having these two types of hierarchical objects. This split hierarchy has been one of the most difficult things to explain to people new to SBOL and developing tools, and it has been hampering development of SBOL-compliant tools. I hear your concerns, but we have now had several years of experience with having a split hierarchy (i.e., ComponentDefinitions for structural objects and ModuleDefinitions for behavioral objects), and this has been a tremendous bottleneck to learning SBOL for developers. The complete merger of CD and MD is not something to solve my problems. Indeed, I know enough SBOL to work around this issue. This is for other developers who are less able to get their head around SBOL.
While Bryan and Jake's suggestion to merge CD/MD via class hierarchy sounds like a good compromise on the surface, it does not solve the issues that I've seen with real developers using SBOL, since we would still have two types of hierarchical objects and we would be continuously casting between them.
Let me give you an example of why this is so confusing. I'm working with a team developing a tool called SBOLCanvas (try it out here: https://sbolcanvas.org/canvas/ beware the initial load is a little slow). When they are adding DNA parts onto the canvas, they are building a structural object (i.e., CD today), then they decide to add a 2nd DNA strand to their diagram and then connect them through production and inhibition arrows. Once they do this, it is no longer a structural object, but instead it becomes a behavioral object (i.e., MD today). They would then need to dynamically change the type of the object they are building from CD to MD. This gets even more confusing when you create a hierarchical structural object. For example, you add a terminator onto a strand, then you dive inside to add two terminators (basically, you have a terminator that is actually a double terminator). This inner canvas is now a CD, while the outer one is an MD. In other words, the type of object that the canvas is becomes dependent on what you put into the canvas. It took a lot of time to explain this to the developers of this software. I think one of the four eventually understood the issue, but I'm not sure about the others. Keep in mind, I'm only telling you part of the story. Things get even more complex when we add in the challenges with making the interaction connections between the two strands at the top level, which involves a lot of MapsTos now and ComponentReferences in SBOL3. None of the developers understand this now, and the tool is not currently doing this correctly.
If we do a pure CD/MD merge, then all this complexity goes away. Our specification gets a lot shorter (ask James, he has actually taken a stab at cutting it down), and our barrier to entry becomes lower for new developers. Indeed, with the merger, neither MapsTos nor ComponentReferences would be needed for the types of diagrams I just explained.
So, my objection to sub-classing is not something that I take lightly for a specific use case. Instead, it is born of experience with trying to train people to use SBOL. While the arguments for sub-classing might make some things theoretically cleaner, I don't see the added complexity for developers as being worth it. |
I think that I need to clarify again. My proposal does not suffer from these same issues, because it segregates only the sequential aspects to a subclass. Thus, you do not need to maintain two sets of hierarchical objects --- only one set, for which a subset have additional properties and constraints. |
Hi Chris,
if we follow Jake's suggestion, there is no need for type-casting in any of
your examples. If the design starts (kind of) bottom up with the structural
(molecular) components, these will be created as SequentialComponents from
the start. Being a child-class of Component, you can later add
sub-components (presumably these should be other SequentialComponents) but
you can also directly add Interactions or even Models or anything else you
would add to Component itself. They are full-fledged Components with the
additional capability to encode a molecule's sequence (directly or through
subComponents).
/Raik
|
And if I understand Jake correctly, the scope of SequentialComponent ends
at the level of a single (DNA/RNA/Protein) molecule specified by a
sequence. I think that makes for a very clear boundary. That's why I
suggest naming it Molecule, just to make that crystal clear. A single
genome is a molecule. A plasmid is a molecule. A given protein is a
molecule. Two (co-transfected) plasmids will not be a single molecule any
longer so they have to be a Component with two subComponents pointing to
each of the two Molecules. A protein complex will also be a component with
subComponents for each involved molecule. But, being Components themselves,
Molecules can be directly pulled into Interactions, Models, or functional
devices of any sort. This looks very good to me. It should be reasonably
easy to explain to non-chemists / biologists, too. I think it would
actually facilitate the job of visualization tools.
|
There's something about this workflow that doesn't make sense to me, and I don't understand why typecasting is necessary. But perhaps we can return to that discussion a little later. At this point, I'm not sure my previous question, about how a visualization tool knows when to draw a module-type To me, there is a very important semantic distinction between |
@jakebeal I'm still not understanding this argument. Let me try again with my example of a design in SBOLCanvas. Let's assume that I've added to the canvas two promoters, two RBSs, two CDSs, and two terminators. I then want to add SequenceConstraints and SequenceAnnotations that indicate that these are actually two separate transcriptional units, each with a separate linear sequence. Finally, I want to add that there is an Interaction from the CDS of one to the promoter of another to indicate that the product produced from one represses the other. In other words, this is what I want to express in SBOL3: Component GeneticCircuit Now, here is my problem. What type of object is GeneticCircuit? It has Sequences, it has SequenceConstraints, it has SequenceAnnotations. Therefore, it must be a SequentialComponent (or Molecule as @graik calls it), but this does not sound right. However, if you say that only SequentialComponents can have Sequences, then I'm stuck making this a SequentialComponent. What am I missing here? |
@cjmyers: I would expect that SBOLCanvas is generally operating on a "master" Component that contains some number of SubComponents, some of them SequentialComponents and some of them not. The "master" is not a SequentialComponent, because it can contain non-connected SequentialComponents and things like small molecules and media that are not SequentialComponents. When you drop a DNA/RNA part on the canvas, it becomes a separate "solo" [Sequential]Component. When you link a "solo" SequentialComponent up with another "solo" SequentialComponent (e.g., by a SequenceConstraint), then they get combined into a "group" SequentialComponent and their SequenceAnnotation fields are computed (or, if we finish accepting SEP 013, there are no SequenceAnnotations, just the location properties on the SubComponents). Hooking a "solo" and "group" together changes the "solo" from a SubComponent of the "master" Component to instead be a SubComponent of the "group" SequentialComponent (and the inverse when disconnected). Likewise, groups can be merged or split. In your example, then, we end up with one "master" Component that contains two SubComponents, each a SequentialComponent. Those two in turn each contain four SubComponent SequentialComponents. The interaction lives in the master Component, since it bridges between two of its SubComponents, accessing the relevant Sub-SubComponents via a ComponentReference per SEP 037. |
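A toy Python sketch of the "solo" and "group" bookkeeping described in the comment above, under heavy assumptions: the master Component is modeled as a plain list of groups, each group is an ordered list of part names, and linking simply appends one group to the other. The function `link` and the part names are invented for this example and are not part of SBOLCanvas or any SBOL library.

```python
def link(groups, before, after):
    """Merge the group containing `before` with the group containing `after`.

    `groups` is the master's list of SubComponents; a "solo" is a group of
    one. A new list is returned and the input is left unchanged. Assumes
    `before` ends its group and `after` starts its group.
    """
    group_a = next(g for g in groups if before in g)
    group_b = next(g for g in groups if after in g)
    if group_a is group_b:
        return list(groups)  # already on the same sequence
    merged = group_a + group_b
    rest = [g for g in groups if g is not group_a and g is not group_b]
    return rest + [merged]


if __name__ == "__main__":
    # Four parts dropped on the canvas, each starting as a "solo".
    master = [["promoter1"], ["cds1"], ["promoter2"], ["IPTG"]]
    master = link(master, "promoter1", "cds1")
    print(master)
```

Splitting a group when a constraint is removed would be the inverse operation; it is left out here to keep the sketch short.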
@jakebeal This is exactly what I'm trying to avoid. ComponentReferences should not be required when you flatten a design. This was the motivation that James and I had for merging MD/CD in the first place. The goal was to make it possible to create flattened designs which inlined all objects. This was why James proposed this SEP, and I supported it. If we make the change you propose, this proposal no longer meets the goal to reduce to one level of hierarchy and the net effect is that we rename ModuleDefinition to Component and ComponentDefinition to SequentialComponent. I think you are missing my point because ComponentReferences are not as cumbersome as MapsTos but they are still references, which means it should be possible to inline these. If you read the SEP again, you will see what the goals were:
If you want to write a new SEP to propose this change, you can, but I cannot see how this is a simple amendment to this SEP, since it does not meet the goals set out for this change. |
Full details in: https://github.com/SynBioDex/SEPs/blob/master/sep_025.md