dataset used inconsistently across API #595
Comments
I would agree, with the caveat that we need to be explicit that …
+1. This should be uncontroversial.
+1 with a caveat: this is a bit more of a semantic change (details in my comment here). If we're doing this, we should consider putting a …
Thanks @calbach.
What is the use case for readgroups being in multiple …
A use case for sharing References in multiple ReferenceSets …
Certainly the original intention was that a reference could belong to multiple ReferenceSets; otherwise there will be a lot of duplication. Likewise, I always thought that References and ReferenceSets would stand outside Datasets, again to avoid duplication, because many datasets will use the same reference. Though I confess I can see the case for them to have an "owner" or responsible authority/source. And perhaps we should not rule out someone wanting to use a reference subject to access control, which I think is at the dataset level.
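The many-to-many layout described above can be sketched in Python. This is a minimal illustration, not the GA4GH schema (which is written in Avro IDL): `ReferenceStore`, `sets_containing`, and the `length` field are hypothetical names, and the IDs and lengths are toy values. Each Reference is stored once and a ReferenceSet only lists Reference IDs, so two assemblies can share a sequence without copying it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Reference:
    """A single reference sequence, stored exactly once (hypothetical fields)."""
    id: str
    length: int


@dataclass
class ReferenceSet:
    """Holds Reference IDs rather than copies, so one Reference can sit in many sets."""
    id: str
    reference_ids: list[str] = field(default_factory=list)


class ReferenceStore:
    """Hypothetical registry in which References live outside any Dataset."""

    def __init__(self) -> None:
        self.references: dict[str, Reference] = {}
        self.reference_sets: dict[str, ReferenceSet] = {}

    def add_reference(self, ref: Reference) -> None:
        self.references[ref.id] = ref

    def add_reference_set(self, ref_set: ReferenceSet) -> None:
        # Every member must already be registered; membership is by ID only.
        missing = [r for r in ref_set.reference_ids if r not in self.references]
        if missing:
            raise KeyError(f"unknown reference ids: {missing}")
        self.reference_sets[ref_set.id] = ref_set

    def sets_containing(self, reference_id: str) -> list[str]:
        # Many:many lookup: every ReferenceSet that lists this Reference.
        return sorted(
            s.id for s in self.reference_sets.values() if reference_id in s.reference_ids
        )


# Toy data: one shared sequence plus one assembly-specific sequence per set.
store = ReferenceStore()
for ref in (Reference("chrM", 100), Reference("chr1_asmA", 1000), Reference("chr1_asmB", 1000)):
    store.add_reference(ref)
store.add_reference_set(ReferenceSet("asmA", ["chr1_asmA", "chrM"]))
store.add_reference_set(ReferenceSet("asmB", ["chr1_asmB", "chrM"]))
print(store.sets_containing("chrM"))  # ['asmA', 'asmB']
```

The caveat raised in the comment is visible here too: nothing in this layout records which dataset (and therefore which access-control policy) owns a shared Reference.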
Thank you @richarddurbin. Cross-dataset analysis should be as easy as within the … We have a couple of terabytes of assembly hubs with more than … One design question I am asking is: should a ReferenceSet be … This relates to: does one construct a new, combined ReferenceSet for … I am not arguing for one approach over another, only for having …
I don't know of any; I would have to dig through old GitHub issues/email. I don't support having many:many here.
I wasn't arguing against many:many here, just adding that it should likely have a …
Not sure what the correct answer is, but given today's API the answer is clearly to create a new …
For history, #32 has the very long … (in reply to CH Albach, Mon, Apr 4, 2016 at 8:56 PM)
Thanks @dglazer; that piece of history is enlightening. Some of the discussion seems to conflate grouping for data production with grouping for data analysis. A fairly rigid model is good for data production, which is my understanding of what a readgroupset (all readgroups from a sequencing experiment on a sample) and a readgroup are. For data analysis, one wants to be able to group things in fairly arbitrary ways. This is better done with different types of linking (maybe as simple as lists of readgroupsets).
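The distinction between a rigid production hierarchy and flexible analysis grouping can be sketched as follows. This is a hypothetical Python model, not the GA4GH API: `AnalysisGroup`, `read_groups_in`, and `datasets_spanned` are illustrative names. Each ReadGroup has exactly one parent ReadGroupSet, while an analysis group is just a list of ReadGroupSet IDs, as suggested above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ReadGroupSet:
    id: str
    dataset_id: str  # production unit: exactly one dataset


@dataclass(frozen=True)
class ReadGroup:
    id: str
    read_group_set_id: str  # production unit: exactly one parent set


@dataclass
class AnalysisGroup:
    """Hypothetical analysis grouping: a plain list of ReadGroupSet IDs."""
    name: str
    read_group_set_ids: list[str] = field(default_factory=list)


def read_groups_in(set_id: str, read_groups: list[ReadGroup]) -> list[str]:
    """Production view: the fixed ReadGroups of one ReadGroupSet."""
    return sorted(rg.id for rg in read_groups if rg.read_group_set_id == set_id)


def datasets_spanned(group: AnalysisGroup, sets: dict[str, ReadGroupSet]) -> set[str]:
    """Analysis view: the datasets an arbitrary grouping reaches into."""
    return {sets[set_id].dataset_id for set_id in group.read_group_set_ids}


sets = {
    "rgs1": ReadGroupSet("rgs1", "ds1"),
    "rgs2": ReadGroupSet("rgs2", "ds2"),
}
read_groups = [ReadGroup("rg1", "rgs1"), ReadGroup("rg2", "rgs1"), ReadGroup("rg3", "rgs2")]

# The same ReadGroupSet can appear in several analysis groups without being copied.
cohort = AnalysisGroup("cohort", ["rgs1", "rgs2"])
pilot = AnalysisGroup("pilot", ["rgs1"])

print(read_groups_in("rgs1", read_groups))  # ['rg1', 'rg2']
print(sorted(datasets_spanned(cohort, sets)))  # ['ds1', 'ds2']
```

Because analysis groups only hold IDs, they can cross dataset boundaries without loosening the production hierarchy itself.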
`dataset` is not used consistently by the data model. Since we have this concept for organizing data, I propose that every data object belong to some dataset, and that this not be optional. Without a firmly defined hierarchy, data discovery and access become more complex.
- `ReadGroup` has `union { null, string } datasetId = null;`. Is there any use case where ReadGroups in different ReadGroupSets will be in different DataSets? Suggest removing `datasetId`.
- `ReadGroupSet` has `union { null, string } datasetId = null;`. Suggest making this a required field, as it is with `VariantSet`.
- `ReferenceSet` does not have `datasetId`. Suggest removing this one special case and requiring `ReferenceSet` to be part of a dataset.
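The three proposed changes can be sketched as a small Python model. This is an illustration under assumptions, not the Avro IDL schema: snake_case field names stand in for the schema's `datasetId`, and `_require_dataset` and `dataset_of` are hypothetical helpers. It assumes the answer to the question above is "no", so a ReadGroup gets its dataset only through its parent ReadGroupSet.

```python
from dataclasses import dataclass
from typing import Optional


def _require_dataset(kind: str, obj_id: str, dataset_id: Optional[str]) -> None:
    # Stands in for dropping the `union { null, string } datasetId = null` default.
    if not dataset_id:
        raise ValueError(f"{kind} {obj_id!r} must belong to a dataset")


@dataclass(frozen=True)
class ReadGroupSet:
    id: str
    dataset_id: str  # required: no null, no default

    def __post_init__(self) -> None:
        _require_dataset("ReadGroupSet", self.id, self.dataset_id)


@dataclass(frozen=True)
class ReadGroup:
    id: str
    read_group_set_id: str  # no dataset_id field: it is inherited from the parent set


@dataclass(frozen=True)
class ReferenceSet:
    id: str
    dataset_id: str  # newly added, so ReferenceSet is no longer a special case

    def __post_init__(self) -> None:
        _require_dataset("ReferenceSet", self.id, self.dataset_id)


def dataset_of(read_group: ReadGroup, sets: dict[str, ReadGroupSet]) -> str:
    """Resolve a ReadGroup's dataset through its single parent ReadGroupSet."""
    return sets[read_group.read_group_set_id].dataset_id


sets = {"rgs1": ReadGroupSet("rgs1", "ds1")}
print(dataset_of(ReadGroup("rg1", "rgs1"), sets))  # ds1

try:
    ReferenceSet("grch38", None)  # rejected: every object needs a dataset
except ValueError as err:
    print(err)  # ReferenceSet 'grch38' must belong to a dataset
```

Deriving a ReadGroup's dataset from its parent means the two can never disagree; the cost is one extra lookup when a client needs the dataset of an individual ReadGroup.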