Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generalize ancillaryResults and qualityMeasures to all Study Results #250

Open
mbrush opened this issue Jan 16, 2025 · 4 comments
Open

Generalize ancillaryResults and qualityMeasures to all Study Results #250

mbrush opened this issue Jan 16, 2025 · 4 comments
Assignees
Milestone

Comments

@mbrush
Copy link
Contributor

mbrush commented Jan 16, 2025

Per review with @afrubin on MaveDB.

Originally posted by @ahwagner in #234 (comment)

See also Issue #144 relevant to this topic.

@mbrush
Copy link
Contributor Author

mbrush commented Jan 16, 2025

Responses from original post:

from afrubin:

This will be a very useful way to communicate relevant values like experimental variability (variance, std error) without being overly prescriptive. Having similar naming conventions/structures across GKS Study Results will also make it easier for implementers and data consumers.

From mbrush:

Agree with the utility of a way to capture this type of information in a consistent way, and agree that if we stick with this pattern as a way to capture additional data items in Study Results, then it makes sense to include them in the generic core model StudyResult class. However, reserving final judgement as I have some questions about the proposed modeling pattern, and how it relates to use and modeling of Extensions. Finally, If we do keep this pattern, I would make the simple change of renaming ancillaryReuslts to ancillaryData - as this object contains additional data items, not additional Study Results.

@mbrush
Copy link
Contributor Author

mbrush commented Jan 21, 2025

See also Issue #144 relevant to this topic.

@ahwagner ahwagner moved this to Backlog in VA-Spec Jan 22, 2025
@mbrush
Copy link
Contributor Author

mbrush commented Jan 22, 2025

To discuss on this issue:

  1. Clarify the need for these attributes and the object they take - given that we could just define new attributes for the various ancillary results / quality measures directly at the same level as the other attributes that hold data in a CAFStudyResult, e.g.:
- id: gnomad4:1-10120-T-G.sas
  type: CohortAlleleFrequencyStudyResult
  label: South Asian Ancestry Group Allele Frequency for 1-10120-T-G
  sourceDataSet:
    - id: gnomad4.1.0
      type: DataSet
      label: gnomAD v4.1.0
      version: 4.1.0
  focusAllele: ga4gh:VA.XTZZHRCS2lIZuST7_LnLSHwro5uYYtVF
  focusAlleleCount: 0
  locusAlleleCount: 716
  alleleFrequency: 0
  
  # implementation-specific attributes start here
  grpMaxFAF95: {}
  homozygotes: 0
  hemizygotes: 0

But I think the motivation is that we want a way to compartmentalize these attributes to indicate their role in the Result, and highlight them as implementation-specific. So, assuming we keep this approach, . . .

  1. The main question in this ticket concerns whether to define these ancillaryResults and qualityMeasures attributes in the va-core model - per Alan's suggestion above. This would simply require moving their definition into the va-core-source.yaml file. No other changes needed (e.g. no need to 'extend' these core attributes in the CAF Study Result profile)

  2. Regardless of where it lives, can we rename ancillaryResults to ancillaryData - as it contains additional data items, not additional Study Results?

  3. Another small attribute name change in this vein - rename CAF.subCohortFrequency --> `subCohortFrequencyResults' (which makes it clearer that this property does hold Study Result objects)

  4. Finally, note that the test fixtures caf example does not demonstrate ancillaryResults or qualityMeasures attributes . . . is this ok since this they are still 'draft' maturity?

For reference: the gk-pilot implementation CAF profile is here, and their data example is here


A related issue for the CAF profile that I'll note here is is that I would like to include in the va-core model the sepio studyGroup attribute (on which the CAF cohort attribute is based), and similarly include the sepio componentResult attribute (on which the CAF subCohortFrequency attribute is based).

These were previously removed va-core, based on the argument that only one implementation uses them (CAF) - but I think one implementer is sufficient to warrant inclusion in the core model. And think it is helpful for incoming adopters to be able to see these attributes them there. They can remain 'draft' in va core until a second adopter implements, but I see no reason to not include them in the core model.

I can move this question into a separate ticket if it is not quickly resolvable as part of the conversation above.

@mbrush mbrush added this to the VA 1.0 milestone Jan 23, 2025
@mbrush mbrush moved this from Backlog to In Progress in VA-Spec Feb 10, 2025
@mbrush
Copy link
Contributor Author

mbrush commented Feb 12, 2025

Outcomes of Discussion on 2-11-25:

  • we will move ancillaryResults and qualityMeasures attributes into the va-core model, given that other implementations have suggested the utility of these features.
  • we will not rename these attributes at present - it was clarified that "results" in ancillaryResults indicates that the data collected here are considered broader information derived form the more fundamental data captured in the main body of the StudyResult (e.g. a grpMaxFAF95 calculation, or homozygote/heterozygote calls derived from analyzing raw allele count data).
  • as draft attributes, these do not need to be exemplified in the test fixture data examples.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Review
Development

No branches or pull requests

2 participants