Data model expansion to include metadata models for priority data types #74

Closed
Bankso opened this issue Feb 16, 2024 · 4 comments
Labels: effort-high, feature (New feature or request)

Comments

@Bankso
Contributor

Bankso commented Feb 16, 2024

The MC2 Center data model does not currently support recording assay-specific metadata. Adding assay-specific models is critical to supporting data sharing and reuse. Potential models to add were originally outlined in this spreadsheet.

Some models that we should consider implementing:

  • (Multiplexed) imaging and microscopy
  • bulk RNA-Seq
  • single cell RNA-Seq
  • 10x Visium/Xenium
  • Flow Cytometry
  • Mass Spectrometry/Proteomics/Metabolomics
  • Other sequencing based methods (ATAC-Seq, CUT&RUN, ChIP-Seq, etc.)

Some notes on approach:

  • Where possible, I think we should use 'depends on' designations to define attribute connections and component dependencies. This should allow us to use a single template to cover multiple related data types.
  • Data model development should cover the metadata model itself and also an SOP-type doc that describes the folder/organizational strategy, ingress/storage considerations, and a guide for annotating the resources (e.g., which types of files contain the requested metadata, the suggested order of completion and submission). This doc will be included in reference materials.
  • While the assay models are critical for reuse, the DatasetView model is completely sufficient for CCKP entries. I think we should exclusively display DatasetView info on the portal and link back to table snapshots or Synapse datasets that contain the assay-specific metadata.
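
To illustrate the 'depends on' idea from the notes above, here is a minimal Python sketch of how conditional dependencies could let one template serve several related assays. Everything in it is an assumption for illustration: the component name `Sequencing`, the attribute names, and the `required_attributes` helper are hypothetical and are not taken from the MC2 Center data model or from any modeling tool's API.

```python
from collections import deque

# Hypothetical 'depends on' table: a component or a selected value maps to
# the attributes that become required once it is in play.
DEPENDS_ON = {
    "Sequencing": ["Filename", "Assay"],
    "bulk RNA-Seq": ["Library Preparation Method"],
    "single cell RNA-Seq": ["Library Preparation Method", "Dissociation Method"],
}


def required_attributes(component, record):
    """Return the attributes required for `record`, in discovery order.

    Starts from the component's own dependencies, then follows any
    dependencies triggered by the values chosen in the record.
    """
    required = []
    seen = set()
    queue = deque(DEPENDS_ON.get(component, []))
    while queue:
        attr = queue.popleft()
        if attr in seen:
            continue
        seen.add(attr)
        required.append(attr)
        # A chosen value (e.g. the selected assay) may carry its own dependencies.
        queue.extend(DEPENDS_ON.get(record.get(attr), []))
    return required


bulk = required_attributes("Sequencing", {"Assay": "bulk RNA-Seq"})
single_cell = required_attributes("Sequencing", {"Assay": "single cell RNA-Seq"})
```

In this sketch the same `Sequencing` template yields a different set of required attributes depending on the selected assay, which is the behavior the first note is after.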
Bankso self-assigned this Feb 16, 2024
Bankso added the effort-high and feature labels Feb 16, 2024
@Bankso
Contributor Author

Bankso commented Jul 3, 2024

Related to mc2-center/mc2-center-dcc#71 and mc2-center/mc2-center-dcc#72

Integrate CDS/DataHub models where reasonable. Develop mappings if needed.

@aclayton555

24-11/12: Orion to document the current status and what has been integrated by the end of the sprint. Split off a new ticket as needed.

@Bankso
Contributor Author

Bankso commented Dec 12, 2024

Available record models:

  • Information
    • Consortium
    • Grant View
    • Project View
    • Person View
    • Institution
    • Study (Note: pending changes related to CDS V5.0.2/Study mapping)
    • Theme
  • Resources
    • Biospecimen (Note: pending changes related to CDS V5.0.2/Sample, Diagnosis, Participant mapping)
    • Collection (Note: pending changes related to metadata sharing approach)
    • Dataset View
    • Dataset Sharing Plan
    • Educational Resource
    • Individual (Note: pending changes related to CDS V5.0.2/Sample, Diagnosis, Participant mapping)
    • Model (Note: pending changes related to CDS V5.0.2/Sample, Diagnosis, Participant mapping)
    • Publication View
    • Tool View
  • Data
    • GeoMx Auxiliary Files
    • GeoMx ROI Segment Annotation
    • Imaging Channel
    • 10x Visium Auxiliary Files

Available file models:

  • File View
    • GeoMx
      • Level 1 (FASTQ files or unprocessed spot counts, if available; pending update to include essential CDS V5.0.2 Genomic Info elements)
      • Level 2 (DCC or RCC formatted files, derived from Level 1; pending update to include essential CDS V5.0.2 Genomic Info elements)
      • Level 3 (expression matrices [counts per gene target per spot])
      • Imaging (TIF or OME-TIFF from instrument; pending update to include essential CDS V5.0.2 Image, Multiplexed Microscopy, and Non-DICOM Pathology elements)
    • Imaging
      • Level 1 (unprocessed or raw images)
      • Level 2 (minimally processed and shareable [TIF, OME-TIFF, PNG])
      • Level 3 Image (QC'd images)
      • Level 3 Segments (Segmentation masks, typically in tabular format)
      • Level 4 (feature array or other data derived from image analysis)
    • Sequencing
      • Level 1 (FASTQs or other minimally processed sequencing read format)
      • Level 2 (BAMs or other genome-aligned file format)
      • Level 3 (Coverage [BigWig, BedGraph, etc.], peaks [BED, CSV, TSV, etc.], or counts [CSV, TSV, etc.] derived from BAMs; Note: could add a Level 4 to contain peak-derived counts, which would be a composite of Level 2 + 3 or Level 3 + 3 files)
      • Sequencing RNA
        • Level 1 (RNA-specific information for FASTQs)
    • Visium RNA (Note: all pending update to include essential CDS V5.0.2 Sequencing, Image, Multiplexed Microscopy, and Non-DICOM Pathology elements)
      • Level 1
      • Level 2
      • Level 3
      • Level 4
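
As a rough illustration of the Sequencing levels listed above, the following Python sketch guesses a level from a file name. The extension lists and the `sequencing_level` helper are assumptions made for this example, not part of the data model. A file's actual level depends on how it was produced, so an extension-based guess like this would only be a starting point for annotation.

```python
from typing import Optional

# Assumed extensions per level, loosely following the Sequencing level descriptions.
SEQUENCING_LEVELS = {
    1: (".fastq", ".fq"),                                       # raw reads
    2: (".bam",),                                               # genome-aligned reads
    3: (".bw", ".bigwig", ".bedgraph", ".bed", ".csv", ".tsv"),  # coverage, peaks, counts
}
COMPRESSION_SUFFIXES = (".gz",)


def sequencing_level(filename: str) -> Optional[int]:
    """Guess the Sequencing level from the file extension, or None if unknown."""
    name = filename.lower()
    # Ignore a trailing compression suffix such as ".gz".
    for suffix in COMPRESSION_SUFFIXES:
        if name.endswith(suffix):
            name = name[: -len(suffix)]
    for level, extensions in SEQUENCING_LEVELS.items():
        if name.endswith(extensions):
            return level
    return None
```

For example, `sequencing_level("sample_R1.fastq.gz")` falls under Level 1 in this sketch, while a file with an unlisted extension returns `None` and would need manual review.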

@aclayton555

24-11/12 - proceed to close. Remaining CDS mapping work includes integration testing pending confirmation that mapping is accurate.


2 participants