Synthetic genome data and 1+ Million Genomes Framework services (NBIS - ELIXIR Sweden) #29

mroos · 2024-01-29T10:11:19Z

The use case described in #57 will be used to identify requirements on the interfaces between the VP¹ and the connected resources, on the data objects that are communicated, and the services that process the data. Some focus will be on leveraging solutions offered by B1MG/GDI compatible resources that enable secure and federated queries for genomic/phenotypic information. EJP-RD partners and Swedish actors generating/stewarding rare disease data with genomic components will be consulted to maximise the utility of the use case in validating technical solutions and in supporting onboarding of new EJP-RD resources.

Relevance to EJP-RD and the wider health data community:

Demonstrator with examples of scenarios where a federated approach to queries/analysis would be ideal—including synthetic data that can be used to test and validate technical solutions (this is what we aim to do with our work in EJP-RD)
Showcase on how to prepare resources (such as clinical datastores, biobanks, FEGA, GDI) to be able to connect to and provide services of value to users through federated platforms (where the Virtual Platform of the EJP-RD is an example)

References:

Implementation:

Mapping B1MG Rare Disease use cases and demonstrator to scenarios using EJP-RD’s Virtual Platform
Mapping the EGA metadata model to the EJP-RD resource metadata schema and CARE-SM
Mapping metadata and content level information (data elements, files and relations that don’t fit into the EJP-RD models) to concepts that can be represented in RDF, such as using FAIR Genomes
Identifying and mapping common services from platforms such as Scout, GPAP and GDI Starter Kit to the FAIR Data Services for EJP-RD supported by concepts emerging from the FAIR Data Train
Configuring a local test bed that implements a selection of the services identified above but minimally:
1. SPARQL for provisioning CARE-SM and genomic data
2. Beacon v2 with genomicVariations endpoint
3. htsget for genomic variation data (and possibly aligned genome data)
4. GA4GH TESK implementation that supports containerised compute
Executable demonstrator scenarios with corresponding synthetic and mock data services
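
As an illustration of the Beacon v2 item in the list above, here is a minimal sketch of a request body for a genomic range query against a genomicVariations endpoint. The function name, the coordinates and the default assembly are hypothetical; the field names follow our reading of the Beacon v2 request format and should be checked against the deployed Beacon before use.

```python
import json


def build_g_variants_query(reference_name, start, end,
                           assembly_id="GRCh38", limit=10):
    """Build a Beacon v2 style POST body for a genomic range query.

    Hypothetical helper: the parameter values are placeholders, not
    values taken from the synthetic dataset.
    """
    return {
        "meta": {"apiVersion": "2.0"},
        "query": {
            "requestParameters": {
                "assemblyId": assembly_id,
                "referenceName": reference_name,
                # Beacon v2 expresses start and end as integer arrays.
                "start": [start],
                "end": [end],
            },
            "filters": [],
            "includeResultsetResponses": "HIT",
            "pagination": {"skip": 0, "limit": limit},
            # A boolean answer discloses the least about sensitive data.
            "requestedGranularity": "boolean",
        },
    }


if __name__ == "__main__":
    # Placeholder coordinates, for illustration only.
    print(json.dumps(build_g_variants_query("1", 1000000, 1001000), indent=2))
```

The body would be sent as JSON in a POST request to the genomic variations endpoint of the Beacon; authentication is left out of the sketch.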

¹ Intro to 1+MG, B1MG, and GDI: https://framework.onemilliongenomes.eu/about-the-framework

wna-se · 2024-02-05T10:23:25Z

Status update 2024-02-05:
We are currently working on creating a suitable subset of a synthetic dataset hosted in the European Genome-Phenome Archive (EGA) EGA:EGAD00001008392 and corresponding GDI starter-kit service endpoints.

On the technical side, we have set up a local development instance of the FIAB that we will use to test our mapping from the genomic use case document before the data is transferred to the testbed instance.

Issues: We need some guidance on what we need to prepare to add data to the testbed.

wna-se · 2024-03-04T10:18:02Z

SWAT4HCLS updates (programme):

Mon: Tutorials - Contributing to the content and training new members of the team in the EJP-RD FDP configuration
Tue: Co-located meeting - Clinical genetics LUMC collaborators to prepare hackathon.
Thu: Hackathon - EJP-RD scenario for federating SPARQL through the FDP
Fri: Co-located meeting - LUMC + ELIXIR Sweden EJP-RD.

Reflections:

For the genomics case and resources like the EGA and GDI, I think that part of the challenge might be a lack of widely established/used models that encompass all of the concepts represented in the common formats/tools for WGS/VCF files.
Where there are suitable concepts in CARE-SM, we will of course use them and set up the transformations (CSV-YARRRML or otherwise). A simple solution for making sensitive data available for SPARQL queries within a secured environment could be to rely on the GDI approach for federated analyses more generally and wrap the queries in a request to the GA4GH Task Execution Service endpoint.
We might be able to find a way to create a secure SPARQL endpoint that can be exposed on the FDP and internally translate the requests to use the secure infrastructure for executing the queries where sensitive data can be accessed.
Could you perhaps write a few example SPARQL queries that we should be able to run across our mock-FDPs?
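
To make the idea of wrapping a SPARQL query in a GA4GH Task Execution Service request more concrete, here is a minimal sketch of what such a task document could look like. The container image, the internal SPARQL endpoint URL and the output location are all hypothetical placeholders, and the sketch assumes the image provides curl; it is not a description of the GDI implementation.

```python
import json


def sparql_as_tes_task(
    query,
    image="example.org/sparql-runner:latest",          # hypothetical image with curl
    endpoint="http://triplestore.internal/sparql",     # hypothetical internal endpoint
    output_url="s3://example-bucket/results.json",     # hypothetical result location
):
    """Wrap a SPARQL query in a TES task document.

    The executor posts the query to a SPARQL endpoint that is only
    reachable from inside the secure environment and writes the JSON
    results to a file that TES copies to output_url.
    """
    return {
        "name": "sparql-query",
        "description": "Run a SPARQL query inside the secure environment",
        "executors": [
            {
                "image": image,
                "command": [
                    "curl", "--silent", "--fail",
                    "--header", "Accept: application/sparql-results+json",
                    "--data-urlencode", f"query={query}",
                    endpoint,
                ],
                "stdout": "/outputs/results.json",
            }
        ],
        "outputs": [
            {
                "path": "/outputs/results.json",
                "url": output_url,
                "type": "FILE",
            }
        ],
    }


if __name__ == "__main__":
    task = sparql_as_tes_task("SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }")
    print(json.dumps(task, indent=2))
```

The resulting JSON would be posted to the task creation endpoint of the TES service. A secure SPARQL endpoint exposed through the FDP could build such a task internally for each incoming query and return the results once the task completes.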

wna-se · 2024-03-11T10:00:50Z

CARE-SM for the genomics use case:
It seems that the CARE-SM model implementations have changed quite a lot since the CDE version we used last year. With the new Laboratory measurement module and the Laboratory Procedure type, we would like to relate to one of the subclasses Whole Exome Sequencing and Disease Panel Gene Sequencing, as we mentioned below, and the sio:has-target should probably relate to something under DNA Sequence in the Anatomic Structure, System, or Substance tree, with value_datatype set to IRI. Would it be valid to add an optional column called something like model_subclass (as a replacement of processURI) to allow assigning more specific types of Laboratory Procedure?

The IRI could for example be related to a DCAT Dataset and DataService offering access through the GA4GH htsget protocol, https://www.ga4gh.org/news_item/htsget-ga4ghs-streaming-api-is-a-bridge-to-the-future-for-modern-genomic-data-processing/
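
As a rough sketch of what access through htsget could look like from the client side, the helper below builds a request URL for a variants resource and extracts the download URLs from the JSON ticket that an htsget server returns. The base URL and the identifier are hypothetical; the path layout and parameter names follow our reading of the htsget specification, where start is 0-based inclusive and end is exclusive.

```python
from urllib.parse import urlencode


def htsget_variants_url(base_url, variant_id, reference_name=None,
                        start=None, end=None, fmt="VCF"):
    """Build an htsget request URL for a variants resource.

    Hypothetical helper: base_url and variant_id are placeholders for
    whatever the DCAT DataService description would point to.
    """
    params = {"format": fmt}
    if reference_name is not None:
        params["referenceName"] = reference_name
    if start is not None:
        params["start"] = start
    if end is not None:
        params["end"] = end
    return f"{base_url.rstrip('/')}/variants/{variant_id}?{urlencode(params)}"


def ticket_urls(ticket):
    """Return the data block URLs listed in an htsget JSON ticket."""
    return [entry["url"] for entry in ticket["htsget"]["urls"]]


if __name__ == "__main__":
    print(htsget_variants_url("https://htsget.example.org", "sample1",
                              reference_name="chr1", start=0, end=1000))
```

A client would fetch the ticket from the built URL, download each listed block (passing along any headers given in the ticket) and concatenate the blocks to obtain the requested VCF slice.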

When it comes to the modules under Genomic assessment, I think that it would make sense to have an input IRI referencing the outputs of omics-related lab measurements (e.g. WES, Panels etc) or some computational variant analysis process (perhaps a new assessment type?).

Synthetic data:
Reached out to Sergi Beltran (CNAG) regarding the Rare Disease Synthetic Dataset (EGAD00001008392) and the Rare Disease Use Case from B1MG D4.1 Secure data access roadmap to find contacts who have been involved in creating the dataset/use case, or examples of analyses/tools that have relied on the dataset or demonstrated the use case and that could be translated to a federated example over the VP.

Future direction:

Follow up on Leon’s example, e.g. create a mock SPARQL query that selects imaging outputs associated with a patient age and retrieves the patient age and corresponding IRIs to images
Create a mock query (ab)using the existing Lab measurement and Genomic variant implementations to select outputs associated with a diagnosis and retrieve the related IRIs to genomic sequences
Add an issue to the CARE-SM implementations repository explaining the query above to assess if an extension of the existing models or creating a new model would be the ideal solution

wna-se · 2024-03-25T10:06:58Z

Update:

Discussion with Sergi Beltran (CNAG) on the Genome-Phenome Analysis Platform (GPAP) and B1MG DEMO | Federated Data Access Rare Diseases Proof of Concept.
Created a GitHub repository (NBISweden/ejprd) to document the configuration of the Swedish local testbed, including GDI Starter Kit components. Going forward, we should have a discussion about which parts of this would make sense to incorporate in our shared testbed.
Reached out to a Swedish clinical genomics infrastructure to discuss demonstrators.

Future direction:

Analyse the training materials for the RD Connect GPAP platform for examples of queries / analyses that can be run on the Rare Disease Synthetic Dataset (EGAD00001008392).
Consider translating the B1MG DEMO | Federated Data Access Rare Diseases Proof of Concept to involve the EJP-RD Virtual Platform.

wna-se · 2024-04-15T09:07:16Z

Update:

Progressively adding services to a Swedish local testbed configuration that will include GDI Starter Kit components in addition to an EJP-RD compatible FDP, Beacon implementation and SPARQL endpoint for CARE-SM.
Started work on mapping data elements from the metadata model of the Federated EGA, and more specifically of the Rare Disease Synthetic Dataset (EGAD00001008392), to the EJP-RD resource metadata schema and CARE-SM.

Future direction:

Complement with a mapping to the FAIR Genomes semantic schema
Design SHACL shapes for data from the EGA

wna-se · 2024-04-22T09:06:40Z

Update:

Linked Swedish local testbed configuration to the DistributedAnalysis repo
Started mapping Rare Disease Synthetic Dataset (EGAD00001008392) to the FAIR Genomes semantic schema

wna-se · 2024-04-29T09:33:55Z

Update:

Updated descriptions of this issue and its parent (Federated Genome Interoperability: Analysis on genomic variation to support treatment and research #57) to describe their aims and scope.

Future direction:

Add GDI Beacon endpoint to local testbed

wna-se · 2024-04-30T13:39:57Z

@NuriaQueralt, I’ve made a copy of (a subset of) the files in a private GitHub repository and invited you as a collaborator. Once you have accepted the invitation you can find one of the files here and a narrative description is available here.

Anyone can register an account on EGA and request access to the full dataset if you want a local copy at LUMC. I’ve reached out to the RD Connect Platform team to look into under what conditions the data can be redistributed more broadly.

wna-se · 2024-07-01T09:05:06Z

Update:

Local testbed configuration (NBISweden/ejprd) is being deployed to a public cloud service and will be available this week. Some LS-AAI issues remain to be resolved; otherwise we fall back to a mock service
Use case demonstrator scenarios / questions to answer using the VP have been developed and will be added to the testbed

mroos · 2024-07-22T09:18:18Z

FYI: Alberto shared VP testbed configuration in the Teams chat.

mroos added this to L3-FAIR Data Train issues Jan 15, 2024

mroos assigned wna-se Jan 29, 2024

mroos converted this from a draft issue Jan 29, 2024

wna-se mentioned this issue Mar 11, 2024

Step 01 - Identify (meta)data requirements from use cases #1

Open

dwijnbergen added the "use case step" label Apr 15, 2024

dwijnbergen mentioned this issue Apr 12, 2024

Federated Genome Interoperability: Analysis on genomic variation to support treatment and research #57

Open

3 tasks

wna-se changed the title ~~Expose Swedish FDPs containing synthetic genome data on the test bed~~ Synthetic genome data & 1+ Million Genomes Framework implementation (NBIS - ELIXIR Sweden) Apr 22, 2024

wna-se changed the title ~~Synthetic genome data & 1+ Million Genomes Framework implementation (NBIS - ELIXIR Sweden)~~ Synthetic genome data and 1+ Million Genomes Framework services (NBIS - ELIXIR Sweden) Apr 22, 2024

wna-se mentioned this issue May 20, 2024

GDI use case #40

Closed
