Define specific queries on specific CDEs modelled by Care-SM #28

mroos · 2024-01-15T11:00:36Z

Define SPARQL queries that can be used to answer the information needs described in the use case flash card and the mindmap linked from #57 by relying on the information models defined in the Virtual Platform Specification (VIPS)¹ with extensions only where necessary.

List of models from VIPS used (add as necessary):

EJP RD meta data model – findability of rare disease resources
Clinical And Registry Entries (CARE) Semantic Model – core data standard describing common data elements essential for RD research

List of models not in VIPS used (add as necessary):

FAIR Genomes metadata schema – semantic metadata schema to power reuse of NGS data

Queries to implement:

Query 1: …
Query 2: …

¹ See VIPS 2.0, page 17

mroos · 2024-01-15T11:03:03Z

This issue is to define a specific case that we can carry out using a minimal number of data elements, including a minimal number of CDEs, so as to demonstrate the use of Care-SM in queries.

Special request

Can we do a rare disease case and an oncology case in parallel? This would help adoption at the local institutes. Marco will bring this up with Karolis for the LCCO project.

wna-se · 2024-03-05T13:27:50Z

It would be very useful to have a realistic and compelling query that includes the CARE-SM data elements related to Clinical measurements and Genetic assessment, together with a description of the sizes and characteristics of the datasets it would be run on and the expected results.

Perhaps something based on B1MG D4.1 - Secure cross-border data access roadmap - 1v0:

andrawaag · 2024-05-03T15:06:38Z

During the biohackathon on May 3rd, @wna-se and @andrawaag made this a place to collect example queries to be transformed into SPARQL.

wna-se · 2024-06-03T09:36:21Z

@markwilkinson Added you here as discussed during today’s meeting. Please link to / add any queries that you can share here and / or to the ejp-rd-vp/DistributedAnalysis repository.

wna-se · 2024-06-03T09:38:26Z

@NuriaQueralt For those working on Phenopackets and/or phenotypic data that could be present in case reports, the following datasets, which consist of published case reports translated to Phenopackets, may be useful:

Peter N Robinson. (2021). Phenopackets for case reports of structural variants (Version 1) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.5071267
Peter N Robinson. (2020). 384 Phenopackets (1.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.3905420

Also, the JSON Schema validator from phenopackets / phenopacket-tools could perhaps be a useful resource to inspire the RDF/SHACL-mapped version. Notably, the folder with the JSON Schema gives an example of how they have mirrored the structure of the authoritative protobuf definitions. The validation rules could probably be translated into SHACL, and the choice of URIs used to reference the definitions could also be useful.

Is there an issue specifically for the Phenopacket work? Perhaps also relevant to @rosazwart ?
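
As a rough illustration of the idea of translating JSON Schema validation rules into SHACL, the sketch below turns the `required` list of a schema fragment into SHACL property shapes with `sh:minCount 1`. The schema fragment, the `http://example.org/phenopacket-demo/` namespace and the function name `required_to_shacl` are assumptions made for this example; they are not taken from phenopacket-tools or from any existing mapping.

```python
# Hypothetical namespace; a real mapping would reuse the URIs chosen for the Phenopacket RDF model.
EX = "http://example.org/phenopacket-demo/"


def required_to_shacl(schema: dict, class_name: str) -> str:
    """Emit a Turtle SHACL node shape with one minCount-1 property per required field."""
    lines = [
        "@prefix sh: <http://www.w3.org/ns/shacl#> .",
        f"@prefix ex: <{EX}> .",
        "",
        f"ex:{class_name}Shape a sh:NodeShape ;",
        f"    sh:targetClass ex:{class_name}",
    ]
    for name in schema.get("required", []):
        lines[-1] += " ;"  # continue the predicate list of the shape
        lines.append(f"    sh:property [ sh:path ex:{name} ; sh:minCount 1 ]")
    lines[-1] += " ."  # close the shape
    return "\n".join(lines)


# Made-up schema fragment, for illustration only.
DEMO_SCHEMA = {"type": "object", "required": ["id", "subject"]}
SHAPE = required_to_shacl(DEMO_SCHEMA, "Phenopacket")
print(SHAPE)
```

Only the `required` keyword is handled here; other JSON Schema constraints (types, enumerations, nested objects) would each need their own translation rule.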

wna-se · 2024-06-03T11:12:33Z

@andrawaag : @mroos said that you would be a great person to take the lead on this task. As we are preparing the synthetic data we have for the VP, it would be great to have some examples of genomics-related queries that we could prioritise mapping to, ideally a few queries based on @mroos's mindmap (see the reference in the description of #57) and using the CARE-SM and FAIR Genomes semantic schemas.

Edit: @ericprud : I’m also tagging you here as discussed during today’s meeting. Some examples would be very helpful.

mroos · 2024-06-10T09:23:52Z

Update 10/6
@hbcesar, Annika, @andrawaag working on example queries.
@pabloalarconm asked to provide example data for the queries (CSV + conversion method).

@ericprud asks @pabloalarconm to share the resulting RDF on GitHub (or in a repo of his choosing) for others in this group to use.

@andrawaag : may need an intermediate step first (chicken-and-egg). We need to find a way to get to the RDF, whereas Wolmar (and colleagues) need help converting from data that works for Beacon.

ejp-rd-vp/DistributedAnalysis#28 (comment) These queries are for scenario 1 question 1
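
For the "CSV + conversion method" request above, a conversion could look roughly like the following sketch, which turns each CSV row into N-Triples using only the Python standard library. The column names, the sample rows and the `http://example.org/care-sm-demo/` namespace are made up for illustration; real CARE-SM data would use the IRIs and structure defined by the model, and this is not the project's actual conversion pipeline.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical namespace used only for this sketch.
EX = "http://example.org/care-sm-demo/"


def escape_literal(value: str) -> str:
    """Escape characters that are not allowed raw inside an N-Triples string literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def csv_to_ntriples(csv_text: str) -> list[str]:
    """Emit one triple per non-empty cell, using the patient_id column as the subject."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{EX}patient/{quote(row['patient_id'], safe='')}>"
        for column, value in row.items():
            if column == "patient_id" or not value:
                continue  # skip the key column and empty cells
            triples.append(f'{subject} <{EX}{column}> "{escape_literal(value)}" .')
    return triples


# Made-up rows, for illustration only.
SAMPLE_CSV = "patient_id,diagnosis_code,birth_year\np1,EX:0001,1990\np2,,1985\n"
NTRIPLES = csv_to_ntriples(SAMPLE_CSV)
print("\n".join(NTRIPLES))
```

All values are written as plain string literals here; a real conversion would attach datatypes and ontology terms as the target model prescribes.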

wna-se · 2024-06-14T15:56:05Z

@NuriaQueralt For those working on Phenopackets and/or phenotypic data that could be present in case reports, the following datasets, which consist of published case reports translated to Phenopackets, may be useful:

Peter N Robinson. (2021). Phenopackets for case reports of structural variants (Version 1) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.5071267

Peter N Robinson. (2020). 384 Phenopackets (1.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.3905420

Also, the JSON Schema validator from phenopackets / phenopacket-tools could perhaps be a useful resource to inspire the RDF/SHACL-mapped version. Notably, the folder with the JSON Schema gives an example of how they have mirrored the structure of the authoritative protobuf definitions. The validation rules could probably be translated into SHACL, and the choice of URIs used to reference the definitions could also be useful.

Is there an issue specifically for the Phenopacket work? Perhaps also relevant to @rosazwart ?

@andrawaag Above are two references to collections of Phenopackets that represent published case reports and could be useful source material for producing a realistic graph to query using @NuriaQueralt's and @rosazwart's mapping. The synthetic data that we have been working on in Sweden is a subset of files derived from the Rare Disease Synthetic Dataset, which is available in full from the European Genome-Phenome Archive (EGA) under accession number EGAD00001008392. See the example phenopacket, the PDF describing the data, and the full subset as well as derived files in NBISweden/ejprd-data/.

wna-se · 2024-06-17T09:42:54Z

The CARE-SM/beaconAPI4CARESM also contains some SPARQL templates that can be used to serve a Beacon endpoint.
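
To illustrate what a SPARQL template behind a Beacon-style count endpoint might look like, here is a minimal sketch using `string.Template` from the Python standard library. The query shape, the `ex:diagnosis_code` predicate and the `build_count_query` helper are assumptions for this example and are not copied from beaconAPI4CARESM.

```python
import re
from string import Template

# Hypothetical template; the predicate and namespace are placeholders, not CARE-SM terms.
COUNT_TEMPLATE = Template(
    """PREFIX ex: <http://example.org/care-sm-demo/>
SELECT (COUNT(DISTINCT ?patient) AS ?n)
WHERE {
  ?patient ex:diagnosis_code "$diagnosis" .
}
"""
)

# Allow only characters expected in a code such as "EX:0001", so that
# user input cannot break out of the string literal in the query.
SAFE_CODE = re.compile(r"[A-Za-z0-9:_.-]+")


def build_count_query(diagnosis: str) -> str:
    """Fill the template after validating the parameter."""
    if not SAFE_CODE.fullmatch(diagnosis):
        raise ValueError(f"unsupported characters in diagnosis code: {diagnosis!r}")
    return COUNT_TEMPLATE.substitute(diagnosis=diagnosis)


print(build_count_query("EX:0001"))
```

Validating the parameter before substitution matters because plain string templating offers no protection against query injection.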

pabloalarconm · 2024-06-17T10:30:42Z

Hi @wna-se @mroos

Some of these tasks tag me in this conversation, but it's not clear what you need. As the main maintainer of CARE-SM nowadays, what exactly do you need from me for your use case? (You probably discussed this in a meeting I was not part of.)

ShEx files for schema validation are already included here.
SPARQL queries have always been here. There are two examples, but let me know if you need more cases added. The SPARQL queries from beaconAPI4CARESM are just fragments, which are hard to reuse at a first attempt, but let me know if you need help with that (I can join a meeting to discuss their implementation outside this API).
I will add exemplar RDF data to the CARE-SM implementation repo. Do you need it for every specific data element, or a single example representation?

Bests,
Pablo

wna-se · 2024-07-01T09:57:16Z

Pasted from e-mail by @NuriaQueralt on 20 June:

Dear all,

I have finished the phenopackets RDF model, in ShEx. You can have a look on GitHub, in branch "v2". I modelled ONLY the elements required for the GDI use case. Rosa, I also modelled the Variant-related elements for the LUMC data, so you can start adapting your RDFization pipeline.

Good news! We have a bunch of phenopackets that follow the current schema version here: https://monarch-initiative.github.io/phenopacket-store/ I suggest using this set for our POC. I may refine the model by adding some RDF examples using these data, so I may make some changes to the model.

My apologies, I cannot make it to today's meeting due to a clash in my agenda.

With kind regards,
Núria

mroos · 2024-07-22T09:35:09Z

@markwilkinson among others will have to do this anyway

ericprud · 2024-08-02T19:11:09Z

I will add exemplar RDF data to the CARE-SM implementation repo. Do you need it for every specific data element, or a single example representation?

Ideally, we'd have a couple of nice examples that demonstrate the breadth of the expressions. These would serve as documentation and inspiration for the schema and queries. Such examples could be cobbled together from multiple instances of the current JSON data.

Having all the data would also be handy as it would help us verify schema and queries and provide a corpus for tests. Would also be nice for demos.

mroos added this to L3-FAIR Data Train issues Jan 15, 2024

mroos assigned andrawaag, mroos and NuriaQueralt Jan 15, 2024

mroos converted this from a draft issue Jan 15, 2024

mroos moved this from Backlog to Ready in L3-FAIR Data Train issues Jan 29, 2024

wna-se mentioned this issue May 3, 2024 in: Federated Genome Interoperability: Analysis on genomic variation to support treatment and research #57 (Open, 3 tasks)

wna-se assigned markwilkinson Jun 3, 2024

wna-se assigned ericprud Jun 3, 2024

mroos moved this from Ready to In progress in L3-FAIR Data Train issues Jun 10, 2024

andrawaag added a commit (8862ead) to ejp-rd-vp/DistributedAnalysisDemonstrator that referenced this issue Jun 10, 2024: "example pseudo-SPARQL queries as expressed in: ejp-rd-vp/DistributedAnalysis#28 (comment) These queries are for scenario 1 question 1"
