Define specific queries on specific CDEs modelled by Care-SM #28
This issue is to define a specific use case that we can carry out using a minimal number of data elements, including a minimal number of CDEs, so as to demonstrate the use of CARE-SM in queries.
Special request:
It would be very useful to have a realistic and compelling query that includes the CARE-SM data elements related to Clinical measurements and Genetic assessment, with a description of the sizes and characteristics of the datasets it would be run on and the expected results. Perhaps something based on B1MG D4.1 - Secure cross-border data access roadmap - 1v0.
During the biohackathon on May 3rd, @wna-se and @andrawaag made this issue a place to collect query examples to be transformed into SPARQL.
@markwilkinson Added you here as discussed during today’s meeting. Please link to / add any queries that you can share here and / or to the ejp-rd-vp/DistributedAnalysis repository.
@NuriaQueralt For those working on Phenopackets and/or phenotypic data that could be present in case reports, the following datasets consisting of published case reports translated to Phenopackets may be useful?
Also, the JSON Schema validator from the phenopackets / phenopacket-tools could perhaps be a useful resource to inspire the RDF/SHACL-mapped version. Notably, the folder with the JSON Schema gives an example of how they have mirrored the structure of the authoritative protobuf definitions, the validation rules could probably be translated into SHACL, and the choice of URIs used to reference the definitions could also be useful. Is there an issue specifically for the Phenopacket work? Perhaps also relevant to @rosazwart?
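To make the idea of translating JSON Schema validation rules into SHACL a bit more concrete, here is a minimal Python sketch. It is not taken from phenopacket-tools: the toy schema, the `ex:` namespace, and the function name `json_schema_to_shacl` are all assumptions for illustration, and only three simple rules are covered (`required` becomes `sh:minCount 1`, non-array properties get `sh:maxCount 1`, and a few scalar types get an `sh:datatype`).

```python
# Assumed mapping from a few JSON Schema scalar types to XSD datatypes.
XSD_TYPES = {"string": "xsd:string", "integer": "xsd:integer", "boolean": "xsd:boolean"}


def json_schema_to_shacl(shape_name, target_class, schema):
    """Render a flat JSON-Schema-like dict as a SHACL NodeShape in Turtle.

    Property names are assumed to be valid Turtle local names; nested
    objects, $ref, enums and patterns are deliberately ignored here.
    """
    required = set(schema.get("required", []))
    out = [
        "@prefix sh: <http://www.w3.org/ns/shacl#> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        # Hypothetical namespace, not the URIs used by Phenopackets or CARE-SM.
        "@prefix ex: <https://example.org/placeholder#> .",
        "",
        f"ex:{shape_name} a sh:NodeShape ;",
        f"    sh:targetClass ex:{target_class}",
    ]
    for name, spec in sorted(schema.get("properties", {}).items()):
        constraints = [f"sh:path ex:{name}"]
        if name in required:
            constraints.append("sh:minCount 1")
        json_type = spec.get("type")
        if json_type != "array":
            constraints.append("sh:maxCount 1")
        if json_type in XSD_TYPES:
            constraints.append(f"sh:datatype {XSD_TYPES[json_type]}")
        out[-1] += " ;"  # continue the predicate list of the shape
        out.append("    sh:property [ " + " ; ".join(constraints) + " ]")
    out[-1] += " ."  # terminate the shape statement
    return "\n".join(out)


# Toy schema, loosely phenopacket-like; not the real Phenopacket JSON Schema.
toy_schema = {
    "required": ["id"],
    "properties": {
        "id": {"type": "string"},
        "phenotypicFeatures": {"type": "array"},
    },
}
print(json_schema_to_shacl("PhenopacketShape", "Phenopacket", toy_schema))
```

A real translation would also need to decide how nested objects map to `sh:node` shapes and which ontology URIs replace the placeholder `ex:` terms.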
@andrawaag : @mroos said that you would be a great person to take the lead on this task. As we are working on preparing the synthetic data we have for the VP, it would be great to have some examples of genomic-related queries that we could prioritise mapping to, ideally a few queries based on @mroos’ mindmap (see reference in the description of #57) and using the CARE-SM and FAIR Genomes semantic schema. Edit: @ericprud : I’m also tagging you here as discussed during today’s meeting. It would be very helpful to have some examples.
Update 10/6
@andrawaag : we may need an intermediate step first (chicken-and-egg). We need to find a way to get to the RDF, whereas Wolmar (and colleagues) need help converting from data that works for Beacon.
ejp-rd-vp/DistributedAnalysis#28 (comment): these queries are for scenario 1, question 1.
@andrawaag Above are two references to collections of Phenopackets that represent published case reports and could be useful source material for producing a realistic graph to query using the mapping by @NuriaQueralt and @rosazwart. The synthetic data that we have been working on in Sweden is a subset of files derived from the Rare Disease Synthetic Dataset, available in full from the European Genome-Phenome Archive (EGA) through accession number EGAD00001008392; see the example phenopacket, the PDF describing the data, and the full subset as well as derived files in NBISweden/ejprd-data/.
The CARE-SM/beaconAPI4CARESM repository also contains some SPARQL templates that can be used to serve a Beacon endpoint.
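As a rough illustration of what filling such a SPARQL template could look like, here is a minimal Python sketch. It does not reproduce the templates in beaconAPI4CARESM: the `ex:` prefix, every `ex:` term, and the names `COUNT_TEMPLATE`, `is_safe_iri` and `build_count_query` are hypothetical placeholders, and the real CARE-SM graph pattern would have to be substituted in.

```python
import math
from string import Template

# Hypothetical count query: the ex: terms stand in for the actual CARE-SM
# pattern linking a patient to a diagnosis and a clinical measurement.
COUNT_TEMPLATE = Template("""\
PREFIX ex: <https://example.org/placeholder#>
SELECT (COUNT(DISTINCT ?patient) AS ?n)
WHERE {
  ?patient ex:hasDiagnosis <$disease_iri> ;
           ex:hasMeasurement ?m .
  ?m ex:measurementType <$measurement_iri> ;
     ex:value ?value .
  FILTER(?value >= $min_value)
}
""")


def is_safe_iri(iri: str) -> bool:
    """Reject characters the SPARQL grammar forbids inside <...> IRIs."""
    forbidden = set('<>"{}|^`\\')
    return bool(iri) and not any(c in forbidden or ord(c) <= 0x20 for c in iri)


def build_count_query(disease_iri: str, measurement_iri: str, min_value: float) -> str:
    """Fill the template, refusing inputs that could break out of the query."""
    for iri in (disease_iri, measurement_iri):
        if not is_safe_iri(iri):
            raise ValueError(f"unsafe IRI: {iri!r}")
    threshold = float(min_value)
    if not math.isfinite(threshold):
        raise ValueError("min_value must be a finite number")
    return COUNT_TEMPLATE.substitute(
        disease_iri=disease_iri,
        measurement_iri=measurement_iri,
        min_value=threshold,
    )


# Example call with placeholder IRIs (not real ontology terms).
print(build_count_query(
    "https://example.org/disease/D1",
    "https://example.org/measure/M1",
    10,
))
```

Returning only an aggregate count, rather than patient-level rows, is one design choice that may suit a Beacon-style endpoint; whether it matches the actual templates is an assumption.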
Some of these tasks tag me in this conversation, but it's not clear what you need. As the main maintainer of CARE-SM nowadays, what exactly do you need from me for your use case? (You probably discussed it in a meeting I wasn't involved in.)
Best,
Pasted from e-mail by @NuriaQueralt on 20 June:
Ideally, we'd have a couple of nice examples that demonstrate the breadth of the expressions. These would serve as documentation and inspiration for the schema and queries. Such examples could be cobbled together from multiple instances of the current JSON data. Having all the data would also be handy, as it would help us verify the schema and queries and provide a corpus for tests. It would also be nice for demos.
Define SPARQL queries that can be used to answer the information needs described in the use case flash card and the mindmap linked from #57 by relying on the information models defined in the Virtual Platform Specification (VIPS) [1], with extensions only where necessary.
List of models from VIPS used (add as necessary):
List of models not in VIPS used (add as necessary):
Queries to implement:
Footnotes
[1] See VIPS 2.0, page 17