[New metadata element]: MappingSet.curie_map #225

Closed
joeflack4 opened this issue Aug 29, 2022 · 12 comments · Fixed by #376

@joeflack4
Contributor

Element id (e.g. creator_id, mapping_tool_version):
curie_map

curie_map:
    OMIM: http://www.omim.org/something/something

Value data type (e.g. URI, URL, text, xsd:boolean):

map

Description
A map of CURIE prefixes to URI prefixes. CURIE maps already exist in the SSSOM metadata and in many files that are currently in use, so it is strange for the element not to be in the spec. It is also commonly called prefix_map.
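
For illustration only, here is a minimal sketch of how such a map is typically used to expand a CURIE into a full URI. The `expand` helper and the `https://example.org/omim/` URI prefix are hypothetical and are not part of the SSSOM spec or tooling.

```python
# Hypothetical CURIE map: prefix -> URI prefix (the URL is a placeholder).
curie_map = {
    "OMIM": "https://example.org/omim/",
}


def expand(curie: str, curie_map: dict) -> str:
    """Expand 'prefix:local_id' by concatenating the mapped URI prefix and the local ID."""
    prefix, sep, local_id = curie.partition(":")
    if not sep:
        raise ValueError(f"not a CURIE: {curie!r}")
    if prefix not in curie_map:
        raise KeyError(f"prefix {prefix!r} is not declared in the CURIE map")
    return curie_map[prefix] + local_id


expanded = expand("OMIM:123", curie_map)
```

Expansion is plain string concatenation, which is why each prefix needs exactly one entry in the map.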

@jamesamcl
Member

jamesamcl commented May 12, 2023

Would it be mandatory for all CURIE prefixes to be in the CURIE map?

We have been using Bioregistry (tagging @cthoyt) to map "curies" (or at least prefix:localPart database IDs) to URLs, which are not always IRIs and therefore sometimes have complex formats e.g. Chemspider:4481878 to http://www.chemspider.com/Chemical-Structure.4481878.html.

Because these are not prefixes there's nothing I can put in the prefix map. But I still want to be able to put Chemspider:4481878 in the mappings table even if Chemspider can't be in the CURIE map.

Also @matentzn there are a lot of examples in the OLS extraction like this, and in many cases it's valuable data so I don't think we should omit it.
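
To make the problem concrete, here is a small sketch of why the Chemspider URL pattern cannot be written as a prefix. The `{id}` placeholder syntax and both helper functions are assumptions made for illustration; only the URL itself comes from the comment above.

```python
# URL pattern from the comment above; "{id}" is a hypothetical placeholder syntax.
CHEMSPIDER_TEMPLATE = "http://www.chemspider.com/Chemical-Structure.{id}.html"


def url_for(template: str, local_id: str) -> str:
    """Fill the local ID into a URL template."""
    return template.replace("{id}", local_id)


def is_plain_prefix(template: str) -> bool:
    """True when the local ID comes last, so everything before it works as a prefix."""
    return template.endswith("{id}")


url = url_for(CHEMSPIDER_TEMPLATE, "4481878")
usable_as_prefix = is_plain_prefix(CHEMSPIDER_TEMPLATE)  # False: ".html" follows the ID
```

Because the template has a suffix after the local ID, no single string can be concatenated with `4481878` to produce the URL, so there is nothing to put in a prefix map.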

@cthoyt
Member

cthoyt commented May 13, 2023

I think there are a few issues conflated in your comment @udp

inside SSSOM, everything is represented by CURIEs, so if you have a CURIE appearing anywhere in an SSSOM document with a prefix that's not in the prefix map, then your document is not semantically sound (for lack of a correct technical word)

the issue of URLs vs IRIs doesn't really matter since for all semantic web stuff, you don't actually care if something is a URL

when it comes to CURIEs with weird URI format strings that don't end in the $1, one solution is to just use the bioregistry URL, which does behave nicely
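
A rough sketch of the check implied by the points above: collect the prefixes used in a document and report any that the prefix map does not declare. The helper name is hypothetical, the OMIM URI prefix is a placeholder, and the `https://bioregistry.io/chemspider:` prefix assumes the Bioregistry resolver accepts `prefix:id` appended to its base URL.

```python
def undeclared_prefixes(curies, prefix_map):
    """Return the sorted prefixes used by the CURIEs but missing from the prefix map."""
    used = {curie.split(":", 1)[0] for curie in curies}
    return sorted(used - set(prefix_map))


prefix_map = {"OMIM": "https://example.org/omim/"}  # placeholder URI prefix
curies = ["OMIM:123", "Chemspider:4481878"]

missing = undeclared_prefixes(curies, prefix_map)

# Assumed fix: declare the prefix with a Bioregistry-style URL, which ends where the ID starts.
prefix_map["Chemspider"] = "https://bioregistry.io/chemspider:"
missing_after_fix = undeclared_prefixes(curies, prefix_map)
```

With the Bioregistry-style entry, the CURIE expands by simple concatenation even though the provider's own URL format does not end in the local ID.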

@jamesamcl
Member

@cthoyt thanks, this makes sense; I think one of the problems is that bioregistry doesn't differentiate between IRIs and URLs. If it was clear that a mapping was to a URL I would use the bioregistry prefix for the IRI instead.

@cthoyt
Member

cthoyt commented May 14, 2023

Can you elaborate a bit more? I'm not quite sure I understand what your needs are with differentiating between URLs and IRIs. This is probably related to some other requests for making PURLs more carefully annotated. We can move the discussion onto Slack or a different issue if this is getting off topic.

@matentzn
Collaborator

Everything is slightly off topic here, can we open https://github.com/mapping-commons/sssom/discussions for all unanswered questions and I will answer them?

@matentzn
Collaborator

matentzn commented Feb 5, 2024

The big question at the moment is if this can be represented in LinkML by requesting a new primitive type (Dict[String,String]) or some other means. I reached out to @cmungall to hear his position on this.

@cmungall
Contributor

cmungall commented Feb 5, 2024 via email

@matentzn
Collaborator

matentzn commented Feb 5, 2024

cmungall added a commit to linkml/linkml that referenced this issue Feb 5, 2024
@gouttegd
Contributor

gouttegd commented Feb 5, 2024

I think I got it. Basically it would be something like:

prefix_name:
  key: true
  range: ncname

prefix_url:
  range: uri

prefix:
  slots:
    - prefix_name
    - prefix_url

curie_map:
  range: prefix
  multivalued: true
  inlined: true

The key part (pun intended) being the key field in the declaration for the prefix_name slot. This is what indicates that the curie map should be serialised as

curie_map:
  <prefix_name>: <prefix_url>

rather than

curie_map:
  - prefix_name: <prefix_name>
    prefix_url: <prefix_url>
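
As a sketch of the difference between the two serialisations, the functions below convert between the compact key/value form and the expanded list form. This is plain Python over already-parsed data, not LinkML code; the function names and the placeholder URL are hypothetical, while the slot names follow the proposal above.

```python
def to_compact(entries):
    """Expanded list form -> compact mapping keyed by prefix_name."""
    return {entry["prefix_name"]: entry["prefix_url"] for entry in entries}


def to_expanded(compact):
    """Compact mapping -> expanded list of prefix objects."""
    return [
        {"prefix_name": name, "prefix_url": url}
        for name, url in compact.items()
    ]


expanded_form = [
    {"prefix_name": "OMIM", "prefix_url": "https://example.org/omim/"},  # placeholder URL
]
compact_form = to_compact(expanded_form)
round_trip = to_expanded(compact_form)
```

The two forms carry the same information; the compact one works only because prefix_name is unique, which is what declaring it as the key expresses.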

@cthoyt
Member

cthoyt commented Feb 5, 2024

Extended prefix maps already fit the bill for this!

https://curies.readthedocs.io/en/latest/struct.html#extended-prefix-maps

Let's reuse that rather than defining a new ad-hoc thing

is there a way we can make a reusable linkml model inside the curies package that can be reused everywhere?
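
For readers unfamiliar with extended prefix maps, here is a hedged sketch of how one relates to a simple CURIE map. The record keys (`prefix`, `uri_prefix`, `prefix_synonyms`, `uri_prefix_synonyms`) follow the linked curies documentation as I understand it; the data and the helper are hypothetical and do not use the curies package itself.

```python
# Hypothetical extended prefix map: one record per prefix, with optional synonyms.
extended_prefix_map = [
    {
        "prefix": "OMIM",
        "uri_prefix": "https://example.org/omim/",  # placeholder URL
        "prefix_synonyms": ["omim", "MIM"],
        "uri_prefix_synonyms": ["http://example.org/omim/"],
    },
]


def to_simple_prefix_map(records):
    """Keep only the preferred prefix and URI prefix of each record."""
    return {record["prefix"]: record["uri_prefix"] for record in records}


simple_map = to_simple_prefix_map(extended_prefix_map)
```

The simple map is a lossy projection of the extended one, since the synonyms are dropped.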

@matentzn
Collaborator

matentzn commented Feb 5, 2024

Let's reuse that rather than defining a new ad-hoc thing

This is not inventing anything new, just trying to add the existing curie_map to the model. No changes at all.

is there a way we can make a reusable linkml model inside the curies package that can be reused everywhere?

We should 100% add a small model for EPMs! But this is not related to this issue here, which is only to define what is already there.

cmungall added a commit to linkml/linkml that referenced this issue Feb 5, 2024
* Extends jsonschemagen's interpretation of SimpleDicts beyond tuples.

This is necessary for parsing of SimpleDict form of annotations in schemas
(the canonical way to do this is as a SimpleDict)

See linkml/generators/jsonschemagen.py

* owlgen: fixed handling of has_member and all_member

* Added tests for SimpleDict inlining, and for has_member/all_member

* Adding docs for mapping-commons/sssom#225

* formatted

* regenerating snapshot to account for relaxing conditions under which something is a SimpleDict

* error message now changes with relaxed SimpleDict

* regenerate-snapshots
@matentzn
Collaborator

matentzn commented Feb 6, 2024

@gouttegd this needs work in terms of docs etc., but I implemented what you suggested:

#349

matentzn added this to the 1.0.0 milestone Jul 22, 2024