
Upgrading (non-bijective) prefix maps #99

Merged: 6 commits merged on Jan 16, 2024


@cthoyt (Member) commented Jan 15, 2024

Motivated by mapping-commons/sssom-py#485. Demo:

>>> from curies import Converter, upgrade_prefix_map
>>> pm = {"a": "https://example.com/a/", "b": "https://example.com/a/"}
>>> records = upgrade_prefix_map(pm)
>>> converter = Converter(records)
>>> converter.expand("a:1")
'https://example.com/a/1'
>>> converter.expand("b:1")
'https://example.com/a/1'
>>> converter.compress("https://example.com/a/1")
'a:1'

This function is for people who are not in a position to make the sustainable fix (i.e., correcting the non-bijective
prefix map upstream) and who want to automate the choice of preferred prefix. When two or more CURIE prefixes share
the same URI prefix, it deterministically chooses one of them and generates an extended prefix map in which they have
been collapsed into a single record. More specifically, the algorithm is based on a case-sensitive lexical sort of the
prefixes: the first in the sort order becomes the primary prefix, and the others become synonyms in the resulting record.
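As a rough illustration of the collapse described above, here is a minimal, self-contained sketch in plain Python. It is not the `curies` implementation: `SketchRecord` and `collapse_prefix_map` are hypothetical names, the record only keeps a primary prefix, its URI prefix, and its prefix synonyms, and returning the records ordered by URI prefix is an assumption made so that the output order is also deterministic.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Mapping


@dataclass
class SketchRecord:
    """Hypothetical stand-in for an extended prefix map record."""

    prefix: str
    uri_prefix: str
    prefix_synonyms: List[str] = field(default_factory=list)


def collapse_prefix_map(prefix_map: Mapping[str, str]) -> List[SketchRecord]:
    """Collapse CURIE prefixes that share a URI prefix into one record each."""
    # Group the CURIE prefixes by the URI prefix they expand to
    groups: Dict[str, List[str]] = defaultdict(list)
    for prefix, uri_prefix in prefix_map.items():
        groups[uri_prefix].append(prefix)

    records = []
    # Assumption: emit records ordered by URI prefix for a stable output order
    for uri_prefix, prefixes in sorted(groups.items()):
        # Case-sensitive lexical sort: Python compares str by code point,
        # so "B" (U+0042) sorts before "a" (U+0061)
        primary, *synonyms = sorted(prefixes)
        records.append(SketchRecord(primary, uri_prefix, synonyms))
    return records


if __name__ == "__main__":
    pm = {"a": "https://example.com/a/", "b": "https://example.com/a/"}
    print(collapse_prefix_map(pm))
    # [SketchRecord(prefix='a', uri_prefix='https://example.com/a/', prefix_synonyms=['b'])]
```

Because the sort is case-sensitive, a map containing both `a` and `B` for the same URI prefix would make `B` the primary prefix and `a` its synonym.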


  • Implement function to deterministically make an EPM from a non-bijective prefix map
  • Pick a better name for this function
  • Implement first unit test
  • Think through this harder and implement more unit tests
  • Improve the function's docs
    • Add params/returns
    • Add examples inside the function
  • Add a tutorial that gives context for this function (e.g., why people end up with non-bijective prefix maps, and what the consequences are of taking an automated approach to solving data issues vs. actually solving the data issue upstream).
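For the tutorial item above, one way to show what "non-bijective" means in practice is to list the URI prefixes that more than one CURIE prefix expands to. This is a minimal sketch; `find_uri_prefix_collisions` is a hypothetical helper, not part of `curies`.

```python
from collections import defaultdict
from typing import Dict, List, Mapping


def find_uri_prefix_collisions(prefix_map: Mapping[str, str]) -> Dict[str, List[str]]:
    """Return each URI prefix that more than one CURIE prefix expands to.

    A plain dict cannot repeat a CURIE prefix, so the only way it can fail
    to be bijective is for several CURIE prefixes to share one URI prefix.
    """
    claimants: Dict[str, List[str]] = defaultdict(list)
    for prefix, uri_prefix in prefix_map.items():
        claimants[uri_prefix].append(prefix)
    # Keep only the URI prefixes that are claimed by more than one CURIE prefix
    return {
        uri_prefix: sorted(prefixes)
        for uri_prefix, prefixes in claimants.items()
        if len(prefixes) > 1
    }


if __name__ == "__main__":
    pm = {"a": "https://example.com/a/", "b": "https://example.com/a/"}
    print(find_uri_prefix_collisions(pm))
    # {'https://example.com/a/': ['a', 'b']}
```

An empty result means the prefix map is already bijective and can be used as-is; a non-empty one lists exactly the entries that an automated upgrade would collapse.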

cc @joeflack4

@cthoyt marked this pull request as ready for review January 16, 2024 09:40
@cthoyt changed the title from "Reconciling non-bijective prefix maps" to "Upgrading non-bijective prefix maps" Jan 16, 2024
@cthoyt changed the title from "Upgrading non-bijective prefix maps" to "Upgrading (non-bijective) prefix maps" Jan 16, 2024
@cthoyt enabled auto-merge (squash) January 16, 2024 09:43
@cthoyt merged commit faef0bd into main Jan 16, 2024
8 checks passed
@cthoyt deleted the handle-bad-prefix-map branch January 16, 2024 09:47


def upgrade_prefix_map(prefix_map: Mapping[str, str]) -> List[Record]:
"""Convert a (potentially problematic) prefix map (i.e., not bijective) into a list of records.


I love how thorough your docstrings are.

@@ -2191,3 +2194,85 @@ def _get_shacl_line(prefix: str, uri_prefix: str, pattern: Optional[str] = None)
pattern = pattern.replace("\\", "\\\\")
line += f'; sh:pattern "{pattern}"'
return line + " ]"


def upgrade_prefix_map(prefix_map: Mapping[str, str]) -> List[Record]:


LGTM! I'm sure your test passes, but I also ran it locally and it worked for me as well.

2 participants