Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugfix: clean_prefix_map() not detecting prefix synonyms #485

Closed
wants to merge 6 commits into from

Conversation

joeflack4
Copy link
Collaborator

@joeflack4 joeflack4 commented Jan 13, 2024

Overview

This PR originally had more changes in order to provide fixes for monarch-initiative/mondo-ingest#394. However it is no longer necessary to merge this in order for that PR to be merged.

This PR is now just a bugfix in clean_prefix_map() in which the converter object contains prefix synonyms.


@twhetzel FYI for your review!
@souzadevinicius A lot going on here, but FYI. This is related to my mondo ingest pr.

@joeflack4 joeflack4 added the bug Something isn't working label Jan 13, 2024
@joeflack4 joeflack4 self-assigned this Jan 13, 2024
src/sssom/io.py Outdated Show resolved Hide resolved
src/sssom/util.py Outdated Show resolved Hide resolved
src/sssom/io.py Outdated Show resolved Hide resolved
@joeflack4 joeflack4 force-pushed the epm-updates branch 2 times, most recently from 79d8716 to 157cef4 Compare January 14, 2024 00:36
cthoyt added a commit to biopragmatics/curies that referenced this pull request Jan 15, 2024
src/sssom/io.py Outdated Show resolved Hide resolved
- Bugfix: clean_prefix_map(): Was ignoring prefix aliases. This was causing an error, as there was a perceived mismatch between the prefixes of the mapping set, and the prefix_map.
- Bugfix: get_metadata_and_prefix_map(): Was not utilizing extended prefix maps. This manifested in issue where prefix aliases were not incorporated. This meant that (a) if we tried to fix by removing the alias from the plain prefix_map, these CURIEs could not be resolved, (b) if we included the alias in the plain prefix_map, there would be a duplicate URI prefix, which would result in an error.
- Add: convert_plain_prefix_map_to_extended()
cthoyt added a commit to biopragmatics/curies that referenced this pull request Jan 16, 2024
Implement function to deterministically make an EPM from a
non-bijective prefix map, motivated by mapping-commons/sssom-py#485
- Delete: convert_plain_prefix_map_to_extended()
- Update: get_metadata_and_prefix_map(): Reverted changes several commits ago utilizing convert_plain_prefix_map_to_extended()
@joeflack4 joeflack4 changed the title Extended prefix maps usage updates Bugfix: clean_prefix_map() not detecting prefix synonyms Jan 20, 2024
This is a small test that illustrates why we cant just use `include_synonyms=True` to avoid the "missing prefix error" here https://github.com/mapping-commons/sssom-py/pull/485/files#diff-9f0c9a59dd545560607697feafaa62deb84dfd657f868fbeed9275e2acc0d9a6R249
@@ -246,7 +246,7 @@ def clean_prefix_map(self, strict: bool = True) -> None:
if self.metadata:
prefixes_in_table.update(get_prefixes_used_in_metadata(self.metadata))

missing_prefixes = prefixes_in_table - self.converter.get_prefixes()
missing_prefixes = prefixes_in_table - self.converter.get_prefixes(include_synonyms=True)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe i am wrong. I added a test below that illustrates the issue.

Lets say the raw data (prefixes in table) contains snomed. The converter has SCTID as a prefix. snomed as a synonym. Now this line of code here will not include snomed in missing prefixes, and move on.

Now you go a few lines below to extract a subconverter over all the prefixes in the table, including snomed, which will now become the main converter (this part is not very relevant for the problem but bear with me). Now, when the prefixmap gets serialsed to a valid SSSOM file, which uses msdf.converter.bimap (see write_table() method). This map does not contain snomed. so it will write SCTID, while the data (the mappings) still contains snomed.

If you still think this PR is right, you need to convince me that giving a mapping set with a value containing snomed, and snomed being a synonym in the EPM, the serialised SSSOM files contains snomed prefix..

Copy link
Collaborator Author

@joeflack4 joeflack4 Jan 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mm, I see the problem. I think you touched on this at the last meeting. How do you feel about serializing a non-bijective prefix map which would include the synonyms, then?

When it comes to working with synonyms in SSSOM, do you think this we've covered all of the cases? Or are there likely to be more places where synonym coverage is needed?

I don't see an issue on the topic of "EPM" / "extended prefix map" or "synonyms". Should we close this for now and move the discussion to a new issue?


converter = Converter.from_extended_prefix_map(EPM)
self.assertNotIn("snomed", converter.get_prefixes(include_synonyms=False))
self.assertIn("snomed", converter.get_prefixes(include_synonyms=True))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not SSSOM-py's job to test curies functionality. The test should show how the change in this PR actually effects input and output to SSSOM-py functions

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a test, it's some code to show why the pr the way it is should not be merged.. I don't intend to add this as a test here

@matentzn
Copy link
Collaborator

matentzn commented Feb 4, 2024

@joeflack4 - I think this PR should be closed, unless you think there is something we should salvage.

@matentzn matentzn marked this pull request as draft February 4, 2024 10:40
@joeflack4 joeflack4 marked this pull request as ready for review February 4, 2024 21:35
@joeflack4
Copy link
Collaborator Author

@matentzn This PR is now a 1 line change:
missing_prefixes = prefixes_in_table - self.converter.get_prefixes(include_synonyms=True)

And it would seem it is a bugfix. It seems like this is something that should be merged.

I suppose I can pick up where Charlie left off and make sure that the test passes. I haven't looked at it yet.

@matentzn
Copy link
Collaborator

matentzn commented Feb 5, 2024

I thought I had shown that there is a problem with this approach - check my comments and the "test" that illustrates the issue. Maybe my assessment is wrong, but then you have to proof that my test is wrong 😁

@joeflack4
Copy link
Collaborator Author

@matentzn Ah I'm sorry you're right, I see your last discussion.

@joeflack4 joeflack4 closed this Feb 5, 2024
@joeflack4
Copy link
Collaborator Author

joeflack4 commented Feb 5, 2024

@matentzn I closed it but we have something unresolved that I mentioned the last time:

I don't see an issue on the topic of "EPM" / "extended prefix map" or "synonyms". Should we close this for now and move the discussion to a new issue?

See:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants