-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bugfix: clean_prefix_map()
not detecting prefix synonyms
#485
Conversation
695ecab
to
b4843e1
Compare
79d8716
to
157cef4
Compare
- Bugfix: clean_prefix_map(): Was ignoring prefix aliases. This was causing an error, as there was a perceived mismatch between the prefixes of the mapping set, and the prefix_map. - Bugfix: get_metadata_and_prefix_map(): Was not utilizing extended prefix maps. This manifested in issue where prefix aliases were not incorporated. This meant that (a) if we tried to fix by removing the alias from the plain prefix_map, these CURIEs could not be resolved, (b) if we included the alias in the plain prefix_map, there would be a duplicate URI prefix, which would result in an error. - Add: convert_plain_prefix_map_to_extended()
157cef4
to
764dc1e
Compare
Implement function to deterministically make an EPM from a non-bijective prefix map, motivated by mapping-commons/sssom-py#485
- Delete: convert_plain_prefix_map_to_extended() - Update: get_metadata_and_prefix_map(): Reverted changes several commits ago utilizing convert_plain_prefix_map_to_extended()
clean_prefix_map()
not detecting prefix synonyms
This is a small test that illustrates why we cant just use `include_synonyms=True` to avoid the "missing prefix error" here https://github.com/mapping-commons/sssom-py/pull/485/files#diff-9f0c9a59dd545560607697feafaa62deb84dfd657f868fbeed9275e2acc0d9a6R249
@@ -246,7 +246,7 @@ def clean_prefix_map(self, strict: bool = True) -> None: | |||
if self.metadata: | |||
prefixes_in_table.update(get_prefixes_used_in_metadata(self.metadata)) | |||
|
|||
missing_prefixes = prefixes_in_table - self.converter.get_prefixes() | |||
missing_prefixes = prefixes_in_table - self.converter.get_prefixes(include_synonyms=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe i am wrong. I added a test below that illustrates the issue.
Lets say the raw data (prefixes in table) contains snomed
. The converter has SCTID
as a prefix. snomed
as a synonym. Now this line of code here will not include snomed
in missing prefixes, and move on.
Now you go a few lines below to extract a subconverter over all the prefixes in the table, including snomed
, which will now become the main converter (this part is not very relevant for the problem but bear with me). Now, when the prefixmap gets serialsed to a valid SSSOM file, which uses msdf.converter.bimap
(see write_table() method). This map does not contain snomed
. so it will write SCTID, while the data (the mappings) still contains snomed
.
If you still think this PR is right, you need to convince me that giving a mapping set with a value containing snomed
, and snomed
being a synonym in the EPM, the serialised SSSOM files contains snomed
prefix..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mm, I see the problem. I think you touched on this at the last meeting. How do you feel about serializing a non-bijective prefix map which would include the synonyms, then?
When it comes to working with synonyms in SSSOM, do you think this we've covered all of the cases? Or are there likely to be more places where synonym coverage is needed?
I don't see an issue on the topic of "EPM" / "extended prefix map" or "synonyms". Should we close this for now and move the discussion to a new issue?
|
||
converter = Converter.from_extended_prefix_map(EPM) | ||
self.assertNotIn("snomed", converter.get_prefixes(include_synonyms=False)) | ||
self.assertIn("snomed", converter.get_prefixes(include_synonyms=True)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's not SSSOM-py's job to test curies
functionality. The test should show how the change in this PR actually effects input and output to SSSOM-py functions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a test, it's some code to show why the pr the way it is should not be merged.. I don't intend to add this as a test here
@joeflack4 - I think this PR should be closed, unless you think there is something we should salvage. |
@matentzn This PR is now a 1 line change: And it would seem it is a bugfix. It seems like this is something that should be merged. I suppose I can pick up where Charlie left off and make sure that the test passes. I haven't looked at it yet. |
I thought I had shown that there is a problem with this approach - check my comments and the "test" that illustrates the issue. Maybe my assessment is wrong, but then you have to proof that my test is wrong 😁 |
@matentzn Ah I'm sorry you're right, I see your last discussion. |
@matentzn I closed it but we have something unresolved that I mentioned the last time:
See: |
Overview
This PR originally had more changes in order to provide fixes for monarch-initiative/mondo-ingest#394. However it is no longer necessary to merge this in order for that PR to be merged.
This PR is now just a bugfix in
clean_prefix_map()
in which the converter object contains prefix synonyms.@twhetzel FYI for your review!
@souzadevinicius A lot going on here, but FYI. This is related to my mondo ingest pr.