Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TestWrite.test_missing_entries() #533

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from
Draft
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
66 changes: 66 additions & 0 deletions tests/test_writers.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@
import unittest
from typing import Any, Dict

import curies
import pandas as pd
from curies import Converter

Expand Down Expand Up @@ -191,3 +192,68 @@ def test_write_sssom_ontoportal_json(self):
self.mapping_count,
f"{path} has the wrong number of mappings.",
)

# TODO: Determine assertions that should pass
# TODO: Fix failing assertions that should pass
# TODO: before merging, remove pass/fail comments at end of assertion lines
def test_missing_entries(self):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Get assertions to pass

  • 1. Determine assertions that should pass @matentzn
  • 2. Fix failing assertions that should pass
  • 3. before merging, remove pass/fail comments at end of assertion lines

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which assertions are appropriate?

Can you look at the assertions I've made and determine which ones you think should be expected to pass, and which should not?

I've left temporary comments next to each assertion showing their current pass/fail status.

"""Test to make sure that missing entires do not appear."""
tmp_file = os.path.join(test_out_dir, "test_missing_entries.sssom.tsv")
df = pd.DataFrame(
[
{
"subject_id": "Orphanet:58",
"subject_label": "Alexander disease",
"predicate_id": "skos:exactMatch",
"object_id": "icd11.foundation:2023359698",
"object_label": "Alexander disease",
"mapping_justification": "semapv:UnspecifiedMatching",
}
]
)
metadata = {
"creator_id": "orcid:0000-0002-2906-7319",
"curie_map": {
"MONDO": "http://purl.obolibrary.org/obo/MONDO_",
"Orphanet": "http://www.orpha.net/ORDO/Orphanet_",
"icd11.foundation": "http://id.who.int/icd/entity/",
"oboInOwl": "http://www.geneontology.org/formats/oboInOwl#",
"orcid": "https://orcid.org/",
"owl": "http://www.w3.org/2002/07/owl#",
"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
"rdfs": "http://www.w3.org/2000/01/rdf-schema#",
"semapv": "https://w3id.org/semapv/",
"skos": "http://www.w3.org/2004/02/skos/core#",
"sssom": "https://w3id.org/sssom/",
},
"license": "http://w3id.org/sssom/license/unspecified",
"mapping_provider": "https://www.orpha.net/",
}

# When passing Converter
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: 2 different setups

I have a set of assertions for 2 cases of MappingSetDataFrame instantiation:

  • with inclusion of a Converter
  • with no inclusion of a Converter

msdf: MappingSetDataFrame = MappingSetDataFrame(
converter=curies.Converter.from_prefix_map(metadata["curie_map"]),
df=df,
metadata=metadata,
)
self.assertIn("curie_map", msdf.metadata) # pass
self.assertIn("icd11.foundation", msdf.metadata["curie_map"]) # pass
self.assertIn("icd11.foundation", msdf.prefix_map) # pass
with open(tmp_file, "w") as f:
write_table(msdf, f)
msdf2 = parse_sssom_table(tmp_file)
self.assertIn("curie_map", msdf2.metadata) # fail
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This behaviour is expected for now, because curie_map sort if lives outside the metadata. however, it will soon be solved: mapping-commons/sssom#349

Copy link
Collaborator Author

@joeflack4 joeflack4 May 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, I'm surprised it wasn't part of the data model.

From an implementation standpoint, I was thinking that maybe msdf.prefix_map and msdf.metadata['curie_map'] could point to the same Python dictionary object. Right now they are different objects and, as these assertions show, can get out of sync.

self.assertIn("icd11.foundation", msdf2.metadata["curie_map"]) # KeyError
self.assertIn("icd11.foundation", msdf2.prefix_map) # pass

# When no passed Converter
msdf: MappingSetDataFrame = MappingSetDataFrame(df=df, metadata=metadata)
self.assertIn("curie_map", msdf.metadata) # pass
self.assertIn("icd11.foundation", msdf.metadata["curie_map"]) # pass
self.assertIn("icd11.foundation", msdf.prefix_map) # fail
with open(tmp_file, "w") as f:
write_table(msdf, f)
msdf2 = parse_sssom_table(tmp_file)
self.assertIn("curie_map", msdf2.metadata) # fail
self.assertIn("icd11.foundation", msdf2.metadata["curie_map"]) # KeyError
self.assertIn("icd11.foundation", msdf2.prefix_map) # fail
Loading