
Implementing separate methods for JSON and JSONLD #494

Draft: wants to merge 2 commits into master


@matentzn (Collaborator) commented Feb 4, 2024

This PR adds the following methods:

  • parse_sssom_jsonld
  • from_sssom_jsonld
  • write_jsonld
  • to_jsonld
  • test_parse_sssom_jsonld
  • test_write_sssom_jsonld

These are exactly analogous to what was already there for JSON.

Its actual purpose, however, is not so much to add those methods as to carefully review the format (to make sure we are happy with it), so that we can start making headway on mapping-commons/sssom#321.

Breaking changes

  • The `json` parameter now refers to plain JSON, whereas it used to refer to JSON-LD. Anyone expecting JSON-LD from `json` will now be served plain JSON instead.
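To make the JSON vs. JSON-LD distinction behind this breaking change concrete, here is a minimal sketch, assuming (as is usual for JSON-LD) that the JSON-LD variant differs from plain JSON mainly by carrying an `@context` that maps each prefix to its IRI expansion, so that CURIEs such as `a:something` become resolvable. The helper `to_jsonld_like` and the IRIs for the `a` and `b` prefixes are hypothetical illustrations, not the sssom-py implementation.

```python
import json

# Hypothetical prefix map; the IRIs for "a" and "b" are made up for illustration.
PREFIX_MAP = {
    "a": "http://example.org/a/",
    "b": "http://example.org/b/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "semapv": "https://w3id.org/semapv/vocab/",
}


def to_jsonld_like(doc: dict, prefix_map: dict) -> dict:
    """Return a copy of a plain-JSON mapping set with an @context added.

    Sketch only: a real JSON-LD context would also declare how each slot
    (e.g. subject_id) maps to an RDF property and that its value is an IRI.
    """
    return {"@context": dict(prefix_map), **doc}


plain = {
    "mapping_set_id": "https://w3id.org/sssom/mapping/tests/data/basic.tsv",
    "mappings": [
        {
            "subject_id": "a:something",
            "predicate_id": "rdfs:subClassOf",
            "object_id": "b:something",
            "mapping_justification": "semapv:LexicalMatching",
        }
    ],
}

jsonld = to_jsonld_like(plain, PREFIX_MAP)
print(sorted(jsonld))  # ['@context', 'mapping_set_id', 'mappings']
print(json.dumps(jsonld, indent=2))
```

Under this assumption the mapping records themselves are identical in both serialisations; only the presence of the context (and therefore whether a consumer can expand the CURIEs) differs.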

JSON Format

We need to make sure that the JSON format looks exactly as we envision it. Problems I see so far:

Here is an example JSON file:
{
  "mapping_set_id": "https://w3id.org/sssom/mapping/tests/data/basic.tsv",
  "license": "https://creativecommons.org/publicdomain/zero/1.0/",
  "mappings": [
    {
      "subject_id": "a:something",
      "predicate_id": "rdfs:subClassOf",
      "object_id": "b:something",
      "mapping_justification": "semapv:LexicalMatching",
      "subject_label": "XXXXX",
      "subject_category": "biolink:AnatomicalEntity",
      "object_label": "xxxxxx",
      "object_category": "biolink:AnatomicalEntity",
      "subject_source": "a:example",
      "object_source": "b:example",
      "mapping_tool": "rdf_matcher",
      "confidence": 0.8,
      "subject_match_field": [
        "rdfs:label"
      ],
      "object_match_field": [
        "rdfs:label"
      ],
      "match_string": [
        "xxxxx"
      ],
      "comment": "mock data"
    },
    {
      "subject_id": "a:something",
      "predicate_id": "owl:equivalentClass",
      "object_id": "c:something",
      "mapping_justification": "semapv:LexicalMatching",
      "subject_label": "XYXYX",
      "subject_category": "biolink:AnatomicalEntity",
      "object_label": "xyxyxy",
      "object_category": "biolink:AnatomicalEntity",
      "subject_source": "a:example",
      "object_source": "c:example",
      "mapping_tool": "rdf_matcher",
      "confidence": 0.83,
      "subject_match_field": [
        "rdfs:label"
      ],
      "object_match_field": [
        "rdfs:label"
      ],
      "match_string": [
        "xxxxx"
      ],
      "comment": "mock data"
    }
  ],
  "creator_id": [
    "orcid:1234",
    "orcid:5678"
  ],
  "mapping_tool": "https://github.com/cmungall/rdf_matcher",
  "mapping_date": "2020-05-30"
}
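A quick way to see what this example lacks is to count its top-level elements and look for a prefix map. This is a minimal sketch, assuming the conversion test compares the number of top-level keys (the `6 != 8` failure reported below points that way, but that is an assumption); the mappings list is trimmed to a single entry for brevity, which does not change the top-level count.

```python
import json

# The example mapping set above, trimmed to a single mapping.
EXAMPLE = """
{
  "mapping_set_id": "https://w3id.org/sssom/mapping/tests/data/basic.tsv",
  "license": "https://creativecommons.org/publicdomain/zero/1.0/",
  "mappings": [
    {
      "subject_id": "a:something",
      "predicate_id": "rdfs:subClassOf",
      "object_id": "b:something",
      "mapping_justification": "semapv:LexicalMatching"
    }
  ],
  "creator_id": ["orcid:1234", "orcid:5678"],
  "mapping_tool": "https://github.com/cmungall/rdf_matcher",
  "mapping_date": "2020-05-30"
}
"""

doc = json.loads(EXAMPLE)

# Number of top-level elements in the serialised mapping set.
print(len(doc))  # 6
print(sorted(doc))

# Neither a CURIE map nor a JSON-LD @context is present, yet the values use CURIEs.
print("curie_map" in doc, "@context" in doc)  # False False
```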

The two remaining test failures are due to exactly this problem:

FAILED tests/test_conversion.py::SSSOMReadWriteTestSuite::test_conversion - AssertionError: 6 != 8 : JSON document has less elements than the orginal one for basic.tsv. Json: {"mapping_set_id": "https:...
FAILED tests/test_parsers.py::TestParseExplicit::test_round_trip_json - ValueError: {'UMLS', 'orcid', 'DOID'} are used in the SSSOM mapping set but it does not exist in the prefix map
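The second failure comes from a consistency check: every prefix used in a CURIE must be declared in the prefix map, and a JSON file that carries no prefix map gives the parser nothing to check against. A minimal sketch of such a check, assuming a plain dictionary as the prefix map; `used_prefixes` and `missing_prefixes` are hypothetical helpers (not sssom-py functions), and only a few CURIE-valued slots are inspected.

```python
# Hypothetical helpers illustrating the prefix-map consistency check.
CURIE_SLOTS = ("subject_id", "predicate_id", "object_id", "mapping_justification")


def used_prefixes(doc: dict) -> set:
    """Collect prefixes from selected CURIE-valued mapping slots and from creator_id."""
    prefixes = set()
    for mapping in doc.get("mappings", []):
        for slot in CURIE_SLOTS:
            value = mapping.get(slot)
            if isinstance(value, str) and ":" in value:
                prefixes.add(value.split(":", 1)[0])
    for creator in doc.get("creator_id", []):
        if ":" in creator:
            prefixes.add(creator.split(":", 1)[0])
    return prefixes


def missing_prefixes(doc: dict, prefix_map: dict) -> set:
    """Prefixes that are used in the mapping set but absent from the prefix map."""
    return used_prefixes(doc) - set(prefix_map)


doc = {
    "mappings": [
        {
            "subject_id": "a:something",
            "predicate_id": "rdfs:subClassOf",
            "object_id": "b:something",
            "mapping_justification": "semapv:LexicalMatching",
        }
    ],
    "creator_id": ["orcid:1234"],
}
# Illustrative prefix map that forgets to declare "orcid".
prefix_map = {"a": "http://example.org/a/", "b": "http://example.org/b/",
              "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
              "semapv": "https://w3id.org/semapv/vocab/"}

missing = missing_prefixes(doc, prefix_map)
if missing:
    print(f"{sorted(missing)} are used in the mapping set but not in the prefix map")
```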

self.mapping_count,
f"{path} has the wrong number of mappings.",
)

@matentzn (Collaborator, Author)

Little @cthoyt in Nico's head:

AGAIN??? Please add short explicit tests so I can understand what is going on, in particular the difference between the JSON and JSON-LD serialisations.

@matentzn (Collaborator, Author) commented Feb 4, 2024

@matentzn: I will add tests after we have had some discussions on the nature of the JSON output.

@gouttegd (Contributor) commented Feb 5, 2024

We will probably have to address mapping-commons/sssom#225.

The problem we might run into with that is that, as far as I know (and as I have noted in the discussion about the extension slots), LinkML does not have a map type. We’d want to declare a field that could be used like this:

"curie_map": {
  "FBbt": "http://purl.obolibrary.org/obo/FBbt_"
}

but unless I missed something in LinkML’s docs, this is not possible. All we can do is to have a list (i.e. a “multi-valued” field) of custom “dictionary entry” types, like this:

"curie_map": [
  { "key": "FBbt",
    "value": "http://purl.obolibrary.org/obo/FBbt_" }
]

which of course would work but would be… weird, at the very least.
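Both shapes carry the same information and convert losslessly into each other, so the question is purely how the JSON looks to the people reading and writing it. A minimal sketch of the round trip, with hypothetical helper names; note that the list form additionally admits duplicate keys, which the sketch rejects.

```python
from typing import Dict, List

# The two shapes discussed above for carrying a CURIE map in JSON.
as_map = {"FBbt": "http://purl.obolibrary.org/obo/FBbt_"}
as_entries = [{"key": "FBbt", "value": "http://purl.obolibrary.org/obo/FBbt_"}]


def entries_to_map(entries: List[Dict[str, str]]) -> Dict[str, str]:
    """Collapse a list of {"key": ..., "value": ...} entries into a plain dict."""
    result: Dict[str, str] = {}
    for entry in entries:
        if entry["key"] in result:
            raise ValueError(f"duplicate prefix: {entry['key']}")
        result[entry["key"]] = entry["value"]
    return result


def map_to_entries(mapping: Dict[str, str]) -> List[Dict[str, str]]:
    """Expand a plain dict into the list-of-entries shape."""
    return [{"key": k, "value": v} for k, v in mapping.items()]


assert entries_to_map(as_entries) == as_map
assert map_to_entries(as_map) == as_entries
```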

My own solution to that (which nobody will like, I know) is simple: decide that CURIEfied identifiers are only for the TSV format (which, incidentally, is what the spec currently says), and that JSON should only contain full-length identifiers. No CURIE map needed, problem solved.
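Under that proposal the CURIE map is only needed at conversion time: when writing JSON from TSV, every CURIE is expanded using the TSV's prefix map. A minimal sketch, assuming a simple prefix-to-IRI dictionary; `expand_curie` is a hypothetical helper and `FBbt:00000001` is just an example identifier.

```python
from typing import Dict


def expand_curie(curie: str, prefix_map: Dict[str, str]) -> str:
    """Expand a CURIE to a full IRI.

    Values whose prefix is not in the map (including full IRIs such as
    https://...) are returned unchanged.
    """
    prefix, sep, local_id = curie.partition(":")
    if sep and prefix in prefix_map:
        return prefix_map[prefix] + local_id
    return curie


PREFIX_MAP = {
    "FBbt": "http://purl.obolibrary.org/obo/FBbt_",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}

mapping = {"subject_id": "FBbt:00000001", "predicate_id": "rdfs:subClassOf"}
expanded = {slot: expand_curie(value, PREFIX_MAP) for slot, value in mapping.items()}
print(expanded)
```

Passing unknown prefixes through keeps the sketch short; a stricter writer would probably raise instead, so that an undeclared prefix cannot silently end up as a CURIE in the JSON output.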
