Representing 'source retrieval provenance' in merged edges #369

Open
mbrush opened this issue Aug 25, 2022 · 1 comment
mbrush commented Aug 25, 2022

A dedicated model to represent 'source retrieval provenance' has been proposed and discussed in several recent meetings, to better support emerging use cases around edge merging and answer debugging. The key requirement for the edge merging use case is to represent an ordered tree of the retrievals that result from edge merging operations, in which it is clear which source was primary/original and which were aggregators. Several approaches have been proposed and are discussed in the document here.

The general consensus from recent calls is summarized below:

  1. There is interest in exploring a dedicated structure in the TRAPI model for retrieval provenance (as opposed to using nested Attributes). Minimally, this would require a new type of object to hold retrieval provenance metadata, and a new edge property to point at it.
  2. We should start simple and focus on the core requirements for edge merging. Avoid nested objects to the extent possible, and do not worry at this point about provenance metadata concerning each retrieval operation (when, who, how, access URL, etc.). But we may want a model that can be easily expanded to support this in the future (this is a key question that will play into the choice of proposals).
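To make the "ordered tree of retrievals" requirement concrete, here is a minimal Python sketch of a flat (non-nested) record per source, with the order recovered from upstream pointers. The class name `RetrievalRecord`, its fields, and the `infores:example-*` identifiers are hypothetical and chosen only for illustration; this is not one of the proposals under discussion.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RetrievalRecord:
    """Hypothetical flat record: one per source that handled the edge."""
    source_id: str
    role: str  # "primary" or "aggregator" (assumed vocabulary)
    upstream_ids: List[str] = field(default_factory=list)


def ordered_from_primary(records):
    """Return source ids so that every source appears after its upstreams."""
    by_id = {r.source_id: r for r in records}
    ordered, seen = [], set()

    def visit(source_id):
        if source_id in seen:
            return
        seen.add(source_id)
        record = by_id.get(source_id)
        if record is not None:
            for upstream in record.upstream_ids:
                visit(upstream)
        ordered.append(source_id)

    for r in records:
        visit(r.source_id)
    return ordered


# Records can be listed in any order; the upstream pointers carry the tree.
records = [
    RetrievalRecord("infores:example-aggregator-b", "aggregator",
                    ["infores:example-aggregator-a"]),
    RetrievalRecord("infores:example-aggregator-a", "aggregator",
                    ["infores:example-primary"]),
    RetrievalRecord("infores:example-primary", "primary"),
]
chain = ordered_from_primary(records)
primary_ids = [r.source_id for r in records if r.role == "primary"]
```

Because each record only names its upstream sources, the list stays flat while the tree remains recoverable, and per-retrieval metadata could later be added as extra fields without changing the shape.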

These priorities focused us on two candidate approaches:

Data examples illustrate how these two approaches would represent two retrieval scenarios (see the diagrams below, which are described further in the Google document):

image

image


Finally, note that this is related to the broader question of retaining EPC in merged edges, as discussed in #313.

vdancik added this to the v1.4 milestone Sep 2, 2022

mbrush commented Nov 18, 2022

Adding a slight twist on Candidate A (let's call it Candidate A.1) that lets us use Attribute objects while offering some structural separation between the Attributes that describe retrieval provenance (which is one draw of Candidate B) and the Attribute objects holding other types of edge metadata. It requires only a dedicated Edge property, separate from attributes, to hold the Attribute objects used to describe source retrieval provenance (we might call this property retrieval_provenance_attributes, or just retrieval_attributes).

This would begin to address one of the concerns raised about Candidate A, which is that it is hard to find and assemble the Attribute objects describing retrieval provenance among the potentially tens of other Attribute objects hanging from a given Edge.


  "edges": {
    "id": "e719491",
    "subject": "RXCUI:1544384",
    "predicate": "biolink:correlated_with",
    "object": "MONDO:0008383",
    "attributes": [ ],
    "retrieval_attributes": [ ]
  }
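The difference for a consumer can be sketched in Python as follows. This is only an illustration under stated assumptions: the attribute type IDs used as provenance markers, the `infores:example-*` values, and both helper function names are hypothetical, and edges are treated as plain dictionaries.

```python
# Assumed attribute type IDs that mark retrieval provenance (illustrative only).
PROVENANCE_TYPE_IDS = {
    "biolink:primary_knowledge_source",
    "biolink:aggregator_knowledge_source",
}


def provenance_from_attributes(edge):
    """Candidate A style: filter provenance out of the mixed attributes list."""
    return [
        attr for attr in edge.get("attributes", [])
        if attr.get("attribute_type_id") in PROVENANCE_TYPE_IDS
    ]


def provenance_from_retrieval_attributes(edge):
    """Candidate A.1 style: read the dedicated property directly."""
    return list(edge.get("retrieval_attributes", []))


primary = {"attribute_type_id": "biolink:primary_knowledge_source",
           "value": "infores:example-primary"}
aggregator = {"attribute_type_id": "biolink:aggregator_knowledge_source",
              "value": "infores:example-aggregator"}
other = {"attribute_type_id": "biolink:p_value", "value": 0.01}

core = {"id": "e719491", "subject": "RXCUI:1544384",
        "predicate": "biolink:correlated_with", "object": "MONDO:0008383"}

# Candidate A: everything hangs from "attributes".
edge_a = dict(core, attributes=[other, primary, aggregator])
# Candidate A.1: provenance lives in its own list.
edge_a1 = dict(core, attributes=[other], retrieval_attributes=[primary, aggregator])
```

With the dedicated property, a consumer no longer needs to know which attribute type IDs count as provenance in order to find them; the Attribute objects themselves are unchanged.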

mbrush added a commit to mbrush/ReasonerAPI that referenced this issue Jan 19, 2023
Adding RetrievalSource object (and a reference to it from Edge) to support richer representations of retrieval provenance per NCATSTranslator#369.