Representing 'source retrieval provenance' in merged edges #369

mbrush · 2022-08-25T18:30:07Z

A dedicated model to represent 'source retrieval provenance' has been proposed and discussed in several recent meetings, to better support emerging use cases around edge merging and answer debugging. The key requirement for the edge merging use case is to represent an ordered tree of the retrievals that result from edge merging operations, making it clear which source was the primary/original source and which were aggregators. Several approaches have been proposed and are discussed in the linked document.
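To make the 'ordered tree' requirement concrete, here is a minimal Python sketch under assumed names: each retrieval is a flat record (no nesting) with a hypothetical `role` label and a list of `upstream_ids` pointing at the sources it retrieved from. A depth-first topological sort then recovers the order from the primary source out to the last aggregator, and rejects lists with zero or several primaries, dangling upstream ids, or cycles. None of these names come from TRAPI, and the `infores:` ids are placeholders; they stand in for whichever candidate is eventually chosen.

```python
from dataclasses import dataclass, field


@dataclass
class Retrieval:
    """One retrieval step in a flat provenance list (hypothetical shape, not TRAPI)."""
    resource_id: str  # CURIE of the knowledge source, e.g. "infores:..."
    role: str  # hypothetical labels: "primary" or "aggregator"
    upstream_ids: list[str] = field(default_factory=list)  # sources this one retrieved from


def order_retrievals(retrievals: list[Retrieval]) -> list[Retrieval]:
    """Order retrievals so that each one appears after all of its upstream sources.

    Raises ValueError unless there is exactly one primary source, every upstream
    id resolves to a retrieval in the list, and the upstream links are acyclic.
    """
    by_id = {r.resource_id: r for r in retrievals}
    primaries = [r for r in retrievals if r.role == "primary"]
    if len(primaries) != 1:
        raise ValueError(f"expected one primary source, found {len(primaries)}")

    ordered: list[Retrieval] = []
    state: dict[str, str] = {}  # resource_id -> "visiting" or "done"

    def visit(r: Retrieval) -> None:
        # Depth-first: emit all upstream sources before r itself.
        if state.get(r.resource_id) == "done":
            return
        if state.get(r.resource_id) == "visiting":
            raise ValueError(f"cycle detected at {r.resource_id}")
        state[r.resource_id] = "visiting"
        for up in r.upstream_ids:
            if up not in by_id:
                raise ValueError(f"unknown upstream source {up}")
            visit(by_id[up])
        state[r.resource_id] = "done"
        ordered.append(r)

    for r in retrievals:
        visit(r)
    return ordered


chain = [
    Retrieval("infores:aggregator-b", "aggregator", ["infores:aggregator-a"]),
    Retrieval("infores:primary-db", "primary"),
    Retrieval("infores:aggregator-a", "aggregator", ["infores:primary-db"]),
]
print([r.resource_id for r in order_retrievals(chain)])
# ['infores:primary-db', 'infores:aggregator-a', 'infores:aggregator-b']
```

Because the order is derived from the upstream pointers rather than from list position, the records can stay flat and be appended in any order as edges pass through successive aggregators.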

The general consensus from recent calls is summarized below:

- There is interest in exploring a dedicated structure in the TRAPI model for retrieval provenance (as opposed to using nested Attributes). At a minimum, this would require a new type of object to hold retrieval provenance metadata, and a new edge property to point at it.
- We should start simple and focus on the core requirements for edge merging. Avoid nested objects to the extent possible, and do not worry yet about provenance metadata concerning each retrieval operation (when, who, how, access URL, etc.). We may, however, want a model that can easily be expanded to support this in the future (this is a key question that will factor into the choice among proposals).

These priorities narrowed our focus to two candidate approaches:

- Candidate A: use of Attributes (Overview, Data Examples)
- Candidate B: flattening into a dedicated 'Source' object (Overview, Data Examples)

The Data Examples illustrate how these two approaches would represent two retrieval scenarios (see the scenario diagrams, which are further described in the Google document).

Finally, note that this is related to the broader question of retaining EPC in merged edges, as discussed in #313.

mbrush · 2022-11-18T22:40:58Z

Adding a slight twist on Candidate A (let's call it Candidate A.1) that lets us use Attribute objects while offering some structural separation between retrieval provenance Attributes and Attribute objects holding other types of edge metadata (one of the draws of Candidate B). It requires only a dedicated Edge property, separate from `attributes`, to hold the Attribute objects that describe source retrieval provenance (we might call this property `retrieval_provenance_attributes`, or just `retrieval_attributes`).

This would begin to address one of the concerns raised about Candidate A: that it is hard to find and assemble the Attribute objects describing retrieval provenance among the potentially tens of other Attribute objects hanging from a given Edge.

```json
"edges": {
  "id": "e719491",
  "subject": "RXCUI:1544384",
  "predicate": "biolink:correlated_with",
  "object": "MONDO:0008383",
  "attributes": [ ],
  "retrieval_attributes": [ ]
}
```
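A rough Python sketch of the lookup difference, treating edges as plain JSON dicts: under Candidate A a client has to scan every Attribute and filter on `attribute_type_id`, while under Candidate A.1 it simply reads the dedicated `retrieval_attributes` property. The set of attribute types counted as retrieval provenance (`PROVENANCE_TYPES`) and the values in the example edges are illustrative assumptions only.

```python
# Attribute types treated as retrieval provenance under Candidate A.
# Illustrative assumption only; the real list would be settled in the spec.
PROVENANCE_TYPES = {
    "biolink:primary_knowledge_source",
    "biolink:aggregator_knowledge_source",
}


def provenance_candidate_a(edge: dict) -> list[dict]:
    """Candidate A: scan every Attribute on the edge and keep the provenance ones."""
    return [
        attr
        for attr in edge.get("attributes") or []
        if attr.get("attribute_type_id") in PROVENANCE_TYPES
    ]


def provenance_candidate_a1(edge: dict) -> list[dict]:
    """Candidate A.1: provenance Attributes already live in their own property."""
    return list(edge.get("retrieval_attributes") or [])


# Placeholder Attribute objects for illustration.
primary = {"attribute_type_id": "biolink:primary_knowledge_source",
           "value": "infores:example-primary"}
other = {"attribute_type_id": "biolink:publications", "value": []}

edge_a = {"subject": "RXCUI:1544384", "predicate": "biolink:correlated_with",
          "object": "MONDO:0008383", "attributes": [other, primary]}
edge_a1 = {"subject": "RXCUI:1544384", "predicate": "biolink:correlated_with",
           "object": "MONDO:0008383", "attributes": [other],
           "retrieval_attributes": [primary]}

# Both lookups return [primary]; only A has to filter.
print(provenance_candidate_a(edge_a), provenance_candidate_a1(edge_a1))
```

The Attribute objects keep the same shape in both cases; A.1 only changes where they hang on the Edge, so existing Attribute tooling would still apply.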


vdancik added this to the v1.4 milestone Sep 2, 2022

mbrush added a commit to mbrush/ReasonerAPI that referenced this issue on Jan 19, 2023:

Refactor of source retrieval provenance model (57df891)

Adding RetrievalSource object (and a reference to it from Edge) to support richer representations of retrieval provenance per NCATSTranslator#369.
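As a hedged illustration of how such an object could serve the edge merging use case, the Python sketch below merges the source lists of several equivalent edges and appends the merging tool as a new aggregator whose upstream ids point at the previous top-level sources. The field names (`resource_id`, `resource_role`, `upstream_resource_ids`), the `sources` edge property, and the `infores:` ids are assumptions for this sketch; check them against the RetrievalSource schema in the actual TRAPI release. It also assumes each resource plays a single role on a given edge.

```python
def top_level_ids(sources: list[dict]) -> list[str]:
    """Ids of sources that no other source in the list names as upstream."""
    referenced = {up for s in sources for up in s.get("upstream_resource_ids") or []}
    return [s["resource_id"] for s in sources if s["resource_id"] not in referenced]


def merge_sources(edges: list[dict], merger_id: str) -> list[dict]:
    """Merge the (hypothetical) `sources` lists of equivalent edges and record the merger.

    Sources are de-duplicated by resource_id (assumes one role per resource per edge);
    upstream ids from duplicate entries are unioned, preserving first-seen order.
    """
    merged: dict[str, dict] = {}
    for edge in edges:
        for src in edge.get("sources") or []:
            ups = list(src.get("upstream_resource_ids") or [])
            if src["resource_id"] not in merged:
                # Shallow copy so the input edges are not mutated.
                merged[src["resource_id"]] = {**src, "upstream_resource_ids": ups}
            else:
                existing = merged[src["resource_id"]]["upstream_resource_ids"]
                for up in ups:
                    if up not in existing:
                        existing.append(up)
    if merger_id in merged:
        raise ValueError(f"{merger_id} already appears in the retrieval chain")

    sources = list(merged.values())
    # The merging tool becomes the new outermost aggregator.
    sources.append({
        "resource_id": merger_id,
        "resource_role": "aggregator_knowledge_source",
        "upstream_resource_ids": top_level_ids(sources),
    })
    return sources


edge1 = {"sources": [
    {"resource_id": "infores:primary-db", "resource_role": "primary_knowledge_source"},
    {"resource_id": "infores:agg-a", "resource_role": "aggregator_knowledge_source",
     "upstream_resource_ids": ["infores:primary-db"]},
]}
edge2 = {"sources": [
    {"resource_id": "infores:primary-db", "resource_role": "primary_knowledge_source"},
    {"resource_id": "infores:agg-b", "resource_role": "aggregator_knowledge_source",
     "upstream_resource_ids": ["infores:primary-db"]},
]}
merged = merge_sources([edge1, edge2], "infores:merger")
print(merged[-1]["upstream_resource_ids"])  # ['infores:agg-a', 'infores:agg-b']
```

Keeping the list flat and linking entries by id means a merge only appends one entry and never restructures what upstream tools already recorded.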

mbrush mentioned this issue on Jan 19, 2023: Refactor of source retrieval provenance model #388 (closed)

mbrush mentioned this issue on Jan 26, 2023: Add properties to the RetrievalSource object #392 (closed)
