Add properties to the RetrievalSource object #392

Closed
mbrush opened this issue Jan 26, 2023 · 5 comments

@mbrush
Collaborator

mbrush commented Jan 26, 2023

Per the outcome of voting in #386, we settled on Proposal 2 (aka Candidate B in #369), which moves away from an Attribute-based representation and instead defines a dedicated Edge property and RetrievalSource object to describe retrieval provenance information.

But the proposed RetrievalSource object is missing fields needed to capture information currently being provided about knowledge sources in Attribute objects (e.g. a free text description, a url for the resource's homepage, etc). We may want to add a resource_description field and a resource_url field to the RetrievalSource object to address this. (However, these may ultimately not be needed if we decide that descriptions and urls taken directly from the infores registry are always adequate.)

In addition, to address the request from the UI team to capture a linkout to a source record url when possible (see EPC #6), I propose to add a source_record_url field that can be used to hold the url of a specific record/web page that contains the knowledge expressed in the Edge. Capturing this here would mean that all information about how and from where the knowledge expressed in an edge was retrieved gets represented together in the RetrievalSource object.

RetrievalSource:
  - resource: curie
  - resource_role: enum
  - upstream_resource: curie
  # New proposed fields
  - resource_description: string
  - resource_url: string
  - source_record_url: string
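
As a rough illustration of the shape proposed above, here is a minimal Python sketch of the object. It is not the TRAPI schema itself: treating the three new fields (and upstream_resource) as optional is an assumption made for the sketch, and the example values are the ChEMBL homepage and compound page for Carbetocin.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RetrievalSource:
    """Illustrative sketch of the proposed object, not the normative schema."""

    resource: str        # infores CURIE, e.g. "infores:chembl"
    resource_role: str   # enum value, e.g. "primary knowledge source"
    upstream_resource: Optional[str] = None  # CURIE; optionality is assumed here
    # New proposed fields (optionality is an assumption of this sketch)
    resource_description: Optional[str] = None
    resource_url: Optional[str] = None
    source_record_url: Optional[str] = None


chembl = RetrievalSource(
    resource="infores:chembl",
    resource_role="primary knowledge source",
    resource_url="https://www.ebi.ac.uk/chembl/",
    source_record_url="https://www.ebi.ac.uk/chembl/compound_report_card/CHEMBL1098/",
)
```

Keeping source_record_url on the same object as resource means a consumer never has to guess which Resource a linkout belongs to.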
@mbrush
Collaborator Author

mbrush commented Jan 26, 2023

@vdancik has pointed out that these url linkouts don't always represent 'records' of the knowledge in the Edge specifically - and that a different name for the proposed source_record_url field may be warranted. e.g. the Chembl page that we could reference from an Edge reporting that "Carbetocin is an agonist of the Oxytocin Receptor" also contains a lot of other knowledge about Carbetocin besides the proteins it targets.

  • @mbrush does not see this as a big problem, as the webpage in the case above does represent a 'record': specifically, a record for the chemical Carbetocin. Nothing says that the 'record' referenced from an Edge must be a record only of the knowledge the Edge contains, just that it is a record that contains that knowledge somewhere within it. So IMO we don't need to put so fine a point on things here, and can move ahead with the proposal. But I am open to other ideas for naming this property.

@vdancik also suggested we consider capturing source record urls outside of the RetrievalSource object - using an Attribute keyed on an edge property like source_record_urls - which would take a list of all urls the data provider wants to capture that report the specific knowledge expressed in the edge. This would be analogous to Proposal 3 in the earlier ticket here about source urls.

  • @mbrush would prefer to keep this information in the Source object along with info about the Information Resource, as IMO these are logically and intuitively connected, and should be represented in the same place as proposed above. I think this is the cleanest way to be sure we link these records to the Resource that holds them, unless we think it provides significant benefit to have one field in which tools can look to find record url linkouts for an edge.

@vdancik vdancik added this to the v1.4 milestone Feb 1, 2023
@mbrush
Collaborator Author

mbrush commented Mar 1, 2023

Below I show how retrieval provenance for a Chemical - affects - Gene edge from Chembl might look if we implemented the approach proposed here:


 {
  "edges": {
    "id": "e719491",
    "subject": "chembl:1098",
    "predicate": "biolink:affects",
    "object": "hgnc:10591",
    "sources": [
      {
        "type": "biolink:RetrievalSource",
        "resource": "infores:chembl",
        "resource_url": "https://www.ebi.ac.uk/chembl/",
        "resource_role": "primary knowledge source",
        "source_record_url": "https://www.ebi.ac.uk/chembl/compound_report_card/CHEMBL1098/"
      },
      {
        "type": "biolink:RetrievalSource",
        "resource": "infores:molepro",
        "resource_url": "https://translator.broadinstitute.org/molepro/trapi/v1.0/ui/",
        "resource_role": "aggregator knowledge source",
        "previous_resource": ["infores:chembl"]
      },
      {
        "type": "biolink:RetrievalSource",
        "resource": "infores:aragorn",
        "resource_url": "https://github.com/NCATSTranslator/Translator-All/wiki/ARAGORN",
        "resource_role": "aggregator knowledge source",
        "previous_resource": ["infores:molepro"]
      },
      {
        "type": "biolink:RetrievalSource",
        "resource": "infores:ars",
        "resource_url": "https://github.com/NCATSTranslator/Translator-All/wiki/Autonomous-Relay-System-(ARS)",
        "resource_role": "aggregator knowledge source",
        "previous_resource": ["infores:aragorn"]
      }
    ],
    "attributes": [
      {
        "attribute_type_id": "publications",
        "value": ["PMID:12761351", "PMID:12761351"],
        "value_type_id": "biolink:Publication",
        "attribute_source": ["infores:chembl"]   # links pubs to the resource in the list above that provided them
      }
    ]
  }
 }
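
To show how a consumer might read the retrieval provenance in an example like the one above, here is a small Python sketch. The function names are hypothetical, the source entries are trimmed to the fields the code uses, and the chain walk assumes a single linear chain in which each resource lists at most one previous_resource.

```python
from typing import Dict, List, Optional

# Trimmed copy of the "sources" list from the example edge.
sources: List[Dict] = [
    {"resource": "infores:chembl",
     "resource_role": "primary knowledge source",
     "source_record_url": "https://www.ebi.ac.uk/chembl/compound_report_card/CHEMBL1098/"},
    {"resource": "infores:molepro",
     "resource_role": "aggregator knowledge source",
     "previous_resource": ["infores:chembl"]},
    {"resource": "infores:aragorn",
     "resource_role": "aggregator knowledge source",
     "previous_resource": ["infores:molepro"]},
    {"resource": "infores:ars",
     "resource_role": "aggregator knowledge source",
     "previous_resource": ["infores:aragorn"]},
]


def retrieval_chain(sources: List[Dict]) -> List[str]:
    """Order resources from the primary source to the final aggregator."""
    upstream_of = {s["resource"]: s.get("previous_resource", []) for s in sources}
    # The final aggregator is the resource that no other resource lists as upstream.
    referenced = {r for ups in upstream_of.values() for r in ups}
    current = next(r for r in upstream_of if r not in referenced)
    chain = [current]
    while upstream_of[current]:
        current = upstream_of[current][0]  # assumes one upstream resource per hop
        chain.append(current)
    return chain[::-1]


def primary_record_url(sources: List[Dict]) -> Optional[str]:
    """Return the record linkout attached to the primary knowledge source, if any."""
    for s in sources:
        if s["resource_role"] == "primary knowledge source":
            return s.get("source_record_url")
    return None
```

Because the linkout lives on the source entry, primary_record_url needs no extra lookup to know that the URL came from ChEMBL.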

@mbrush
Collaborator Author

mbrush commented Mar 2, 2023

If we decide not to capture source record url linkouts within the RetrievalSource object, our Edge metadata might look like this:


 {
  "edges": {
    "id": "e719491",
    "subject": "chembl:1098",
    "predicate": "biolink:affects",
    "object": "hgnc:10591",
    "sources": [
      {
        "type": "biolink:RetrievalSource",
        "resource": "infores:chembl",
        "resource_url": "https://www.ebi.ac.uk/chembl/",
        "resource_role": "primary knowledge source"
      },
      {
        "type": "biolink:RetrievalSource",
        "resource": "infores:molepro",
        "resource_url": "https://translator.broadinstitute.org/molepro/trapi/v1.0/ui/",
        "resource_role": "aggregator knowledge source",
        "previous_resource": ["infores:chembl"]
      },
      {
        "type": "biolink:RetrievalSource",
        "resource": "infores:aragorn",
        "resource_url": "https://github.com/NCATSTranslator/Translator-All/wiki/ARAGORN",
        "resource_role": "aggregator knowledge source",
        "previous_resource": ["infores:molepro"]
      },
      {
        "type": "biolink:RetrievalSource",
        "resource": "infores:ars",
        "resource_url": "https://github.com/NCATSTranslator/Translator-All/wiki/Autonomous-Relay-System-(ARS)",
        "resource_role": "aggregator knowledge source",
        "previous_resource": ["infores:aragorn"]
      }
    ],
    "attributes": [
      {
        "attribute_type_id": "source_record_urls",
        "value": ["https://www.ebi.ac.uk/chembl/compound_report_card/CHEMBL1098/"],
        "value_type_id": "biolink:urlorcurie",
        "attribute_source": ["infores:molepro"]
      },
      {
        "attribute_type_id": "publications",
        "value": ["PMID:1276135", "PMID:12761351"],
        "value_type_id": "biolink:Publication",
        "attribute_source": ["infores:chembl"]   # links pubs to the resource in the list above that provided them
      }
    ]
  }
 }

Note that if more than one of the Resources in the retrieval chain provides web pages for records displaying the knowledge in the Edge (e.g. the primary source and an aggregator both have web pages for the record), then both urls could be listed in the source_record_urls field. But there would be no way to explicitly indicate which url goes with the primary vs the aggregator Resource.
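
The limitation described in the note can be made concrete with a short Python sketch. The second URL is a made-up aggregator page on example.org, added only to illustrate the two-URL case, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple

edge: Dict = {
    "attributes": [
        {
            "attribute_type_id": "source_record_urls",
            "value": [
                "https://www.ebi.ac.uk/chembl/compound_report_card/CHEMBL1098/",
                # Hypothetical aggregator record page, for illustration only.
                "https://aggregator.example.org/records/CHEMBL1098",
            ],
            "attribute_source": ["infores:molepro"],
        }
    ]
}


def record_urls_with_attribution(edge: Dict) -> List[Tuple[str, Tuple[str, ...]]]:
    """Pair each record url with the only attribution the Attribute carries."""
    pairs = []
    for attr in edge.get("attributes", []):
        if attr["attribute_type_id"] != "source_record_urls":
            continue
        for url in attr["value"]:
            # attribute_source applies to the Attribute as a whole, so every
            # url ends up paired with the same resource(s).
            pairs.append((url, tuple(attr["attribute_source"])))
    return pairs
```

Both urls come back attributed to infores:molepro; nothing in the Attribute says that the first page is hosted by the primary source.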

@mbrush
Collaborator Author

mbrush commented Mar 8, 2023

Decisions made on the March 2, 2023 TRAPI call:

  • no need to add resource_description or resource_url properties, as this information can be pulled as needed from the infores catalog.
  • add a source_record_urls property to the RetrievalSource object in the TRAPI schema, which can take one or more urls linking to a web page or document provided by the source that contains the knowledge expressed in the Edge. If the knowledge is contained in more than one web page on an Information Resource's site, separate urls MAY be provided for each.
source_record_urls:
  type: array
  items:
    type: string
  description: >-
    A human-consumable URL linking to a specific web page or document provided by the source, that
    contains a record of the knowledge expressed in the Edge. If the knowledge is contained in more than
    one web page on an Information Resource's site, separate urls MAY be provided for each (e.g. the fact
    that the KIT protein is a therapeutic target for Imatinib can be found on pages for the protein and the
    drug within the Therapeutic Targets Database (TTD) website).
  example: [https://db.idrblab.net/ttd/data/target/details/t57700, https://db.idrblab.net/ttd/data/drug/details/d0az3c]

  • add a record_urls association slot to the Biolink Model, which would be used as the key of a TRAPI attribute object that holds additional record URLs from resources that were not one of the actual retrieval sources for an Edge, but which the provider wants to reference to point end users toward additional records of the reported knowledge. See "Add a record_url association slot", biolink/biolink-model#1238.
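
A producer could sanity-check a source_record_urls value before emitting it. The following Python sketch is one way to do that; the function name is hypothetical, and the absolute http(s) check is an assumption that goes beyond the schema, which only requires an array of strings.

```python
from typing import Any, List
from urllib.parse import urlparse


def validate_source_record_urls(value: Any) -> List[str]:
    """Return a list of problems; an empty list means the value looks acceptable."""
    if not isinstance(value, list):
        return ["source_record_urls must be an array"]
    problems = []
    for i, item in enumerate(value):
        if not isinstance(item, str):
            problems.append(f"item {i} is not a string")
            continue
        parsed = urlparse(item)
        # Assumption: a human-consumable URL should be an absolute http(s) URL.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"item {i} is not an absolute http(s) URL: {item!r}")
    return problems


# The TTD example from the schema description: one page for the target, one for the drug.
ttd_urls = [
    "https://db.idrblab.net/ttd/data/target/details/t57700",
    "https://db.idrblab.net/ttd/data/drug/details/d0az3c",
]
```

Calling validate_source_record_urls(ttd_urls) returns an empty list, while a bare string or a list containing non-URL values produces one message per problem.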

mbrush added a commit that referenced this issue Mar 9, 2023
added RetrievalSource.source_record_urls property, per #392.
@edeutsch
Collaborator

I think this has been addressed in revised PR #393
Please reopen if there are remaining concerns with a description thereof.

uhbrar pushed a commit to uhbrar/ReasonerAPI that referenced this issue Mar 27, 2023
added RetrievalSource.source_record_urls property, per NCATSTranslator#392.