API overview


Overview

The federated data sharing Common API establishes an open standard for data platforms to participate in open and closed data sharing networks. It specifies a set of required endpoints that provide a 'common' API to organisations wishing to participate in data sharing or federated analysis. Data sharing agreements are diverse and we need to remove barriers to data sharing amongst data controllers. This approach is intended to:

  • Provide clarity and transparency of the model in a complex ecosystem
  • Accelerate availability of data for research
  • Devolve the decision-making and governance to the appropriate level
  • Encourage convergence of existing (proprietary or niche) efforts
  • Encourage an ecosystem of tools & syndication

By adopting the API, a data provider and their network can implement a "connector" layer once and join multiple networks. Our approach asks data controllers to self-select the 'level' at which they can join the network, which depends mainly on what they are permitted to do with the data in their custody:

| Mode | Metadata | Selection & filtering of record-level data | Federated compute on record-level data |
|---|---|---|---|
| Level 0 | Can be queried and retrieved | Can be queried remotely and transferred to a client | Federation not required, computation happens at client |
| Level 1 | Can be queried and retrieved | Can be queried remotely and transferred to a client | Federation not required, computation happens at client |
| Level 2 | Can be queried and retrieved | Not permitted | Containerised computations can be executed remotely with selection query input, approved results returned |

Open Standards

Rather than reinventing the wheel, the Common API adopts and adapts existing standards efforts:

  • The API is defined using OpenAPI specifications.
  • API endpoints should be authenticated using OAuth2 (this will be mandated in future versions).
  • Descriptive metadata is defined in a variant of the W3C DCAT standard and a simple data dictionary model.
  • Selections are defined in GraphQL as an abstraction over querying, selection and filtering.
  • Federated computations are defined in a variant of the GA4GH Task Execution Service (TES) API.

Note: Field-level metadata (data dictionaries) are defined in a simple, pragmatic data model - existing partners aim to define or adopt a more robust community standard.

API modularity

There are three sections to the API:

| Section | Repository | Level 0 | Level 1 | Level 2 |
|---|---|---|---|---|
| Metadata | common-api-metadata | Yes | Yes | Yes |
| Selection | common-api-selection | N/A | Yes | Yes ** |
| Federated compute | common-api-tasks | N/A | N/A | Yes |

** Level 2 sites must implement the selection API "behind the scenes" to provide compute tasks with the selection required.
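
The table and footnote above can be read as a small lookup from level to API sections. The sketch below is illustrative only: the dictionary layout and the `public_sections` name are assumptions, not part of the Common API, and treating the Level 2 selection API as hidden from clients is one reading of the "behind the scenes" footnote.

```python
# Sections a site implements at each level, transcribed from the table above.
# The data layout and names here are hypothetical, chosen for illustration.
SECTIONS_BY_LEVEL = {
    0: {"metadata"},
    1: {"metadata", "selection"},
    2: {"metadata", "selection", "tasks"},
}

# Assumed reading of the footnote: at Level 2 the selection API feeds compute
# tasks internally and is not offered to clients for record-level retrieval.
INTERNAL_ONLY = {2: {"selection"}}


def public_sections(level: int) -> set[str]:
    """Return the sections a client could call directly at the given level."""
    if level not in SECTIONS_BY_LEVEL:
        raise ValueError(f"unknown level: {level}")
    return SECTIONS_BY_LEVEL[level] - INTERNAL_ONLY.get(level, set())
```

A client could use a lookup like this to decide, per site, whether to pull a selection locally (Level 1) or submit a compute task instead (Level 2).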

For maximum flexibility, each section of the Common API is defined in separate Git submodules and repositories. In this way, sites can implement combinations as required or desirable in their particular setting. The table below illustrates how different sections of the API could be opened up to support levels of sharing between a hub and a client (such as a user in a trusted Workspace).

Endpoints

Details of each endpoint:

| Endpoint | HTTP | Payload | Result | Summary |
|---|---|---|---|---|
| /datasets | GET | N/A | JSON | Get a list of available datasets. Shows the list of all datasets available for querying. |
| /datasets/{datasetid} | GET | N/A | JSON | Get Catalogue entry (metadata) and Dictionaries (field descriptions) for dataset. Returns the catalogue metadata and a list of field descriptions for a specified dataset (by dataset ID). |
| /datasets/{datasetid}/catalogue | GET | N/A | DCAT JSON | Get Catalogue entry (metadata) for dataset. Returns the catalogue metadata for a specified dataset (by dataset ID). |
| /datasets/{datasetid}/dictionaries | GET | N/A | Dictionary JSON | Get Dictionaries (field descriptions) for dataset. Returns a list of field descriptions for each table within a specified dataset (by dataset ID). |
| /datasets/{datasetid}/dictionaries/{tableid} | GET | N/A | Dictionary JSON | Get a single dataset Dictionary for a specified table. Returns a set of field descriptions for the specified table (by table ID) within a specified dataset (by dataset ID). |
| /selection/validate | POST | GraphQL | JSON | Validate a given selection query. With a simple GraphQL query, check whether the query is valid and corresponds to real fields at this location. |
| /selection/beacon | POST | GraphQL | JSON | Get a Beacon (T/F) for a specified data selection. With a simple GraphQL query, check which locations contain data relevant to a specific query. |
| /selection/select | POST | GraphQL | JSON - data selected | Perform a selection operation on a dataset. With a simple GraphQL query, returns the full selection of data in JSON or .csv format. |
| /selection/preview | POST | GraphQL | JSON - data preview | Preview the results of a selection operation on a dataset. With a simple GraphQL query, returns a small sample of the selection in JSON or .csv format. |
| /selection/profile | POST | GraphQL | JSON - summary | Get a profile of a selection operation on a dataset. Returns a set of metrics for the given selection operation. |
| /tasks/service-info | GET | N/A | JSON | Get information about the service, such as storage details, resource availability, and other documentation. |
| /tasks | GET | N/A | JSON | Get a list of tasks for the current user. |
| /tasks | POST | Task spec | JSON - with task ID | Create a new task using a task specification (links a selection query and containerised computation task). |
| /tasks/validate | POST | Task spec | JSON | Validate a task specification. |
| /tasks/{task_id} | GET | N/A | JSON - task details | Get task details including status. If available, includes a link to the output of the task. |
| /tasks/{task_id}/cancel | POST | N/A | JSON - task status | Cancel a task. |
| /health_check | GET | N/A | JSON | Get a health check of the service. |
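
As a rough illustration of how a client might address these endpoints, the sketch below builds (without sending) requests using only the Python standard library. Several details are assumptions rather than part of the specification: the base URL, the helper names `build_request` and `dictionary_path`, the use of an OAuth2 bearer token in the `Authorization` header, and wrapping the GraphQL query as `{"query": ...}` JSON. The OpenAPI definitions in each repository are the authority on the actual payload format.

```python
import json
import urllib.request
from urllib.parse import quote


def build_request(base_url, path, token, graphql_query=None):
    """Build (but do not send) a request for a Common API endpoint.

    A GET is built when no query is given, otherwise a POST carrying the query.
    """
    headers = {
        # Assumption: the OAuth2 access token is presented as a bearer token.
        "Authorization": f"Bearer {token}",
        "Accept": "application/json",
    }
    data = None
    method = "GET"
    if graphql_query is not None:
        # Assumption: the GraphQL text is wrapped in a JSON object.
        data = json.dumps({"query": graphql_query}).encode("utf-8")
        headers["Content-Type"] = "application/json"
        method = "POST"
    url = base_url.rstrip("/") + path
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def dictionary_path(dataset_id, table_id=None):
    """Path for a dataset's dictionaries, optionally narrowed to one table."""
    path = f"/datasets/{quote(dataset_id, safe='')}/dictionaries"
    if table_id is not None:
        path += f"/{quote(table_id, safe='')}"
    return path


if __name__ == "__main__":
    # Hypothetical hub URL, dataset ID, table ID and token, for illustration.
    req = build_request(
        "https://hub.example.org/api",
        dictionary_path("cohort-a", "visits"),
        token="TOKEN",
    )
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would actually send the request.
```

Keeping request construction separate from sending makes it easy to inspect or log what a connector would call before pointing it at a real hub.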

Note that the following endpoints are experimental at version 1.1: /selection/beacon, /selection/preview and /selection/profile - they are expected to be firmed up in later versions.
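
A client that wants to avoid depending on unstable behaviour could guard calls to these endpoints. The following is a minimal sketch; the `check_endpoint` helper and its warning text are hypothetical and not part of the API.

```python
import warnings

# Endpoints flagged as experimental at version 1.1 in the note above.
EXPERIMENTAL_AT_1_1 = frozenset({
    "/selection/beacon",
    "/selection/preview",
    "/selection/profile",
})


def check_endpoint(path: str) -> bool:
    """Return True if the path is experimental, emitting a warning if so."""
    experimental = path.rstrip("/") in EXPERIMENTAL_AT_1_1
    if experimental:
        warnings.warn(
            f"{path} is experimental at version 1.1 and may change",
            stacklevel=2,
        )
    return experimental
```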