A repository for the schemas used for the Data Repository Service.

sbg/data-repository-service-schemas

 
 

Schemas for the Data Repository Service (DRS) API


View the schemas in Swagger UI

The goal of DRS is to create a generic API on top of existing object storage systems so workflow systems can access data in a single, standard way regardless of where it's stored. It's maintained by the GA4GH Cloud Workstream.

Key features

The API is split into two sections:

  • Data Object management, which enables the creation, updating, deletion, versioning, and unique identification of files and data bundles (flat collections of files); and
  • Data Object querying, which can locate data objects across different cloud environments and DRS implementations.

Getting started

Installing is as easy as:

$ pip install ga4gh-dos-schemas

This installs a demonstration server and a Python client for managing Data Objects on a local server. Start the demo server with ga4gh_dos_server, which serves a Data Repository Service at http://localhost:8080. With the server running, you can register a Data Object and look it up:

wget http://hgdownload.cse.ucsc.edu/goldenPath/hg38/chromosomes/chr22.fa.gz
md5sum chr22.fa.gz
# 41b47ce1cc21b558409c19b892e1c0d1  chr22.fa.gz
curl -X POST -H 'Content-Type: application/json' \
    --data '{"data_object":
              {"id": "hg38-chr22",
               "name": "Human Reference Chromosome 22",
               "checksums": [{"checksum": "41b47ce1cc21b558409c19b892e1c0d1", "type": "md5"}],
               "urls": [{"url": "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/chromosomes/chr22.fa.gz"}],
               "size": "12255678"}}' http://localhost:8080/ga4gh/dos/v1/dataobjects
# We can then get the newly created Data Object by id
curl http://localhost:8080/ga4gh/dos/v1/dataobjects/hg38-chr22
# Or by checksum!
# (-G makes curl send the -d data as a URL query string instead of a request body)
curl -G http://localhost:8080/ga4gh/dos/v1/dataobjects -d checksum=41b47ce1cc21b558409c19b892e1c0d1
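
The same steps can be sketched in Python using only the standard library. This is a minimal illustration, not the Python client that ships with the package: the helper names (`md5_hex`, `build_create_request`, `checksum_query_url`, `create_data_object`, `get_json`) are hypothetical, the base URL and JSON fields are copied from the curl commands above, and it assumes the server accepts the checksum as a URL query parameter.

```python
import hashlib
import json
import urllib.parse
import urllib.request

# Base URL taken from the curl examples above.
BASE_URL = "http://localhost:8080/ga4gh/dos/v1"


def md5_hex(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_create_request(object_id, name, url, md5, size):
    """Build the JSON body used by the POST example above."""
    return {
        "data_object": {
            "id": object_id,
            "name": name,
            "checksums": [{"checksum": md5, "type": "md5"}],
            "urls": [{"url": url}],
            "size": str(size),  # the curl example sends size as a string
        }
    }


def checksum_query_url(md5, base_url=BASE_URL):
    """Build the list URL with the checksum as a query parameter (assumed)."""
    query = urllib.parse.urlencode({"checksum": md5})
    return base_url + "/dataobjects?" + query


def create_data_object(body, base_url=BASE_URL):
    """POST a Data Object to a running server and return the parsed reply."""
    request = urllib.request.Request(
        base_url + "/dataobjects",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def get_json(url):
    """GET a URL and return the parsed JSON reply."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)
```

With the demo server running, `create_data_object(build_create_request(...))` would register the object, and `get_json(BASE_URL + "/dataobjects/hg38-chr22")` or `get_json(checksum_query_url(...))` would retrieve it. The two network helpers need a live server; the others are pure functions.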

For more on getting started, check out the quickstart guide or the rest of the documentation at Read the Docs!

Getting involved!

The Data Repository Service Schemas are open source software, licensed under Apache 2.0. Please join us in the issues or check out the contributing docs!
