Commit

Major rework
This is a major rework of many parts of this package. There is no point
in listing all the changes, this is completely backwards-incompatible.
matrss committed Feb 21, 2024
1 parent 6b1c4c2 commit 4431df3
Showing 24 changed files with 530 additions and 526 deletions.
12 changes: 0 additions & 12 deletions .flake8

This file was deleted.

1 change: 0 additions & 1 deletion .gitattributes
@@ -1,2 +1 @@
datalad_cds/_version.py export-subst
src/datalad_cds/_version.py export-subst
3 changes: 2 additions & 1 deletion .github/workflows/check.yml
@@ -2,7 +2,9 @@ name: check

on:
push:
branches: [main]
pull_request:
branches: [main]

jobs:
tox:
@@ -16,7 +18,6 @@ jobs:
steps:
- uses: actions/checkout@v3
- name: Set up system
shell: bash
run: |
sudo apt-get update -qq
sudo apt-get install git-annex
6 changes: 5 additions & 1 deletion .github/workflows/docbuild.yml
@@ -1,6 +1,10 @@
name: docs

on: [push, pull_request]
on:
push:
branches: [main]
pull_request:
branches: [main]

jobs:
build:
43 changes: 43 additions & 0 deletions .github/workflows/release.yml
@@ -0,0 +1,43 @@
name: Release

on:
push:
tags:
- 'v*'

jobs:
build:
name: Build python package distributions
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v4
with:
python-version: "3.x"
- name: Install pypa/build
run: pip install build
- name: Build a binary wheel and a source tarball
run: python3 -m build
- name: Store the distribution packages
uses: actions/upload-artifact@v4
with:
name: python-package-distributions
path: dist/

publish-to-pypi:
name: Publish release to PyPI
runs-on: ubuntu-latest
needs: build
environment:
name: pypi
url: https://pypi.org/p/datalad-cds
permissions:
id-token: write
steps:
- name: Download all the dists
uses: actions/download-artifact@v4
with:
name: python-package-distributions
path: dist/
- name: Publish package distributions to PyPI
uses: pypa/gh-action-pypi-publish@release/v1
55 changes: 0 additions & 55 deletions .github/workflows/test_crippledfs.yml

This file was deleted.

14 changes: 7 additions & 7 deletions .gitignore
@@ -1,13 +1,13 @@
.pybuild/
.coverage
/.tox
*.egg-info
*.py[coe]
.#*
.*.swp
pip-wheel-metadata
docs/build
docs/source/generated
.coverage
.hypothesis
.pybuild/
/.tox
build/
dist/
*.grib
docs/build
docs/source/generated
pip-wheel-metadata
209 changes: 70 additions & 139 deletions README.md
@@ -1,156 +1,87 @@
# DataLad CDS Extension
# DataLad extension for the Copernicus Climate Data Store

## Table of contents

- Recommended knowledge
- Set up
- Usage
- Request know-how
- Options
## What?

A DataLad extension to integrate with the Copernicus Climate Data Store (CDS).
So far this just implements a `datalad download-cds` command that can be used to fetch data from the CDS
and record this action in a way so that `datalad get` (or just `git annex get`) can redo the download in the future.


## Recommended Knowledge:
## Why?

DataLad https://www.datalad.org/
This extension enables automated provenance tracking for fetching data from the CDS.
In a dataset that retrieves data from the CDS using this extension it will become visible how this data was initially fetched
and how it can be retrieved again in the future.

## Set up
Before installing this extension, please install datalad!

https://handbook.datalad.org/en/latest/intro/installation.html
## How?

Clone this repository and run
You will first have to create an account with the CDS,
if you don't have one already.
You can do so here: <https://cds.climate.copernicus.eu/user/register?destination=%2F%23!%2Fhome>

pip install -e .
Next,
you will need to create the "~/.cdsapirc" file as described here: <https://cds.climate.copernicus.eu/api-how-to#install-the-cds-api-key>.
This file is required since the datalad-cds extension internally uses the cdsapi package
and therefore uses its authentication mechanism.

Make sure you have valid credentials for the cds api!
If you're not registered yet, here is the manual:
https://cds.climate.copernicus.eu/user/register?destination=%2F%23!%2Fhome \
Create a DataLad dataset:

datalad create -c text2git DataLad-101
Change to the dataset:

cd Datalad-101

Now you can execute the datalad-download-cds command!

Datalad handbook:
http://handbook.datalad.org/en/latest/

Datalad documentation:
https://docs.datalad.org/en/stable/index.html

## Usage
Extension for the automatic download from the CDS DataStore.
Works like `datalad download-url`


In general a command looks like this:

datalad download-cds [-h] [-d PATH] [-O PATH] [--archive] [--nosave] [-m MESSAGE]
[--version] filenames

Example:

datalad download-cds test.txt -m "This is the commit message"


In this case test.txt contains a cds request.

'derived-reanalysis-energy-moisture-budget',
{
'format': 'zip',
'variable': 'divergence_of_vertical_integral_of_latent_heat_flux',
'year': '1979',
'month': '01',
'area': [
90, 0, -90,
360,
],
},
'download.zip'

You can generate yourself the request here:
https://cds.climate.copernicus.eu/cdsapp#!/search?type=dataset

Example for a request generated by the CDS data store:

import cdsapi

c = cdsapi.Client()

c.retrieve(
'derived-reanalysis-energy-moisture-budget',
{
'format': 'zip',
'variable': 'divergence_of_vertical_integral_of_latent_heat_flux',
'year': '1979',
'month': '01',
'area': [
90, 0, -90,
360,
],
},
'download.zip')

### You only need the request in between the brackets of the retrieve method!

## Request Know-How

A request always consists of:

A dataset:

`'derived-reanalysis-energy-moisture-budget'`

request-parameters (in form of a dictionary):
Also,
you need to install datalad and the datalad-cds extension.
Both can be had through pip.

Now you are ready to use the extension.
When you look through the CDS you will notice that for any given dataset you can select a subset of the data using the "Download data" tab.
After you do that you can use the "Show API request" button at the bottom to get a short python script that would fetch the chosen subset using the cdsapi.
The following is an example of that:
```python
#!/usr/bin/env python
import cdsapi
c = cdsapi.Client()
c.retrieve(
"reanalysis-era5-pressure-levels",
{
"variable": "temperature",
"pressure_level": "1000",
"product_type": "reanalysis",
"year": "2008",
"month": "01",
"day": "01",
"time": "12:00",
"format": "grib"
},
"download.grib",
)
```

To fetch the same data to the same local file using datalad-cds we just need to adapt this a little:
```bash
$ datalad download-cds --path download.grib '
{
'format': 'zip',
'variable': 'divergence_of_vertical_integral_of_latent_heat_flux',
'year': '1979',
'month': '01',
'area': [
90, 0, -90,
360,
],
"dataset": "reanalysis-era5-pressure-levels",
"sub-selection": {
"variable": "temperature",
"pressure_level": "1000",
"product_type": "reanalysis",
"year": "2008",
"month": "01",
"day": "01",
"time": "12:00",
"format": "grib"
}
}
'
```

A filename where the request will get written into:

`'download.zip'`

The first two parameters are mandatory! If you do not specify the file where it gets written into in the file of the general request, you have to do it in the command.

Example:

datalad download-cds test.txt --path test2.zip

If you specify both, the path in the command will be used!

## Options

### filename
This is the file, in which the cds request is stored

### -h, --help
Shows the help message, --help shows the man page

### -d PATH, --dataset PATH
Defines the dataset, not necessary to define

### --path PATH, -O PATH
If specified, overrides the PATH of where the file gets written to. If not specified, it has to be present in the cds-request-file

### --archive
pass the downloaded files to datalad add-archive-content –delete.

### --nosave
by default all modifications to a dataset are immediately saved. Giving this option will disable this behavior.
The local path to save to ("download.grib") becomes the `--path` argument.
The dataset name ("reanalysis-era5-pressure-levels" in this case) becomes the value of the `dataset` key in a json object that describes the data to be downloaded.
The sub-selection of the dataset becomes the value of the `sub-selection` key.

### -m MESSAGE, --message MESSAGE
Message to be added to the git log
After executing the above `datalad download-cds` command in a DataLad dataset a file called "download.grib" should be newly created.
This file will have its origin tracked in git-annex (you can see that by running `git annex whereis download.grib`).
If you now `datalad drop` the file
and then `datalad get` it you'll see that git-annex will automatically re-retrieve the file from the CDS
as if it was just another location to get data from.

### --version
show the module and its version
To see more possible usage options take a look at the man page of the command (`datalad download-cds --help`)
or the documentation at <https://matrss.github.io/datalad-cds/>.
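
The mapping the new README describes (local file name to `--path`, dataset name to the `dataset` key, request dictionary to the `sub-selection` key) can be sketched in Python. This is only an illustration, not part of the commit: the helper `build_download_cds_command` is a hypothetical name, and the only things taken from the README are the `datalad download-cds` command, its `--path` option, and the two JSON keys. The sketch builds the argument list and prints it; it does not run the command.

```python
import json
import shlex


def build_download_cds_command(dataset, sub_selection, path):
    """Translate the three arguments of a cdsapi `retrieve` call into an
    argument list for `datalad download-cds`.

    Hypothetical helper for illustration only; it is not provided by
    datalad-cds.
    """
    # The README describes a JSON object with exactly these two keys.
    spec = {"dataset": dataset, "sub-selection": sub_selection}
    return ["datalad", "download-cds", "--path", path, json.dumps(spec)]


# Same request as the "Show API request" example in the README.
argv = build_download_cds_command(
    "reanalysis-era5-pressure-levels",
    {
        "variable": "temperature",
        "pressure_level": "1000",
        "product_type": "reanalysis",
        "year": "2008",
        "month": "01",
        "day": "01",
        "time": "12:00",
        "format": "grib",
    },
    "download.grib",
)

# Print a shell-quoted command line that could be pasted into a terminal.
print(shlex.join(argv))
```

Generating the JSON with `json.dumps` rather than writing it by hand avoids quoting mistakes when the request contains nested lists or many keys.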