forked from lvanhaaren/DataladExtensionCDS
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
This is a major rework of many parts of this package. There is no point in listing all the changes; the result is completely backwards-incompatible.
Showing 24 changed files with 530 additions and 526 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1 @@ | ||
datalad_cds/_version.py export-subst | ||
src/datalad_cds/_version.py export-subst |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,10 @@ | ||
name: docs | ||
|
||
on: [push, pull_request] | ||
on: | ||
push: | ||
branches: [main] | ||
pull_request: | ||
branches: [main] | ||
|
||
jobs: | ||
build: | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,43 @@ | ||
name: Release | ||
|
||
on: | ||
push: | ||
tags: | ||
- 'v*' | ||
|
||
jobs: | ||
build: | ||
name: Build python package distributions | ||
runs-on: ubuntu-latest | ||
steps: | ||
- uses: actions/checkout@v4 | ||
- uses: actions/setup-python@v4 | ||
with: | ||
python-version: "3.x" | ||
- name: Install pypa/build | ||
run: pip install build | ||
- name: Build a binary wheel and a source tarball | ||
run: python3 -m build | ||
- name: Store the distribution packages | ||
uses: actions/upload-artifact@v4 | ||
with: | ||
name: python-package-distributions | ||
path: dist/ | ||
|
||
publish-to-pypi: | ||
name: Publish release to PyPI | ||
runs-on: ubuntu-latest | ||
needs: build | ||
environment: | ||
name: pypi | ||
url: https://pypi.org/p/datalad-cds | ||
permissions: | ||
id-token: write | ||
steps: | ||
- name: Download all the dists | ||
uses: actions/download-artifact@v4 | ||
with: | ||
name: python-package-distributions | ||
path: dist/ | ||
- name: Publish package distributions to PyPI | ||
uses: pypa/gh-action-pypi-publish@release/v1 |
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,13 +1,13 @@ | ||
.pybuild/ | ||
.coverage | ||
/.tox | ||
*.egg-info | ||
*.py[coe] | ||
.#* | ||
.*.swp | ||
pip-wheel-metadata | ||
docs/build | ||
docs/source/generated | ||
.coverage | ||
.hypothesis | ||
.pybuild/ | ||
/.tox | ||
build/ | ||
dist/ | ||
*.grib | ||
docs/build | ||
docs/source/generated | ||
pip-wheel-metadata |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,156 +1,87 @@ | ||
# DataLad CDS Extension | ||
# DataLad extension for the Copernicus Climate Data Store | ||
|
||
## Table of contents | ||
|
||
- Recommended knowledge | ||
- Set up | ||
- Usage | ||
- Request know-how | ||
- Options | ||
## What? | ||
|
||
A DataLad extension to integrate with the Copernicus Climate Data Store (CDS). | ||
So far this just implements a `datalad download-cds` command that can be used to fetch data from the CDS | ||
and record this action in a way so that `datalad get` (or just `git annex get`) can redo the download in the future. | ||
|
||
|
||
## Recommended Knowledge: | ||
## Why? | ||
|
||
DataLad https://www.datalad.org/ | ||
This extension enables automated provenance tracking for fetching data from the CDS. | ||
In a dataset that retrieves data from the CDS using this extension it will become visible how this data was initially fetched | ||
and how it can be retrieved again in the future. | ||
|
||
## Set up | ||
Before installing this extension, please install datalad! | ||
|
||
https://handbook.datalad.org/en/latest/intro/installation.html | ||
## How? | ||
|
||
Clone this repository and run | ||
You will first have to create an account with the CDS, | ||
if you don't have one already. | ||
You can do so here: <https://cds.climate.copernicus.eu/user/register?destination=%2F%23!%2Fhome> | ||
|
||
pip install -e . | ||
Next, | ||
you will need to create the "~/.cdsapirc" file as described here: <https://cds.climate.copernicus.eu/api-how-to#install-the-cds-api-key>. | ||
This file is required since the datalad-cds extension internally uses the cdsapi package | ||
and therefore uses its authentication mechanism. | ||
|
||
Make sure you have valid credentials for the cds api! | ||
If you're not registered yet, here is the manual: | ||
https://cds.climate.copernicus.eu/user/register?destination=%2F%23!%2Fhome \ | ||
Create a DataLad dataset: | ||
|
||
datalad create -c text2git DataLad-101 | ||
Change to the dataset: | ||
|
||
cd Datalad-101 | ||
|
||
Now you can execute the datalad-download-cds command! | ||
|
||
Datalad handbook: | ||
http://handbook.datalad.org/en/latest/ | ||
|
||
Datalad documentation: | ||
https://docs.datalad.org/en/stable/index.html | ||
|
||
## Usage | ||
Extension for the automatic download from the CDS DataStore. | ||
Works like `datalad download-url` | ||
|
||
|
||
In general a command looks like this: | ||
|
||
datalad download-cds [-h] [-d PATH] [-O PATH] [--archive] [--nosave] [-m MESSAGE] | ||
[--version] filenames | ||
|
||
Example: | ||
|
||
datalad download-cds test.txt -m "This is the commit message" | ||
|
||
|
||
In this case test.txt contains a cds request. | ||
|
||
'derived-reanalysis-energy-moisture-budget', | ||
{ | ||
'format': 'zip', | ||
'variable': 'divergence_of_vertical_integral_of_latent_heat_flux', | ||
'year': '1979', | ||
'month': '01', | ||
'area': [ | ||
90, 0, -90, | ||
360, | ||
], | ||
}, | ||
'download.zip' | ||
|
||
You can generate yourself the request here: | ||
https://cds.climate.copernicus.eu/cdsapp#!/search?type=dataset | ||
|
||
Example for a request generated by the CDS data store: | ||
|
||
import cdsapi | ||
|
||
c = cdsapi.Client() | ||
|
||
c.retrieve( | ||
'derived-reanalysis-energy-moisture-budget', | ||
{ | ||
'format': 'zip', | ||
'variable': 'divergence_of_vertical_integral_of_latent_heat_flux', | ||
'year': '1979', | ||
'month': '01', | ||
'area': [ | ||
90, 0, -90, | ||
360, | ||
], | ||
}, | ||
'download.zip') | ||
|
||
### You only need the request in between the brackets of the retrieve method! | ||
|
||
## Request Know-How | ||
|
||
A request always consists of: | ||
|
||
A dataset: | ||
|
||
`'derived-reanalysis-energy-moisture-budget'` | ||
|
||
request-parameters (in form of a dictionary): | ||
Also, | ||
you need to install datalad and the datalad-cds extension. | ||
Both can be had through pip. | ||
|
||
Now you are ready to use the extension. | ||
When you look through the CDS you will notice that for any given dataset you can select a subset of the data using the "Download data" tab. | ||
After you do that you can use the "Show API request" button at the bottom to get a short python script that would fetch the chosen subset using the cdsapi. | ||
The following is an example of that: | ||
```python | ||
#!/usr/bin/env python | ||
import cdsapi | ||
c = cdsapi.Client() | ||
c.retrieve( | ||
"reanalysis-era5-pressure-levels", | ||
{ | ||
"variable": "temperature", | ||
"pressure_level": "1000", | ||
"product_type": "reanalysis", | ||
"year": "2008", | ||
"month": "01", | ||
"day": "01", | ||
"time": "12:00", | ||
"format": "grib" | ||
}, | ||
"download.grib", | ||
) | ||
``` | ||
|
||
To fetch the same data to the same local file using datalad-cds we just need to adapt this a little: | ||
```bash | ||
$ datalad download-cds --path download.grib ' | ||
{ | ||
'format': 'zip', | ||
'variable': 'divergence_of_vertical_integral_of_latent_heat_flux', | ||
'year': '1979', | ||
'month': '01', | ||
'area': [ | ||
90, 0, -90, | ||
360, | ||
], | ||
"dataset": "reanalysis-era5-pressure-levels", | ||
"sub-selection": { | ||
"variable": "temperature", | ||
"pressure_level": "1000", | ||
"product_type": "reanalysis", | ||
"year": "2008", | ||
"month": "01", | ||
"day": "01", | ||
"time": "12:00", | ||
"format": "grib" | ||
} | ||
} | ||
' | ||
``` | ||
|
||
A filename where the request will get written into: | ||
|
||
`'download.zip'` | ||
|
||
The first two parameters are mandatory! If you do not specify the file where it gets written into in the file of the general request, you have to do it in the command. | ||
|
||
Example: | ||
|
||
datalad download-cds test.txt --path test2.zip | ||
|
||
If you specify both, the path in the command will be used! | ||
|
||
## Options | ||
|
||
### filename | ||
This is the file, in which the cds request is stored | ||
|
||
### -h, --help | ||
Shows the help message, --help shows the man page | ||
|
||
### -d PATH, --dataset PATH | ||
Defines the dataset, not necessary to define | ||
|
||
### --path PATH, -O PATH | ||
If specified, overrides the PATH of where the file gets written to. If not specified, it has to be present in the cds-request-file | ||
|
||
### --archive | ||
pass the downloaded files to datalad add-archive-content –delete. | ||
|
||
### --nosave | ||
by default all modifications to a dataset are immediately saved. Giving this option will disable this behavior. | ||
The local path to save to ("download.grib") becomes the `--path` argument. | ||
The dataset name ("reanalysis-era5-pressure-levels" in this case) becomes the value of the `dataset` key in a json object that describes the data to be downloaded. | ||
The sub-selection of the dataset becomes the value of the `sub-selection` key. | ||
|
||
### -m MESSAGE, --message MESSAGE | ||
Message to be added to the git log | ||
After executing the above `datalad download-cds` command in a DataLad dataset a file called "download.grib" should be newly created. | ||
This file will have its origin tracked in git-annex (you can see that by running `git annex whereis download.grib`). | ||
If you now `datalad drop` the file | ||
and then `datalad get` it you'll see that git-annex will automatically re-retrieve the file from the CDS | ||
as if it was just another location to get data from. | ||
|
||
### --version | ||
show the module and its version | ||
To see more possible usage options take a look at the man page of the command (`datalad download-cds --help`) | ||
or the documentation at <https://matrss.github.io/datalad-cds/>. |
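The mapping the new README describes (the local file becomes `--path`, the dataset name becomes the `dataset` key, and the request dictionary becomes the `sub-selection` key) could be sketched in Python roughly as follows. The helper name `build_download_cds_command` is hypothetical and not part of datalad-cds; the sketch only assembles the command line shown in the README and does not run it.

```python
import json
import shlex


def build_download_cds_command(dataset, sub_selection, target):
    """Translate the arguments of a cdsapi `retrieve` call into the
    argument list of a `datalad download-cds` invocation.

    Hypothetical helper for illustration; it follows the mapping
    described in the README and is not part of datalad-cds itself.
    """
    # The dataset name and the request dictionary become two keys of
    # a single JSON object.
    spec = {"dataset": dataset, "sub-selection": sub_selection}
    # The local target file becomes the --path argument; the JSON
    # object is passed as one positional argument.
    return ["datalad", "download-cds", "--path", target, json.dumps(spec)]


if __name__ == "__main__":
    argv = build_download_cds_command(
        "reanalysis-era5-pressure-levels",
        {
            "variable": "temperature",
            "pressure_level": "1000",
            "product_type": "reanalysis",
            "year": "2008",
            "month": "01",
            "day": "01",
            "time": "12:00",
            "format": "grib",
        },
        "download.grib",
    )
    # Print a shell-quoted command line that can be pasted into a terminal
    # inside a DataLad dataset.
    print(shlex.join(argv))
```

Returning an argument list rather than a single string keeps the JSON intact as one argument, so it can also be handed to `subprocess.run` without further quoting.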