Notes on CDS datasets
The CDS API is exposed as a REST API over HTTP. The REST API itself is not officially defined or documented. According to the Copernicus API How-To, the REST API should not be used directly, but rather through the cdsapi Python client library, which is available through pip and conda-forge.
The API exposed by the cdsapi Python library is also undocumented. The officially recommended way to use it is to build a request interactively in the web interface for the dataset of interest and then click the "Show API request" button. The available parameters and valid parameter combinations are therefore not explicitly documented, and can only be determined by manually exploring the web interface.
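For orientation, the code shown by "Show API request" has roughly the shape sketched below: a dataset identifier, a dictionary of request parameters and an output file name, passed to `retrieve` on a `cdsapi.Client`. The parameter values (including the North/West/South/East order of `area`) are illustrative assumptions for `reanalysis-era5-single-levels` and should be checked against the web form. Calling `fetch` needs the `cdsapi` package and configured credentials (typically in `~/.cdsapirc`).

```python
# Illustrative request in the style produced by "Show API request".
# The parameter values below are example assumptions, not a verified request.
DATASET = "reanalysis-era5-single-levels"

REQUEST = {
    "product_type": "reanalysis",
    "variable": ["2m_temperature"],
    "year": "2020",
    "month": "01",
    "day": ["01", "02"],
    "time": ["00:00", "12:00"],
    # Assumed bounding-box order: North, West, South, East (degrees).
    "area": [60, -10, 35, 30],
    "format": "netcdf",
}


def fetch(target: str) -> None:
    """Submit REQUEST and block until the result has been written to target."""
    # Imported lazily so that this module can be loaded without cdsapi installed.
    import cdsapi

    client = cdsapi.Client()  # URL and key are typically read from ~/.cdsapirc
    client.retrieve(DATASET, REQUEST, target)
```

Calling `fetch("era5_t2m.nc")` would then block until the file has been downloaded.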
A total of 66 climate datasets are listed at https://cds.climate.copernicus.eu/cdsapp#!/search?type=dataset:
Identifier | Description |
---|---|
cems-fire-historical | Fire danger indices historical data from the Copernicus Emergency Management |
cems-glofas-forecast | River discharge and related forecasted data by the Global Flood Awareness |
cems-glofas-historical | River discharge and related historical data from the Global Flood Awareness |
derived-near-surface-meteorological-variables | Near surface meteorological variables from 1979 to 2018 derived from |
derived-utci-historical | Thermal comfort indices derived from ERA5 reanalysis |
ecv-for-climate-change | Essential climate variables for assessment of climate variability from |
efas-forecast | River discharge and related forecasted data by the European Flood Awareness |
efas-historical | River discharge and related historical data from the European Flood Awareness |
insitu-glaciers-elevation-mass | Glaciers elevation and mass change data from 1850 to present from the |
insitu-glaciers-extent | Glaciers distribution data from the Randolph Glacier Inventory for year |
insitu-gridded-observations-europe | E-OBS daily gridded meteorological data for Europe from 1950 to present |
projections-cmip5-daily-pressure-levels | CMIP5 daily data on pressure levels |
projections-cmip5-daily-single-levels | CMIP5 daily data on single levels |
projections-cmip5-monthly-pressure-levels | CMIP5 monthly data on pressure levels |
projections-cmip5-monthly-single-levels | CMIP5 monthly data on single levels |
projections-cordex-single-levels | CORDEX regional climate model data on single levels for Europe |
reanalysis-era5-land | ERA5-Land hourly data from 1981 to present |
reanalysis-era5-land-monthly-means | ERA5-Land monthly averaged data from 1981 to present |
reanalysis-era5-pressure-levels | ERA5 hourly data on pressure levels from 1979 to present |
reanalysis-era5-pressure-levels-monthly-means | ERA5 monthly averaged data on pressure levels from 1979 to present |
reanalysis-era5-single-levels | ERA5 hourly data on single levels from 1979 to present |
reanalysis-era5-single-levels-monthly-means | ERA5 monthly averaged data on single levels from 1979 to present |
reanalysis-uerra-europe-complete | Complete UERRA regional reanalysis for Europe from 1961 to 2019 |
reanalysis-uerra-europe-height-levels | UERRA regional reanalysis for Europe on height levels from 1961 to 2019 |
reanalysis-uerra-europe-pressure-levels | UERRA regional reanalysis for Europe on pressure levels from 1961 to |
reanalysis-uerra-europe-single-levels | UERRA regional reanalysis for Europe on single levels from 1961 to 2019 |
reanalysis-uerra-europe-soil-levels | UERRA regional reanalysis for Europe on soil levels from 1961 to 2019 |
satellite-aerosol-properties | Aerosol properties gridded data from 1995 to present derived from satellite |
satellite-albedo | Surface albedo 10-daily gridded data from 1981 to present |
satellite-carbon-dioxide | Carbon dioxide data from 2002 to present derived from satellite observations |
satellite-fire-burned-area | Fire burned area from 2001 to present derived from satellite observations |
satellite-lai-fapar | Leaf area index and fraction absorbed of photosynthetically active radiation 10-daily gridded data from 1981 to present |
satellite-land-cover | Land cover classification gridded maps from 1992 to present derived from |
satellite-methane | Methane data from 2002 to present derived from satellite observations |
satellite-ocean-colour | Ocean colour daily data from 1997 to present derived from satellite observations |
satellite-ozone | Ozone monthly gridded data from 1970 to present |
satellite-sea-ice | Sea ice monthly and daily gridded data from 1978 to present derived from |
satellite-sea-level-black-sea | Sea level daily gridded data from satellite altimetry for the Black Sea |
satellite-sea-level-global | Sea level daily gridded data from satellite altimetry for the global |
satellite-sea-level-mediterranean | Sea level daily gridded data from satellite altimetry for the Mediterranean Sea from 1993 to present |
satellite-sea-surface-temperature-ensemble-product | Sea surface temperature daily gridded data from 1981 to 2016 derived |
satellite-sea-surface-temperature | Sea surface temperature daily data from 1981 to present derived from |
satellite-soil-moisture | Soil moisture gridded data from 1978 to present |
seasonal-monthly-pressure-levels | Seasonal forecast monthly statistics on pressure levels |
seasonal-monthly-single-levels | Seasonal forecast monthly statistics on single levels |
seasonal-original-pressure-levels | Seasonal forecast daily data on pressure levels |
seasonal-original-single-levels | Seasonal forecast daily data on single levels |
seasonal-postprocessed-pressure-levels | Seasonal forecast anomalies on pressure levels |
seasonal-postprocessed-single-levels | Seasonal forecast anomalies on single levels |
sis-agroclimatic-indicators | Agroclimatic indicators from 1951 to 2099 derived from climate projections |
sis-agrometeorological-indicators | Agrometeorological indicators from 1979 to 2018 derived from reanalysis |
sis-ecv-cmip5-bias-corrected | Essential climate variables for water sector applications derived from CMIP5 projections |
sis-european-energy-sector | Climate data for the European energy sector from 1979 to 2016 derived from ERA-Interim |
sis-fisheries-ocean-fronts | Ocean fronts data for the Northwest European Shelf and Mediterranean |
sis-heat-and-cold-spells | Heat waves and cold spells in Europe derived from climate projections |
sis-ocean-wave-indicators | Ocean surface wave indicators for the European coast from 1977 to 2100 |
sis-ocean-wave-timeseries | Ocean surface wave time series for the European coast from 1976 to 2100 derived from climate projections |
sis-offshore-windfarm-indicators | Performance indicators for offshore wind farms in Europe from 1977 to |
sis-shipping-arctic | Arctic route availability and cost projection derived from climate projections |
sis-shipping-consumption-on-routes | Ship performance along standard shipping routes derived from reanalysis |
sis-temperature-statistics | Temperature statistics for Europe derived from climate projections |
sis-urban-climate-cities | Climate variables for cities in Europe from 2008 to 2017 |
sis-water-level-change-indicators | Water level change indicators for the European coast from 1977 to 2100 |
sis-water-level-change-timeseries | Water level change time series for the European coast from 1977 to 2100 |
sis-water-quality-swicca | Water quality indicators for European rivers |
sis-water-quantity-swicca | Water quantity indicators for Europe |
Additionally, the Atmosphere Data Store provides five CAMS datasets via the CDS API; they are listed at https://ads.atmosphere.copernicus.eu/cdsapp#!/search?type=dataset. For the other four Copernicus services listed at https://www.copernicus.eu/en (Marine, Land, Security, Emergency), I have not found any public CDS API endpoints.
There are 68 request parameter keys currently available for the various CDS datasets:
- `algorithm`
- `area`
- `arrival_port`
- `bias_correction`
- `city`
- `dataset`
- `day`
- `definition`
- `departure_port`
- `emissions_scenario`
- `end_year`
- `ensemble_member`
- `ensemble_statistics`
- `epoch`
- `experiment`
- `file_version`
- `forecast_start_month`
- `format`
- `gcm_model`
- `grid_resolution`
- `height_level`
- `horizontal_aggregation`
- `horizontal_resolution`
- `indicator`
- `leadtime_hour`
- `leadtime_month`
- `model`
- `model_levels`
- `month`
- `nominal_day`
- `origin`
- `originating_centre`
- `percentile`
- `period`
- `pressure_level`
- `processing_level`
- `processinglevel`
- `product_type`
- `product_version`
- `projection`
- `rcm_model`
- `reference_dataset`
- `region`
- `return_period`
- `satellite`
- `sea`
- `sensor`
- `sensor_and_algorithm`
- `sensor_on_satellite`
- `simulation_version`
- `soil_level`
- `start_year`
- `stat`
- `statistics`
- `step`
- `system`
- `temporal_aggregation`
- `temporal_resolution`
- `time`
- `time_aggregation`
- `type`
- `type_of_record`
- `type_of_sensor`
- `variable`
- `version`
- `vertical_aggregation`
- `vertical_level`
- `year`
Each dataset only supports a small subset of these keys. The subset of supported keys varies from dataset to dataset, as does the set of allowed values for each key. Additionally, there can be complex interdependencies between the parameters (e.g. the available years depend on the selected product version, or the `percentile` key is only supported for particular values of `time_aggregation`). These constraints can only be determined by manual experimentation in the web interface.
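Because these constraints are not machine-readable, a plugin has to encode them by hand. The sketch below shows one possible representation: a per-dataset set of allowed keys plus a list of conditional rules. The dataset name `example-dataset`, its key set and the `percentile`/`time_aggregation` values are hypothetical placeholders, not the real constraints of any CDS dataset.

```python
from typing import Any, Callable, Dict, List, Set, Tuple

# Hypothetical per-dataset key sets; real subsets must be found by experiment.
ALLOWED_KEYS: Dict[str, Set[str]] = {
    "example-dataset": {"variable", "time_aggregation", "percentile", "format"},
}

# A rule is (description, predicate); the predicate returns True if satisfied.
Rule = Tuple[str, Callable[[Dict[str, Any]], bool]]

# Hypothetical interdependency: "percentile" only with certain aggregations.
RULES: Dict[str, List[Rule]] = {
    "example-dataset": [
        (
            "percentile requires time_aggregation in {'annual', 'seasonal'}",
            lambda r: "percentile" not in r
            or r.get("time_aggregation") in {"annual", "seasonal"},
        ),
    ],
}


def validate_request(dataset: str, request: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    # Keys not in the dataset's allowed subset are reported individually.
    unknown = set(request) - ALLOWED_KEYS.get(dataset, set())
    for key in sorted(unknown):
        problems.append(f"unsupported key for {dataset}: {key}")
    # Each conditional rule is checked against the whole request.
    for description, check in RULES.get(dataset, []):
        if not check(request):
            problems.append(description)
    return problems
```

Keeping rules as small predicates makes it easy to add constraints one at a time as they are discovered in the web interface.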
Depending on the request parameters and the selected dataset, download sizes can range from a few kB to multiple GB. Requests are queued before being processed, and the total time to produce a data file for download depends on both the queue time and the processing time. In many cases data is returned within a few seconds, but in my testing I occasionally encountered much longer wait times (in the most extreme cases, around 25 hours for UERRA regional reanalyses). Queue times can vary widely depending on the current load being experienced by the CDS servers. The Python API does not provide any way to query expected queue or processing times for a request without actually executing it.
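Because expected queue and processing times cannot be queried in advance, a caller that must not block indefinitely could bound its own wait. A minimal sketch, assuming the blocking call (for instance a hypothetical wrapper around `retrieve`) is passed in as a zero-argument function:

```python
import concurrent.futures
from typing import Any, Callable


def call_with_timeout(func: Callable[[], Any], timeout_s: float) -> Any:
    """Run a blocking callable in a worker thread and wait at most timeout_s.

    Raises concurrent.futures.TimeoutError if the call has not finished in
    time. The worker thread is NOT cancelled: it keeps running in the
    background until func returns.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func)
    try:
        return future.result(timeout=timeout_s)
    finally:
        # Do not block here waiting for a still-running worker.
        executor.shutdown(wait=False)
```

This only bounds the caller's wait: after a timeout, the worker thread (and the request on the CDS side) keeps running.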
Regarding the actual submission of requests, the underlying, undocumented REST API works asynchronously: after submitting a request, the client can repeatedly poll its status, then download the result once it becomes available. However, the Python API library hides this mechanism behind a synchronous interface: it only exposes a synchronous `retrieve` method, which blocks until the request has been processed and the data file downloaded.
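The submit/poll/download pattern can be sketched generically as a polling loop with a growing delay. The `get_state` callable and the state names `queued`, `running`, `completed` and `failed` are assumptions for illustration only; the actual REST endpoints and reply fields are undocumented.

```python
import time
from typing import Callable


def wait_for_result(
    get_state: Callable[[], str],
    initial_delay_s: float = 1.0,
    max_delay_s: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll get_state() until it reports 'completed'; return the poll count.

    The delay between polls doubles after each poll, capped at max_delay_s.
    Raises RuntimeError if the request is reported as 'failed'.
    """
    delay = initial_delay_s
    polls = 0
    while True:
        state = get_state()
        polls += 1
        if state == "completed":
            return polls
        if state == "failed":
            raise RuntimeError("request failed on the server")
        # Any other state (e.g. 'queued', 'running') means: try again later.
        sleep(delay)
        delay = min(delay * 2, max_delay_s)
```

Injecting `sleep` keeps the loop testable without real waiting; once the state is `completed`, the client would download the result.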
The great majority of datasets in the Climate Data Store return data in NetCDF or GRIB format, or as a zip or .tar.gz archive of multiple NetCDF or GRIB files. In many cases, NetCDF and GRIB are both offered as options, with NetCDF automatically converted server-side from GRIB (and sometimes marked ‘experimental’). In some cases, requesting NetCDF output produces an error but GRIB files are successfully produced from the same dataset.
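Since a request may yield a single NetCDF or GRIB file or an archive of them, a plugin might identify downloaded files by their leading bytes rather than by file name. The signatures checked below are the standard magic numbers of each format; the function names and returned labels are illustrative, and the GRIB check assumes the first message starts at byte 0.

```python
def sniff_format(header: bytes) -> str:
    """Guess a file's container format from its first few bytes."""
    if header.startswith(b"CDF") and header[3:4] in (b"\x01", b"\x02", b"\x05"):
        return "netcdf3"  # classic, 64-bit offset or CDF-5 variant
    if header.startswith(b"\x89HDF\r\n\x1a\n"):
        return "netcdf4"  # HDF5-based; could also be plain HDF5
    if header.startswith(b"GRIB"):
        return "grib"  # assumes the first GRIB message starts at byte 0
    if header.startswith(b"PK\x03\x04"):
        return "zip"
    if header.startswith(b"\x1f\x8b"):
        return "gzip"  # e.g. a .tar.gz archive
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 8 bytes of a file and classify them."""
    with open(path, "rb") as f:
        return sniff_format(f.read(8))
```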
In cases where an archive is produced, it usually contains one NetCDF file per variable, or per unique combination of parameters. For instance, a request for data from the ‘Water quantity indicators for Europe’ dataset for two horizontal aggregation levels, two percentile levels, and two emissions scenarios produces an archive containing eight (= 2 × 2 × 2) individual NetCDF files.
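Unpacking such an archive needs only the standard library. The sketch below uses `shutil.unpack_archive`, which selects the zip or gzipped-tar handler from the file extension, and then collects the data files; the set of accepted suffixes is an assumption about how archive members are named.

```python
import shutil
from pathlib import Path
from typing import List

DATA_SUFFIXES = {".nc", ".grib", ".grb", ".grib2"}  # assumed member suffixes


def unpack_data_files(archive: str, dest: str) -> List[Path]:
    """Extract a .zip or .tar.gz archive into dest and list its data files."""
    shutil.unpack_archive(archive, dest)  # format inferred from the extension
    return sorted(
        p for p in Path(dest).rglob("*") if p.is_file() and p.suffix in DATA_SUFFIXES
    )
```

For the eight-file example above, the returned list would contain one path per parameter combination, ready to be opened and merged individually.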
A few of the datasets are by their nature not suitable for representation as xcubes – for example, ‘Ship performance along standard shipping routes derived from reanalysis and seasonal forecasts’ and ‘Performance indicators for offshore wind farms in Europe’, which do not contain any geographical co-ordinates.
The API does not offer any information on the exact format of the data within the returned NetCDF or GRIB files, but many of the dataset web pages claim conformance with the Climate and Forecast (CF) conventions, at various versions from 1.3 to 1.6. Nevertheless, for each dataset it will be necessary to manually examine output files, both to confirm that the format can be normalized into an xcube and to determine the variable names and metadata that will be returned by the store plugin's `describe_data` method. For data that are returned as an archive of multiple files, these files must be unpacked, read individually, and merged into a single cube, which may be challenging if (for example) they have differing resolutions.
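As a first check on claimed CF conformance, the global `Conventions` attribute of each output file (e.g. the `Conventions` entry of an xarray `Dataset.attrs` dictionary) can be inspected. The helper below only parses that string; it is a minimal sketch assuming values such as `CF-1.6`, possibly combined with other convention names. A declared version does not, of course, guarantee that the file actually conforms.

```python
import re
from typing import Optional, Tuple

# Matches e.g. "CF-1.6" inside "CF-1.6", "COARDS, CF-1.4" or "CF-1.7 ACDD-1.3".
_CF_PATTERN = re.compile(r"\bCF-(\d+)\.(\d+)\b")


def cf_version(conventions: str) -> Optional[Tuple[int, int]]:
    """Return the (major, minor) CF version named in a Conventions string."""
    match = _CF_PATTERN.search(conventions)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```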
For each dataset, support in the store plugin therefore requires the following steps:

- Determine the request format and valid parameters by manual experimentation in the web interface.
- Based on the request parameters, write a JSON schema which can be returned by `get_open_data_params_schema`. The JSON schema will usually describe a superset of the actual valid parameters, since there are often restrictions on parameters and parameter combinations that are too complex to represent in a JSON schema.
- Write code to transform the request parameters supplied to the CDS Plugin into the corresponding parameters for the Python CDS API library.
- Examine output files from the CDS API to determine their structure and naming conventions, and use this information to write a `DatasetDescriptor` for the xarray `Dataset` which will be returned from the CDS Plugin.
- Write code to process the data returned from the CDS API into a normalized xcube, which may involve operations such as combining multiple data files, editing variable metadata, or rasterizing vector data.
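The schema and parameter-transformation steps listed above might look roughly as follows for an ERA5-like dataset. Everything here is a hypothetical sketch: the plugin parameter names `variable_names`, `time_range` and `bbox`, the variable enum, and the assumption that the CDS combines `year`, `month` and `day` as a Cartesian product are illustrative, not the plugin's actual interface.

```python
import datetime
from typing import Any, Dict, List, Sequence, Tuple

# Hypothetical open-parameters schema for one dataset, as a plain JSON Schema
# dict. It describes a superset of valid requests; finer constraints are not
# expressible here and must be checked elsewhere.
OPEN_PARAMS_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "variable_names": {
            "type": "array",
            "items": {"type": "string", "enum": ["2m_temperature", "total_precipitation"]},
        },
        "time_range": {
            "type": "array",
            "items": {"type": "string", "format": "date"},
            "minItems": 2,
            "maxItems": 2,
        },
        "bbox": {
            "type": "array",
            "items": {"type": "number"},
            "minItems": 4,
            "maxItems": 4,
        },
    },
    "required": ["variable_names", "time_range", "bbox"],
    "additionalProperties": False,
}


def to_cds_request(
    variable_names: List[str],
    time_range: Tuple[str, str],
    bbox: Sequence[float],
) -> Dict[str, Any]:
    """Translate hypothetical plugin parameters into CDS request keys.

    bbox is (west, south, east, north); the CDS 'area' key is assumed to
    expect [north, west, south, east]. The CDS is assumed to combine year,
    month and day as a Cartesian product, so the result may cover more days
    than time_range.
    """
    start = datetime.date.fromisoformat(time_range[0])
    end = datetime.date.fromisoformat(time_range[1])
    days = [start + datetime.timedelta(days=i) for i in range((end - start).days + 1)]
    west, south, east, north = bbox
    return {
        "variable": list(variable_names),
        "year": sorted({f"{d.year}" for d in days}),
        "month": sorted({f"{d.month:02d}" for d in days}),
        "day": sorted({f"{d.day:02d}" for d in days}),
        "area": [north, west, south, east],
        "format": "netcdf",
    }
```

For a range from 2020-01-30 to 2020-02-02 this yields months `01`, `02` and days `01`, `02`, `30`, `31`, so the combined request would also cover 1 and 2 January; such surplus data would have to be trimmed after download.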
For each of the 66 CDS datasets available at https://cds.climate.copernicus.eu/cdsapp#!/search?type=dataset, I have created a directory within `//fs1/file/home/pont/projects/xcube/cds/datasets/dirs` containing the following:
- An `info.yaml` file containing some essential information about the dataset: short identifier (also used as the directory name), description, URL for web interface, output container format, etc.
- A `request.py` file produced using the web interface, containing Python code for an example request via the CDS API library. These requests are constructed to include as many values for as many request parameters as possible, to serve as a partial description of the request syntax and a starting point for developing support in the plugin. (In some cases, this means that the request cannot be executed as is, because it exceeds the limit on the maximum amount of data that may be requested.)
- A data output file in a `data` subdirectory, to serve as an example of the output format and a starting point for the design of a data descriptor and file import code. The file is produced from a request for one or a few variables over a limited temporal and geographical range, in order to demonstrate the file format without producing an excessive amount of data. Nevertheless, the minimum requestable amount of data for some datasets exceeds 1 GB.
- For a few of the datasets, a `notes.txt` file containing additional information.