Update xcube-cds to work with new CDS backend API #84

Closed
pont-us opened this issue Jul 25, 2024 · 2 comments · Fixed by #85

pont-us commented Jul 25, 2024

The current CDS API servers are scheduled to be shut down on 2024-09-03. As of 2024-07-25, ECMWF has not yet deployed any production-grade replacement, but there is a beta version of the planned successor at https://cds-beta.climate.copernicus.eu/. xcube-cds will need to support this new version before the current version is shut down.

More information:

pont-us commented Aug 22, 2024

The main difficulty is the change in ERA5 output. Previously, any ERA5 NetCDF request to the cdsapi backend would produce a single NetCDF file. Now, some requests produce a Zip containing multiple NetCDFs. ECMWF tech support clarified this behaviour as follows:

…whenever a GRIB to NetCDF incompatibility is detected, the output NetCDF file will be split and delivered as a compressed zipped file (with the .zip extension). An example of incompatibility is requesting atmospheric and oceanic variables at the same time.

Here's a fairly minimal example demonstrating this behaviour with the new API:

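import cdsapi

# New CDS API request. Assumes the cdsapi configuration (~/.cdsapirc)
# points at the new (beta) CDS endpoint.
dataset = "reanalysis-era5-single-levels-monthly-means"
request = {
    'product_type': ['monthly_averaged_reanalysis'],
    # One atmospheric and one ocean-wave variable: this combination
    # triggers the GRIB-to-NetCDF incompatibility described above.
    'variable': ['2m_temperature', 'mean_wave_direction'],
    'year': ['2015'],
    'month': ['10'],
    'time': ['00:00'],
    'data_format': 'netcdf',
    'area': [1, -1, -1, 1]  # North, West, South, East
}

client = cdsapi.Client()
client.retrieve(dataset, request).download()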

This produces a Zip file containing two NetCDFs, each containing one of the requested variables. The NetCDFs have different resolutions, but this isn't always the case: sometimes a Zip is produced containing multiple NetCDFs with identical dimensions. The equivalent code for the old API is:

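import cdsapi

# Old CDS API request. Assumes the cdsapi configuration (~/.cdsapirc)
# points at the old CDS endpoint. The old API names the output-format
# key 'format' rather than 'data_format'.
dataset = "reanalysis-era5-single-levels-monthly-means"
request = {
    'product_type': ['monthly_averaged_reanalysis'],
    'variable': ['2m_temperature', 'mean_wave_direction'],
    'year': ['2015'],
    'month': ['10'],
    'time': ['00:00'],
    'format': 'netcdf',
    'area': [1, -1, -1, 1]  # North, West, South, East
}

client = cdsapi.Client()
client.retrieve(dataset, request).download()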

This produces a single NetCDF on a common grid.
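Since the same kind of request can now yield either a bare NetCDF file or a Zip archive, the download has to be classified by its content rather than assumed to be NetCDF. A minimal sketch, assuming NetCDF downloads are either classic-format files (starting with the bytes `CDF`) or NetCDF-4 files (starting with the HDF5 signature, i.e. with no HDF5 user block); `detect_download_format` is a hypothetical helper, not part of cdsapi or xcube-cds:

```python
from __future__ import annotations

import zipfile
from pathlib import Path

# Classic, 64-bit-offset and CDF-5 NetCDF files begin with b"CDF" plus a version byte.
NETCDF_CLASSIC_MAGIC = b"CDF"
# NetCDF-4 files are HDF5 containers. Assumption: no HDF5 user block,
# so the HDF5 signature sits at offset 0.
HDF5_MAGIC = b"\x89HDF\r\n\x1a\n"


def detect_download_format(path: str | Path) -> str:
    """Return "zip" or "netcdf" for a downloaded CDS file; raise otherwise."""
    path = Path(path)
    # is_zipfile inspects the archive structure, so it does not depend on
    # the file extension of the download target.
    if zipfile.is_zipfile(path):
        return "zip"
    with path.open("rb") as f:
        header = f.read(len(HDF5_MAGIC))
    if header.startswith((NETCDF_CLASSIC_MAGIC, HDF5_MAGIC)):
        return "netcdf"
    raise ValueError(f"{path} is neither a Zip archive nor a NetCDF file")
```

A `"netcdf"` result can go down the existing single-file code path; only a `"zip"` result needs further handling.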

pont-us commented Aug 22, 2024

Sketch of a solution for ERA5:

  • Detect whether the downloaded data is NetCDF or Zip. Currently, NetCDF is simply assumed.
  • If NetCDF, follow the existing code path.
  • If Zip:
    • Unpack the Zip into a temporary directory.
    • Check whether all NetCDFs have identical dimensions.
      • If so, merge them into one dataset with xarray.open_mfdataset.
      • If not, raise an exception.
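The Zip branch of this sketch could look roughly as follows. This is a minimal illustration, not the xcube-cds implementation: `extract_netcdfs`, `dimensions_identical` and `open_cds_zip` are hypothetical names, "identical dimensions" is taken to mean the same dimension names and sizes, and it assumes xarray, dask (needed by `open_mfdataset`) and a NetCDF backend are installed.

```python
from __future__ import annotations

import tempfile
import zipfile
from collections.abc import Mapping, Sequence
from pathlib import Path


def extract_netcdfs(zip_path: str | Path, dest_dir: str | Path) -> list[Path]:
    """Unpack the NetCDF members of a CDS Zip download into dest_dir."""
    dest = Path(dest_dir)
    with zipfile.ZipFile(zip_path) as zf:
        members = [m for m in zf.namelist() if m.lower().endswith(".nc")]
        for name in members:
            zf.extract(name, dest)
    # Sorted so that the merge order is deterministic.
    return sorted(dest / name for name in members)


def dimensions_identical(dim_sizes: Sequence[Mapping[str, int]]) -> bool:
    """True if all mappings have the same dimension names and sizes."""
    if not dim_sizes:
        return True  # vacuously true for an empty sequence
    first = dict(dim_sizes[0])
    return all(dict(d) == first for d in dim_sizes[1:])


def open_cds_zip(zip_path: str | Path):
    """Merge the NetCDFs in a CDS Zip into one xarray.Dataset, or raise."""
    # Imported lazily so the helpers above also work without xarray installed.
    import xarray as xr

    # Not deleted here: open_mfdataset reads the files lazily, so they must
    # outlive the returned dataset.
    tmp_dir = tempfile.mkdtemp(prefix="xcube-cds-")
    paths = extract_netcdfs(zip_path, tmp_dir)
    if not paths:
        raise ValueError(f"No NetCDF files found in {zip_path}")
    sizes = []
    for path in paths:
        with xr.open_dataset(path) as ds:
            sizes.append(dict(ds.sizes))
    if not dimensions_identical(sizes):
        details = "; ".join(f"{p.name}: {s}" for p, s in zip(paths, sizes))
        raise ValueError(f"NetCDFs in {zip_path} have differing dimensions: {details}")
    # The files share identical dimensions, so combining by coordinates
    # merges their variables into a single dataset.
    return xr.open_mfdataset(paths, combine="by_coords")
```

Raising on mismatched dimensions, rather than regridding, keeps the behaviour explicit: the error message names the member files and their sizes instead of silently returning resampled data.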
