Central functions to open and normalize gridded datasets #644

forman · 2018-05-10T06:40:06Z

Expected behavior

Implement and use a single function open_gridded_dataset to open gridded datasets read from a single file or from multiple files concatenated along the time dimension. open_gridded_dataset should

detect (also recursive) file wildcards and expand file list
detect remote file access
detect an appropriate internal chunking based on a configurable strategy. For example:
- "spatial": optimal for spatial analysis and visualization, that is, chunking in the spatial dimensions taken from external NetCDF/HDF chunking or GeoTIFF tiling
- "time": optimal for time analysis: chunking mostly along the time dimension.
- "cube": optimal for spatio-temporal analyses
open the dataset
perform dataset normalization

The normalization step should make optional use of another function, normalize, that can be configured to

rename spatial 1D longitude and latitude coordinates so in the end we have lon and lat coordinate variables
detect a 0-360 degree longitude range and fix it to -180 to +180 degrees by rearranging variable grids
ensure variables have a dimension time and we have a time coordinate variable given that attributes time_coverage_start and time_coverage_end are present
ensure a coordinate variable named time has datatype np.datetime64
ensure global spatio-temporal CF attributes are set
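
To make the first two points concrete, here is a minimal pure-Python sketch of the coordinate renaming and the 0-360 longitude fix. It works on plain lists rather than on xarray objects, and the alias table and all function names are hypothetical. The [-180, 180) convention for the result is also an assumption.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical alias table; the only requirement stated is that we end up with "lon" and "lat".
COORD_ALIASES = {"longitude": "lon", "long": "lon", "latitude": "lat"}


def normalized_coord_names(names: Sequence[str]) -> Dict[str, str]:
    """Return a rename mapping for coordinate names that have a known alias."""
    return {name: COORD_ALIASES[name.lower()]
            for name in names if name.lower() in COORD_ALIASES}


def wrap_longitudes(lons: Sequence[float]) -> Tuple[List[float], List[int]]:
    """Map a 0-360 longitude axis to [-180, 180).

    Returns the new, ascending longitude values and the index order that
    must be applied to every variable along its lon dimension.
    """
    if not lons or max(lons) <= 180.0:
        # Already in the -180..180 range: nothing to rearrange.
        return list(lons), list(range(len(lons)))
    shifted = [lon - 360.0 if lon >= 180.0 else lon for lon in lons]
    order = sorted(range(len(shifted)), key=shifted.__getitem__)
    return [shifted[i] for i in order], order


def reorder_row(row: Sequence[float], order: Sequence[int]) -> List[float]:
    """Rearrange one row of a variable grid to match the new longitude axis."""
    return [row[i] for i in order]
```

For a grid with longitudes [0, 90, 180, 270], wrap_longitudes yields the new axis [-180, -90, 0, 90] and the order [2, 3, 0, 1], which moves the eastern half of each data row in front of the western half.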

Actual behavior

There are many places in Cate's code where calls to xr.open_dataset() are made without proper parameterization, e.g. without appropriate chunking set. This has a major impact on performance and, due to the missing normalization, also on data compatibility.

This is also related to #634, #623.

Specifications

Cate master as of 2018-05-10



forman added enhancement api ops ds labels May 10, 2018

forman mentioned this issue May 10, 2018

DataSource.open_dataset() performance and code review #645

Closed

3 tasks

forman self-assigned this Jun 13, 2018

forman added the in_progress label Jun 13, 2018

forman mentioned this issue Jun 13, 2018

ESA sea-level data not correctly displayed #661

Closed

forman added a commit that referenced this issue Jun 13, 2018

Closes #661

a5f2364

Addresses #644

forman mentioned this issue Jun 13, 2018

Revised open_xarray_dataset() #680

Merged

forman removed the in_progress label Jun 15, 2018
