
"Dimension time already exists" error when opening aerosol dataset #634

Closed

HelenClifton opened this issue May 2, 2018 · 10 comments

@HelenClifton

HelenClifton commented May 2, 2018

Expected behavior

@forman, @JanisGailis Executing use case 22 (happy path) with the cate GUI.
Downloaded dataset esacci.AEROSOL.mon.L3C.AER_PRODUCTS.AATSR.Envisat.ORAC.03-02.r1.
Expected it to open.

Actual behavior

An error message is seen

[Screenshot: error dialog "Dimension time already exists" (uc22_time_error)]
Traceback here:
uc22_time_error.txt

Notes:

  • From the cate CLI, the dataset downloads without error (command cate ds copy), but the error occurs when it is opened (command cate res open).
  • I was using a clean install of cate-2.0.0-dev.10 (not the auto-updated version).
  • Downloading and opening this dataset had worked with dev.9.
  • It may be related to #623 (Generate time dim if it doesn't exist).

Steps to reproduce the problem

Select data store: ESA CCI Open Data Portal
Highlight data source: esacci.AEROSOL.mon.L3C.AER_PRODUCTS.AATSR.Envisat.ORAC.03-02.r1
Click on “Download and/or open remote dataset”
In “Download Data Source” window:
• Select Time Constraint and enter start time of 2010-01-01 and end time of 2010-12-31
• Select Region Constraint, enter:
o Lon. from: -100
o Lon. to: 80
o Lat. from: -90
o Lat. to: 90
• Select Variables Constraint, select variable AOD550_mean from the list
• Select “Download and make local data source” and enter Unique identifier UC22_aerosol
• Click on “Download & Open Local”

Specifications

Windows 7 Professional
cate-2.0.0-dev.10

@forman forman self-assigned this May 3, 2018
@forman
Member

forman commented May 3, 2018

It is related to #623.

@forman
Member

forman commented May 3, 2018

@papesci, it is this piece of code that I marked with a TODO in my branch that causes this. Obviously @kbernat once added these lines to fix the "missing time dimension" problem, but instead of really fixing it, they left us with a "dead" time dimension in all of our localized datasets, without the actual time coordinate variable in there. Aaargh! 👎
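
For illustration, here is a minimal sketch in plain xarray (not the actual cate code path) of how such a dead dimension reproduces the reported error; the variable name is borrowed from the reproduction steps above:

```python
import numpy as np
import xarray as xr

# A localized dataset that ended up with a "dead" time dimension:
# the dimension is there, but no "time" coordinate variable.
ds = xr.Dataset(
    {"AOD550_mean": (("time", "lat", "lon"), np.zeros((1, 180, 360)))},
    coords={
        "lat": np.linspace(-89.5, 89.5, 180),
        "lon": np.linspace(-179.5, 179.5, 360),
    },
)

# Code that later tries to generate the missing time dimension (#623)
# then trips over the dimension that already exists:
try:
    ds = ds.expand_dims("time")
except ValueError as error:
    print(error)  # Dimension time already exists.
```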

@papesci, @JanisGailis we have to review how we normalize datasets opened by open_dataset in the future. I am strongly voting for doing all normalisation by default, including adding a missing time coordinate variable / dimension, plus correcting (see #620) and renaming lat/lon coordinate variables / dimensions. The current situation is less than satisfying: we have operations open_dataset, normalize, adjust_spatial_attrs, adjust_temporal_attrs, fix_lon_360 that all do some kind of normalisation (see also #555 for single grid cells with lat/lon dimensions retained).
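
To illustrate the "add a missing time coordinate" step, a sketch of what it could look like; the helper name and the reliance on the time_coverage_start / time_coverage_end global attributes are assumptions for illustration, not necessarily how we will implement it:

```python
import pandas as pd
import xarray as xr

def add_missing_time_coord(ds: xr.Dataset) -> xr.Dataset:
    """Attach a 1-D datetime64 'time' coordinate to a dataset that has
    a size-1 'time' dimension but no coordinate variable, taking the
    centre of the coverage interval from the global attributes."""
    if "time" in ds.dims and "time" not in ds.coords:
        start = pd.Timestamp(ds.attrs["time_coverage_start"])
        end = pd.Timestamp(ds.attrs["time_coverage_end"])
        ds = ds.assign_coords(time=("time", [start + (end - start) / 2]))
    return ds
```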

@JanisGailis
Member

I completely agree that all normalization must be done by default when the dataset is opened.

However, I don't see a problem with splitting the normalization routines into many smaller methods, each doing a single thing. For one, I actually need adjust_spatial_attrs and adjust_temporal_attrs: quite a few operations can potentially change the spatiotemporal parameters of a dataset, in which case I need to rerun some parts of the normalization, such as adjusting the attributes, so that the dataset stays normalized with up-to-date attributes. Running some generic normalize that does many other things aside from what I need would not be ideal. A different matter is whether we need separate operation interfaces, or whether I should just use _opimpl instead. But then, I don't think it hurts to have these as util operations. A sketch of the pattern follows below.
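
A minimal sketch of that pattern, assuming ACDD-style bounding-box attribute names; these functions are illustrative stand-ins for the cate operations of the same names, not their real implementations:

```python
import xarray as xr

def adjust_spatial_attrs(ds: xr.Dataset) -> xr.Dataset:
    """Recompute the bounding-box attributes from the current
    lat/lon coordinates (ACDD-style attribute names assumed)."""
    ds = ds.copy()
    ds.attrs["geospatial_lat_min"] = float(ds.lat.min())
    ds.attrs["geospatial_lat_max"] = float(ds.lat.max())
    ds.attrs["geospatial_lon_min"] = float(ds.lon.min())
    ds.attrs["geospatial_lon_max"] = float(ds.lon.max())
    return ds

def subset_spatial(ds: xr.Dataset, lat: slice, lon: slice) -> xr.Dataset:
    # The subset changes the spatial extent, so only the spatial
    # attribute step needs to be re-run -- not a full normalize().
    return adjust_spatial_attrs(ds.sel(lat=lat, lon=lon))
```

For example, subset_spatial(ds, lat=slice(-90, 90), lon=slice(-100, 80)) would match the region constraint from the issue above and leave the attributes up to date.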

@papesci
Contributor

papesci commented May 4, 2018

Yes indeed, we have to support users as much as possible by making their work easier. That also means automatically executing all the phases preparatory to accessing the data.
That said, we also have to deal with datasets that are not normalizable, for example a dataset with no lat or lon dimension. In that case I think the system should still open the dataset, but with a warning notifying the user that they are about to access a dataset with an unrecognized structure.
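
Something along these lines (a sketch of the open-with-warning behaviour; the function name is hypothetical, not an actual cate API):

```python
import warnings
import xarray as xr

def open_dataset_checked(path: str) -> xr.Dataset:
    """Open a dataset; warn instead of failing when its structure
    is not recognizable enough to normalize."""
    ds = xr.open_dataset(path)
    if "lat" not in ds.dims or "lon" not in ds.dims:
        warnings.warn(f"{path}: no recognizable lat/lon dimensions; "
                      "opening without normalization.")
    return ds
```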

@forman
Member

forman commented May 7, 2018

@JanisGailis I agree as long as you are referring to software design - composing a generic normalize() operation from smaller specialized operations, each useful on its own. That's absolutely fine. Then normalize() should have a set of kwargs that control the behaviour, e.g. gen_missing_time_dim=True.

However, I was referring to the individual normalisation steps that should happen in Cate's open_dataset() so that a maximum number of operations can be applied without forcing users to add intermediate normalisation steps. By default, open_dataset() should apply the most common normalisations that we are aware of, so that we have a guarantee that if there are time coordinates, there is a 1-D variable "time" of type datetime64 with dimension "time"; that if there are longitude coordinates, there is a 1-D variable "lon", etc.; and that the corresponding CF attributes are set correctly.
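
Phrased as assertions, the guarantee would be roughly this (a sketch; check_normalized is a hypothetical name, not a proposed API):

```python
import numpy as np
import xarray as xr

def check_normalized(ds: xr.Dataset) -> None:
    """Assert the invariants a default-normalizing open_dataset()
    would guarantee."""
    if "time" in ds.coords:
        assert ds["time"].dims == ("time",)
        assert np.issubdtype(ds["time"].dtype, np.datetime64)
    if "lon" in ds.coords:
        assert ds["lon"].dims == ("lon",)
```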

Leaving it up to the users to decide which normalisation to apply to a dataset before another operation can be used is very unfriendly and frustrating for them. That is what I can report from a two-day workshop at the climate office last week. We cannot expect users to know what's in a dataset before they open and inspect it. Otherwise we would need to ship a per-ECV handbook with the software explaining which kind of normalisation should be applied to each dataset.

@forman
Member

forman commented May 7, 2018

This is the current confusing situation (as of 07.05.2018); a consolidation sketch follows the list:

  • normalize is called
    • in open_dataset after cate.core.ds.open_dataset, only if the normalize kwarg is set (default is True)
    • in read_netcdf after xr.open_dataset, only if the normalize kwarg is set (default is True)
  • normalize_impl is always called
    • in EsaCciOdpDataSource.open_dataset after open_xarray_dataset and before subset_spatial_impl
    • in EsaCciOdpDataSource._make_local after xr.open_dataset and before subset_spatial_impl
    • in LocalDataSource.open_dataset and LocalDataSource._make_local for the same reason as above, btw. with a lot of ugly code duplication.
  • adjust_spatial_attrs_impl is always called
    • as a post-processor in EsaCciOdpDataSource._make_local after subset_spatial_impl
    • as a post-processor in LocalDataSource._make_local after subset_spatial_impl (code duplication!)
  • adjust_spatial_attrs is always called
    • as a post-processor in coregister
    • as a post-processor in pearson_correlation
    • as a post-processor in subset_spatial
  • adjust_temporal_attrs_impl is always called
    • as a post-processor in DatasetLike.convert after converting a DataFrame to Dataset
  • adjust_temporal_attrs is called
    • as a post-processor after(?) normalize in open_dataset and read_netcdf, if their normalize kwarg is set
    • always as a post-processor in temporal_aggregation
    • always as a post-processor in subset_temporal
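
Given that inventory, the consolidation I have in mind would look roughly like this; the kwargs follow the gen_missing_time_dim example above, and the step functions are pass-through stand-ins, not the real operations:

```python
import xarray as xr

def adjust_spatial_attrs(ds: xr.Dataset) -> xr.Dataset:
    return ds  # pass-through stand-in; see the sketch in an earlier comment

def adjust_temporal_attrs(ds: xr.Dataset) -> xr.Dataset:
    return ds  # pass-through stand-in for the assumed temporal counterpart

def normalize(ds: xr.Dataset,
              gen_missing_time_dim: bool = True,
              adjust_spatial: bool = True,
              adjust_temporal: bool = True) -> xr.Dataset:
    """One shared entry point composed of the specialized steps, which the
    open_dataset/read_netcdf and _make_local code paths listed above could
    all call instead of duplicating the individual calls."""
    if gen_missing_time_dim and "time" not in ds.dims:
        ds = ds.expand_dims("time")
    if adjust_spatial:
        ds = adjust_spatial_attrs(ds)
    if adjust_temporal:
        ds = adjust_temporal_attrs(ds)
    return ds
```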

@kbernat
Collaborator

kbernat commented May 7, 2018

@forman @papesci
I'm happy to help if you want, even if it's just refactoring or removing some code duplication.
Give me a shout with your thoughts.

@forman
Member

forman commented May 8, 2018

@kbernat Thanks Chris, much appreciated.

@forman
Member

forman commented May 8, 2018

This issue should be fixed by the last commit.

@HelenClifton
Author

@forman, @JanisGailis, @papesci
Confirmed, it is fixed in dev.11.
Closing the issue.
