attrs empty for open_mfdataset vs population for open_dataset #1037

Closed
pwolfram opened this issue Oct 4, 2016 · 4 comments
@pwolfram
Contributor

pwolfram commented Oct 4, 2016

Previously, a dataset opened with open_mfdataset would store attrs corresponding to the netCDF global attributes. This no longer appears to be the case. Using this dataset: https://github.com/pydata/xarray-data/raw/master/rasm.nc

In [1]: import xarray as xr

In [2]: ds = xr.open_dataset('rasm.nc')
/Users/pwolfram/src/xarray/xarray/conventions.py:386: RuntimeWarning: Unable to decode time axis into full numpy.datetime64 objects, continuing using dummy netCDF4.datetime objects instead, reason: dates out of range
  result = decode_cf_datetime(example_value, units, calendar)

In [3]: ds
Out[3]: 
<xarray.Dataset>
Dimensions:  (time: 36, x: 275, y: 205)
Coordinates:
  * time     (time) object 1980-09-16T12:00:00 1980-10-17 ...
    yc       (y, x) float64 16.53 16.78 17.02 17.27 17.51 17.76 18.0 18.25 ...
    xc       (y, x) float64 189.2 189.4 189.6 189.7 189.9 190.1 190.2 190.4 ...
  * y        (y) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ...
  * x        (x) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ...
Data variables:
    Tair     (time, y, x) float64 nan nan nan nan nan nan nan nan nan nan ...
Attributes:
    title: /workspace/jhamman/processed/R1002RBRxaaa01a/lnd/temp/R1002RBRxaaa01a.vic.ha.1979-09-01.nc
    institution: U.W.
    source: RACM R1002RBRxaaa01a
    output_frequency: daily
    output_mode: averaged
    convention: CF-1.4
    references: Based on the initial model of Liang et al., 1994, JGR, 99, 14,415- 14,429.
    comment: Output from the Variable Infiltration Capacity (VIC) model.
    nco_openmp_thread_number: 1
    NCO: 4.3.7
    history: history deleted for brevity

In [4]: ds = xr.open_mfdataset('rasm.nc')

In [5]: ds
Out[5]: 
<xarray.Dataset>
Dimensions:  (time: 36, x: 275, y: 205)
Coordinates:
  * time     (time) object 1980-09-16T12:00:00 1980-10-17 ...
    yc       (y, x) float64 16.53 16.78 17.02 17.27 17.51 17.76 18.0 18.25 ...
    xc       (y, x) float64 189.2 189.4 189.6 189.7 189.9 190.1 190.2 190.4 ...
  * y        (y) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ...
  * x        (x) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ...
Data variables:
    Tair     (time, y, x) float64 nan nan nan nan nan nan nan nan nan nan ...

The attributes are missing when the file is opened with open_mfdataset. I do not believe this was the case in previous versions of xarray: one of my scripts now fails because it no longer obtains attributes from a dataset initialized with open_mfdataset.

@shoyer and @jhamman, is this the expected behavior? Was the prior behavior simply an unspecified side effect of the code rather than a design decision? My preference would be to keep as many attributes as possible when using open_mfdataset, to best preserve the provenance of the dataset, i.e., ds.attrs should not be empty following initialization.
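
For a script that depends on these attributes, a small check can make the regression fail loudly instead of silently. The following is only a sketch: `missing_attrs` is a hypothetical helper, not part of xarray, and the `SimpleNamespace` objects are stand-ins for the two datasets in the session above so that the example runs without the data file.

```python
from types import SimpleNamespace


def missing_attrs(reference, candidate):
    """Return the names found in reference.attrs but absent from candidate.attrs."""
    return [name for name in reference.attrs if name not in candidate.attrs]


# Stand-ins for the two datasets in the session above. With xarray installed,
# these would be xr.open_dataset('rasm.nc') and xr.open_mfdataset('rasm.nc').
single = SimpleNamespace(attrs={"institution": "U.W.", "convention": "CF-1.4"})
multi = SimpleNamespace(attrs={})

lost = missing_attrs(single, multi)
print(lost)
```

An empty list means every global attribute survived; anything else names the attributes that were dropped.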

@shoyer
Member

shoyer commented Oct 4, 2016

This was certainly not an intentional change, but on the other hand, the original behavior was not tested or intentionally specified, either.

It would be good to figure out where in open_mfdataset/auto_concat the attributes are lost.

@pwolfram
Contributor Author

pwolfram commented Oct 4, 2016

@shoyer, the issue is in merge in xarray/core/merge.py. Essentially, the lines

variables, coord_names, dims = merge_core(dict_like_objects, compat, join)
merged = Dataset._construct_direct(variables, coord_names, dims)

drop the attributes. One solution is to merge the attrs as well, e.g., by merging the OrderedDicts. I've started a fix in #1038 if you want to take a preliminary look.
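
To illustrate what merging the OrderedDicts could look like, here is a minimal sketch. It is not necessarily what #1038 does: `merge_attrs` is a hypothetical helper, the "first value wins on conflict" rule is an assumption, and the second dict's values are made up for the example.

```python
from collections import OrderedDict


def merge_attrs(attrs_list):
    """Combine attribute mappings, keeping the first value seen for each key."""
    merged = OrderedDict()
    for attrs in attrs_list:
        for key, value in attrs.items():
            # setdefault leaves an existing entry untouched, so earlier
            # datasets take precedence over later ones on conflicts.
            merged.setdefault(key, value)
    return merged


# Illustrative attrs only; the second dict is invented to show a conflict.
first = OrderedDict([("institution", "U.W."), ("convention", "CF-1.4")])
second = OrderedDict([("convention", "CF-1.6"), ("NCO", "4.3.7")])

merged = merge_attrs([first, second])
print(merged)
```

Other conflict rules are equally plausible, such as dropping any key whose values disagree; which one is right is a design decision for the library.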

@stale

stale bot commented Jan 26, 2019

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.
If this issue remains relevant, please comment here; otherwise it will be marked as closed automatically.

@stale stale bot added the stale label Jan 26, 2019
@jhamman
Member

jhamman commented Feb 2, 2019

This should have been closed by #1038.

@jhamman jhamman closed this as completed Feb 2, 2019