Attributes from netCDF4 initialization retained #1038

Merged: 1 commit into pydata:master on Mar 31, 2017

Conversation

@pwolfram (Contributor) commented Oct 4, 2016

Ensures that attrs for `open_mfdataset` are now retained.

cc @shoyer

@shoyer (Member) commented Oct 5, 2016

Merge logic for attributes opens a whole big can of worms. I would probably just copy attributes from the first dataset (similar to what we do in concat), unless you want to overhaul the whole thing in a more comprehensive fashion.
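
For reference, `concat` already behaves this way: the result takes the first object's attributes. A minimal illustration, assuming a recent xarray:

```python
import xarray as xr

a = xr.Dataset({"x": ("t", [1, 2])}, attrs={"source": "file_a"})
b = xr.Dataset({"x": ("t", [3, 4])}, attrs={"source": "file_b"})

combined = xr.concat([a, b], dim="t")
print(combined.attrs)  # {'source': 'file_a'} -- the first dataset's attrs win
```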

@pwolfram (Contributor, Author) commented Oct 5, 2016

@shoyer, it sounds like data provenance is an outstanding long-term problem. I'm happy to just copy attributes from the first dataset, but I'm wondering what it would take to do this correctly, i.e., the "overhaul". Any information you have on this would be really helpful. At a minimum, we can do as you suggest to fix the missing attributes (#1037).

@pwolfram (Contributor, Author) commented Oct 5, 2016

@shoyer, I did some more digging and see some of the potential issues, because some of the concatenation / merging is done quasi-automatically, which reduces the number of objects that must be merged (e.g., https://github.com/pydata/xarray/blob/master/xarray/core/combine.py#L391). I'm assuming this is done for performance / simplicity. Is that true?

This is looking like a much larger piece of work the further I look into it, because the information has already been compressed by the time `merge` is called (i.e., `len(dict_like_objects)` is not necessarily equal to the number of input files: https://github.com/pydata/xarray/blob/master/xarray/core/merge.py#L531).

@shoyer (Member) commented Oct 5, 2016

> I did some more digging and see some of the potential issues, because some of the concatenation / merging is done quasi-automatically, which reduces the number of objects that must be merged (e.g., https://github.com/pydata/xarray/blob/master/xarray/core/combine.py#L391). I'm assuming this is done for performance / simplicity. Is that true?

We have two primitive combine operations, `concat` (same variables, different coordinate values) and `merge` (different variables, same coordinate values). `auto_combine` needs to do both in some order.
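
A toy illustration of the two primitives, assuming a recent xarray:

```python
import xarray as xr

# concat: the same variable split across different coordinate values
part1 = xr.Dataset({"x": ("t", [1, 2])}, coords={"t": [0, 1]})
part2 = xr.Dataset({"x": ("t", [3, 4])}, coords={"t": [2, 3]})
along_t = xr.concat([part1, part2], dim="t")  # x now has 4 values along t

# merge: different variables sharing the same coordinate values
temp = xr.Dataset({"temp": ("t", [10.0, 11.0])}, coords={"t": [0, 1]})
wind = xr.Dataset({"wind": ("t", [5.0, 6.0])}, coords={"t": [0, 1]})
both = xr.merge([temp, wind])  # one dataset with both variables
```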

You're right that the order of grouped is not deterministic (it uses a dict). Sorting by key for input into the list comprehension could fix that.

The comprehensive fix would be to pick a merge strategy for attributes, and apply it uniformly in each place where xarray merges variables or datasets (basically, in concat and all the merge variations). Possibly several merge strategies, with a keyword argument to switch between them.
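
One hypothetical shape for such pluggable strategies (`merge_attrs` and the strategy names here are illustrative, not xarray API):

```python
def merge_attrs(all_attrs, strategy="keep_first"):
    """Combine a list of attrs dicts according to a named strategy."""
    if not all_attrs:
        return {}
    if strategy == "keep_first":
        return dict(all_attrs[0])
    if strategy == "drop":
        return {}
    if strategy == "identical":
        first = all_attrs[0]
        if any(attrs != first for attrs in all_attrs[1:]):
            raise ValueError("attributes differ between inputs")
        return dict(first)
    raise ValueError("unknown strategy: %r" % strategy)
```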

@fmaussion (Member) commented:

AFAIC I'd be happy with a `combined.attrs = datasets[0].attrs` added before returning the combined dataset, which would already be better than the current situation...
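
Sketched in place, that suggestion amounts to something like this (a simplified outline of `open_mfdataset`, not the actual source):

```python
def open_mfdataset(paths, **kwargs):
    datasets = [open_dataset(p, **kwargs) for p in paths]
    combined = auto_combine(datasets)
    combined.attrs = datasets[0].attrs  # keep the first file's attributes
    return combined
```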

Do you have time to get back to this, @pwolfram?

@pwolfram (Contributor, Author) commented:

@fmaussion and @shoyer, I'd like to close this PR out if possible. I'm not 100% sure it is worthwhile to complete this in a general fashion, given the ambiguity in how best to handle the issue. My current take would be to go with whatever is simplest / cleanest, at least in the short term, which is @fmaussion's suggestion above. Does this work for you both?

@fmaussion (Member) commented:

Yes, that's good for me. I would mention it somewhere in the docstring, though.

@pwolfram (Contributor, Author) commented:

Note: I would say that `open_mfdataset` is no longer experimental, given its widespread use.

@pwolfram (Contributor, Author) commented:

Provided checks pass, this should be ready to merge, @fmaussion, unless @shoyer has any additional recommended changes.

@fmaussion (Member) commented:

> Note: I would say that `open_mfdataset` is no longer experimental, given its widespread use.

Yes, I also recently updated the IO docs in this respect and removed the "experimental" wording: http://xarray.pydata.org/en/latest/io.html#id6

@shoyer (Member) commented Mar 22, 2017

Yes, this works for me. Can you add a test case that covers this?
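
Such a test might look roughly like this sketch (the test name and `tmp_path` fixture are illustrative, not the PR's actual test; assumes a recent xarray where `combine="by_coords"` is available):

```python
import numpy as np
import xarray as xr

def test_open_mfdataset_keeps_first_attrs(tmp_path):
    # write two small netCDF files with different global attributes
    paths = [tmp_path / "f0.nc", tmp_path / "f1.nc"]
    for i, path in enumerate(paths):
        ds = xr.Dataset(
            {"x": ("t", np.arange(3) + 3 * i)},
            coords={"t": np.arange(3) + 3 * i},
        )
        ds.attrs["history"] = "created file %d" % i
        ds.to_netcdf(path)

    with xr.open_mfdataset([str(p) for p in paths], combine="by_coords") as actual:
        # global attributes should come from the first file only
        assert actual.attrs["history"] == "created file 0"
```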

@pwolfram force-pushed the mfdataset_attrs branch 2 times, most recently from 4d9fd6b to 76978d7 on March 24, 2017 16:54
@pwolfram (Contributor, Author) commented:

@shoyer, added a test as requested.

Commit message: Uses attributes from first file opened by `open_mfdataset` to populate ds.attrs.
@shoyer (Member) commented Mar 24, 2017

It looks like one of the new many-files tests is crashing:

xarray/tests/test_backends.py::OpenMFDatasetManyFilesTest::test_3_open_large_num_files_pynio /home/travis/build.sh: line 62: 1561 Segmentation fault (core dumped) py.test xarray --cov=xarray --cov-report term-missing --verbose

https://travis-ci.org/pydata/xarray/jobs/214722901

@fmaussion (Member) commented:

Yes, it also happened on this PR: #1328

@pwolfram (Contributor, Author) commented:

It happened here too... I just tried it out on my local machine via `conda env create -f ci/requirements-py27-cdat+pynio.yml` and wasn't able to get an error... do any of the crashes give anything more informative than a "Segmentation fault"?

@shoyer (Member) commented Mar 24, 2017

@pwolfram, if we're getting sporadic failures on Travis, it's probably better to skip the test by default. It's important for the test suite not to be flaky.

@pwolfram (Contributor, Author) commented:

@shoyer, should I do a quick "hot fix" and then try to sort out the problem?

@pwolfram (Contributor, Author) commented Mar 24, 2017

I'm continuing to take a look; my tests were not fully set up locally on this branch, and I'll see if I can reproduce the sporadic error on macOS.

@pwolfram (Contributor, Author) commented:

Still passing locally...

xarray/tests/test_backends.py::OpenMFDatasetManyFilesTest::test_1_autoclose_netcdf4 PASSED
xarray/tests/test_backends.py::OpenMFDatasetManyFilesTest::test_1_open_large_num_files_netcdf4 PASSED
xarray/tests/test_backends.py::OpenMFDatasetManyFilesTest::test_2_autoclose_scipy PASSED
xarray/tests/test_backends.py::OpenMFDatasetManyFilesTest::test_2_open_large_num_files_scipy PASSED
xarray/tests/test_backends.py::OpenMFDatasetManyFilesTest::test_3_autoclose_pynio PASSED
xarray/tests/test_backends.py::OpenMFDatasetManyFilesTest::test_3_open_large_num_files_pynio PASSED

Test passes even if I run it multiple times too.

@shoyer (Member) commented Mar 24, 2017

Travis is a shared environment that runs multiple tests concurrently. It's possible that we're running out of file handles due to other users, or even other variants of our same build.

@pwolfram (Contributor, Author) commented:

Is it possible that the test fails when more than one build runs simultaneously on the same node? Could you restart the other tests to verify (restarting at the same time, if possible)?

@shoyer (Member) commented Mar 24, 2017

Just restarted, let's see...

@pwolfram (Contributor, Author) commented:

It crashed in the same place... but when I restarted it via a force push earlier, it passed, which would imply we are running out of resources on Travis.

Maybe the thing to do is just to reset the open-file limit, as @rabernat suggested; that way it provides a factor of safety on Travis.
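
On POSIX systems that reset is a couple of lines with the standard-library `resource` module (the numbers here are illustrative):

```python
import resource

# raise the soft limit on open file descriptors, up to the hard limit
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (min(4096, hard), hard))
```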

Thoughts on this idea, @shoyer and @fmaussion?

@pwolfram (Contributor, Author) commented:

See #1336 for a fix that disables these tests, which have been acting up because of resource issues.

@pwolfram (Contributor, Author) commented:

@shoyer, tests should be restarted following the merge of #1336, and then this PR should be ready to merge.

@shoyer (Member) commented Mar 31, 2017

OK, going to merge this anyway... the failing tests will be fixed by #1366.

@shoyer merged commit c0178b7 into pydata:master on Mar 31, 2017