Attributes from netCDF4 initialization retained #1038
Conversation
Merge logic for attributes opens a whole big can of worms. I would probably just copy attributes from the first dataset (similar to what we do in …).
@shoyer, it sounds like data provenance is an outstanding long-term problem. I'm happy to just copy attributes from the first dataset, but am wondering what it would take to do this correctly, i.e., the "overhaul". Any information you have on this would be really helpful. At a minimum we can do as you suggest to fix the lack of attributes (#1037).
@shoyer, I did some more digging and see some of the potential issues, because some of the concatenation / merging is done quasi-automatically, which reduces the number of objects that must be merged (e.g., https://github.com/pydata/xarray/blob/master/xarray/core/combine.py#L391). I'm assuming this is done for performance / simplicity. Is that true? This is looking like a much larger piece of work as I look at it further, because the information has already been compressed by the time the …
We have two primitive combine operations, … You're right that the order of … The comprehensive fix would be to pick a merge strategy for attributes, and apply it uniformly in each place where xarray merges variables or datasets (basically, in …).
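For concreteness, here is a minimal sketch of what a uniform "copy attributes from the first object" strategy could look like if it were factored into a single helper and reused wherever variables or datasets are combined. The helper name and signature are hypothetical, not xarray internals:

```python
# Hypothetical helper illustrating one possible uniform attribute-merge
# strategy ("keep the attributes of the first object"); not xarray's actual
# internal API.
from collections import OrderedDict


def merge_attrs_first(objects):
    """Return a copy of the attrs of the first object in `objects`.

    `objects` is any iterable of xarray-like objects exposing an `.attrs`
    mapping (e.g. Dataset, DataArray, Variable). Attributes on later
    objects are ignored, matching the simple strategy discussed above.
    """
    objects = list(objects)
    return OrderedDict(objects[0].attrs) if objects else OrderedDict()
```

The same helper could in principle be swapped for a stricter strategy (for example, keep only attributes on which all inputs agree) without touching the call sites.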
AFAIC I'd be happy with a … Do you have time to get back to this, @pwolfram?
@fmaussion and @shoyer, I'd like to close this PR out if possible. I'm not 100% sure this PR is worthwhile to complete in a general fashion because of the ambiguity in how best to handle this issue. My current take would be to go with whatever is simplest / cleanest, at least in the short term, which is @fmaussion's suggestion above. Does this work for you both?
Yes, that's good for me. I would mention it somewhere in the docstring, though.
Note, I would say that …
Provided checks pass, this should be ready to merge, @fmaussion, unless @shoyer has any additional recommended changes.
Yes, I also recently updated the IO docs in this respect and removed the experimental part: http://xarray.pydata.org/en/latest/io.html#id6
Yes, this works for me. Can you add a test case that covers this?
@shoyer, added a test as requested.
Uses attributes from the first file opened by `open_mfdataset` to populate `ds.attrs`.
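As a rough usage sketch of the behaviour this change aims for (the file names and attribute values below are made up, and the `combine` keyword belongs to later xarray versions than the one this PR targeted):

```python
# Illustrative only: two small files with differing global attributes; the
# combined dataset is expected to keep the attrs of the first file listed.
import xarray as xr

ds_a = xr.Dataset({"t": ("time", [0.0, 1.0])}, coords={"time": [0, 1]},
                  attrs={"history": "created by run A"})
ds_b = xr.Dataset({"t": ("time", [2.0, 3.0])}, coords={"time": [2, 3]},
                  attrs={"history": "created by run B"})
ds_a.to_netcdf("run_a.nc")
ds_b.to_netcdf("run_b.nc")

with xr.open_mfdataset(["run_a.nc", "run_b.nc"], combine="by_coords") as ds:
    assert ds.attrs == ds_a.attrs  # attributes of the first file win
```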
It looks like one of the new many-files tests is crashing:
xarray/tests/test_backends.py::OpenMFDatasetManyFilesTest::test_3_open_large_num_files_pynio
/home/travis/build.sh: line 62: 1561 Segmentation fault (core dumped) py.test xarray --cov=xarray --cov-report term-missing --verbose
Yes, it also happened on this PR: #1328
It happened here too... I just tried it out on my local machine via …
@pwolfram, if we're getting sporadic failures on Travis, it's probably better to skip the test by default. It's important for the test suite not to be flaky.
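One common pattern for that is to make the resource-heavy test opt-in; the environment variable below is made up for illustration and is not necessarily what the xarray test suite uses:

```python
# Illustrative sketch: gate a resource-heavy test behind an opt-in
# environment variable so it is skipped by default on shared CI.
import os

import pytest


@pytest.mark.skipif(
    not os.environ.get("XARRAY_RUN_MANY_FILES_TESTS"),
    reason="resource-heavy; set XARRAY_RUN_MANY_FILES_TESTS=1 to run",
)
def test_open_large_num_files():
    ...  # the actual many-files test body would go here
```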
@shoyer, should I do a quick "hot fix" and then try to sort out the problem?
I'm continuing to take a look -- my tests were not 100% set up locally on this branch, and I'll see if I can reproduce the sporadic error on macOS.
Still passing locally...
The test passes even if I run it multiple times.
Travis is a shared environment that runs multiple tests concurrently. It's possible that we're running out of files due to other users or even other variants of our same build.
Is it possible that the test fails if more than one is run simultaneously on the same node? Could you restart the other tests to verify (restart at the same time if possible)?
Just restarted, let's see...
Crash in the same place... but when I restarted it via a force push earlier it passed, which would imply we are running out of resources on Travis. Maybe the thing to do is just to reset the open-file limit as @rabernat suggested; that way it provides a factor of safety on Travis. Thoughts on this idea, @shoyer and @fmaussion?
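A minimal sketch of what raising the open-file limit inside the test process could look like, assuming a POSIX platform (the `resource` module is unavailable on Windows, and only the soft limit can be raised without elevated privileges):

```python
# Sketch of bumping the soft open-file limit up to the hard limit to add
# headroom on shared CI workers; POSIX-only.
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
# Raising the soft limit up to the existing hard limit does not require root;
# raising the hard limit itself would.
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
```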
See #1336 for a fix that disables the tests that have been acting up because of resource issues.
OK, going to merge this anyway... the failing tests will be fixed by #1366
Ensures that attrs for open_mfdataset are now retained
cc @shoyer