-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for the new compression arguments. #7551
Conversation
Use a dict for the arguments and update it with the encoding, so all variables are passed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should also fix #7634
And future encoding updates of netcdf.
for more information, see https://pre-commit.ci
Just curious, what is the status of this PR? We have some conditions in our library's tests based on the version of xarray on what compression arguments to pass to xarray. We had hoped this PR would be included in 2023.3.0. |
@djhoese - I think we're just missing tests for this. No reason it can't go out in the next release. |
@markelg Thanks a lot for adding this! Do you have time to finalize it in the near future? If not, I could also take a look at the tests if you like. |
This is currently stuck waiting until the problems with the last netcdf-c versions are fixed in a new release. See the issues (#7388). When they are fixed I will write the tests If I have time. But of course any help and suggestions are welcomed. |
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
I did my best to write a test. Please check it out. It would be nice to test the lossycompression (significant_digits) too but I could use some help for that as I don't have much more time to dedicate to this at the moment. There are a lot of combinations possible with the encoding parameters that netCDF4 accepts, and they change the output file encoding in different ways. The test does not check all of them. Also blosc apparently only works with blosc_shuffle=1, although 0 and 2 are available too according to the documentation. I had to parametrize the shape of the test data in order to increase it in this test, as blosc does not work with small chunk sizes. |
Could someone (@rabernat?) restart the CI here, at least to get viewable logs again? Following #7388 (comment), this seems to be the next step. |
I'm not seeing an option to rerun the jobs, I think due to the age of the last run. I think the best bet would be to merge latest upstream changes from |
I think merging |
Running the tests revealed a bunch of new errors in @markelg - are you available to respond to the review comments and finish up this PR? |
I can do this next week, I am busy until Tuesday. I think I already answered to the comments though. Edit: I see now that I ignored the first one, I will commit the suggested change. I would be nice if someone reviewed the tests. It was hard to do as the compression arguments seem to interact between themselves in complicated ways. |
The good news is that with the exception of |
@markelg Thanks! I've now fully cleaned up my earlier changes, fixed the test-decorator and moved the whats-new entry to correct position. Added one suggestion for the dict-comparison. The last remaining issue is with the windows builds. I've no expertise on windows, so have to rely on other opinions how to resolve. |
I just wanted to thank everyone working on this PR. It seems like a pretty tricky puzzle to put together! It will be great once we have these new compression options available via Xarray. How can we help? |
@rabernat Thanks for stepping by. I've lost track if netcdf-c/netcdf4-python (here the conda-forge builds) have the new compressions compiled in or not. The question is, can xarray do anything about the failing windows runs or not? If not, we could just skip windows in the tests for now. |
I would be in favor of xfailing the windows tests and moving forward with tested support for these new compression options in Linux and MacOS. Packaging NetCDF for windows it probably not our job as Xarray devs. We should raise upstream issues as needed to bring these problems to the attention of whoever maintains those packages. This is a significant improvement for Xarray users and has already been a lot of work for the developers. So we should try to wrap it up asap. |
(cc @WardF @DennisHeimbigner for awareness) |
Is there any specific issue that netcdf-c needs to address? |
It looks like the netCDF windows build in conda-forge does not support the new compression filters. But as others said, I think it is OK to merge this PR and close the issues linked to it. And someone can open a new issue here or upstream to report this problem in windows. |
I'd love to be able to merge this. There are just two tiny mypy errors left before everything is green:
These files were unchanged by this PR. Any idea what is going on here? |
These are already in main. We can merge anyway. |
Thanks for your patience here @markelg! |
* main: Fix mypy type ignore (pydata#8564) Support for the new compression arguments. (pydata#7551) FIX: reverse index output of bottleneck move_argmax/move_argmin functions (pydata#8552) Adapt map_blocks to use new Coordinates API (pydata#8560) add xeofs to ecosystem.rst (pydata#8561) Offer a fixture for unifying DataArray & Dataset tests (pydata#8533) Generalize cumulative reduction (scan) to non-dask types (pydata#8019)
commit 0a0f800 Merge: 33c8033 41d33f5 Author: Deepak Cherian <[email protected]> Date: Tue Jan 2 20:42:51 2024 -0700 Merge branch 'main' into depr-groupby-squeeze-2 commit 33c8033 Author: Deepak Cherian <[email protected]> Date: Tue Jan 2 20:40:42 2024 -0700 Don't skip for resampling commit d7be352 Author: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Wed Jan 3 03:24:13 2024 +0000 [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci commit d13fa0e Author: Deepak Cherian <[email protected]> Date: Tue Jan 2 20:23:43 2024 -0700 Apply suggestions from code review Co-authored-by: Michael Niklas <[email protected]> commit dd6ea53 Author: Deepak Cherian <[email protected]> Date: Thu Dec 21 19:29:40 2023 -0700 Silence more warnings commit 44e5a41 Author: Deepak Cherian <[email protected]> Date: Thu Dec 21 19:21:06 2023 -0700 minimize test mods commit 94c1c1f Author: Deepak Cherian <[email protected]> Date: Thu Dec 21 18:55:46 2023 -0700 Add tests for pydata#8263 commit 0ab4eb6 Author: Deepak Cherian <[email protected]> Date: Thu Dec 21 18:47:41 2023 -0700 Fix typing commit a064430 Merge: d6a3f2d 03ec3cb Author: Deepak Cherian <[email protected]> Date: Thu Dec 21 18:47:04 2023 -0700 Merge branch 'main' into depr-groupby-squeeze-2 * main: Fix mypy type ignore (pydata#8564) Support for the new compression arguments. (pydata#7551) FIX: reverse index output of bottleneck move_argmax/move_argmin functions (pydata#8552) Adapt map_blocks to use new Coordinates API (pydata#8560) add xeofs to ecosystem.rst (pydata#8561) Offer a fixture for unifying DataArray & Dataset tests (pydata#8533) Generalize cumulative reduction (scan) to non-dask types (pydata#8019) commit d6a3f2d Author: Deepak Cherian <[email protected]> Date: Thu Dec 21 18:46:50 2023 -0700 Fix generator for aggregations commit 97f1695 Author: Deepak Cherian <[email protected]> Date: Tue Dec 19 10:58:11 2023 -0700 Fix docs commit 5b33b98 Author: Deepak Cherian <[email protected]> Date: Sun Dec 17 20:35:53 2023 -0700 fix whats-new commit 80b2b36 Author: Deepak Cherian <[email protected]> Date: Sun Dec 17 20:26:17 2023 -0700 Reduce more warnings commit 5f6f4ea Merge: a57d4ae 2971994 Author: Deepak Cherian <[email protected]> Date: Sat Dec 16 20:33:13 2023 -0700 Merge branch 'main' into depr-groupby-squeeze-2 * main: (26 commits) Filter null values before plotting (pydata#8535) Update concat.py (pydata#8538) Add getitem to array protocol (pydata#8406) Added option to specify weights in xr.corr() and xr.cov() (pydata#8527) Filter out doctest warning (pydata#8539) Bump actions/setup-python from 4 to 5 (pydata#8540) Point users to where in their code they should make mods for Dataset.dims (pydata#8534) Add Cumulative aggregation (pydata#8512) dev whats-new Whats-new for 2023.12.0 (pydata#8532) explicitly skip using `__array_namespace__` for `numpy.ndarray` (pydata#8526) Add `eval` method to Dataset (pydata#7163) Deprecate ds.dims returning dict (pydata#8500) test and fix empty xindexes repr (pydata#8521) Remove PR labeler bot (pydata#8525) Hypothesis strategy for generating Variable objects (pydata#8404) Use numbagg for `rolling` methods (pydata#8493) Bump pypa/gh-action-pypi-publish from 1.8.10 to 1.8.11 (pydata#8514) fix RTD docs build (pydata#8519) Fix type of `.assign_coords` (pydata#8495) ... commit a57d4ae Author: Deepak Cherian <[email protected]> Date: Fri Dec 1 21:36:04 2023 -0700 Test one more warning commit bf8139d Author: Deepak Cherian <[email protected]> Date: Fri Dec 1 21:33:45 2023 -0700 Update xarray/tests/test_groupby.py commit 4e9a063 Author: Deepak Cherian <[email protected]> Date: Fri Dec 1 21:10:14 2023 -0700 Set squeeze=None for Dataset too commit c2e576e Author: Deepak Cherian <[email protected]> Date: Fri Dec 1 20:54:17 2023 -0700 Fix first, last commit 6d8e822 Author: Deepak Cherian <[email protected]> Date: Fri Dec 1 20:46:21 2023 -0700 better warning commit 62c334b Author: Deepak Cherian <[email protected]> Date: Fri Dec 1 20:45:17 2023 -0700 silence warnings commit b7805a8 Author: dcherian <[email protected]> Date: Tue Aug 15 10:54:25 2023 -0600 Deprecate `squeeze` in GroupBy. Closes pydata#2157
* upstream/main: Faster encoding functions. (pydata#8565) ENH: vendor SerializableLock from dask and use as default backend lock, adapt tests (pydata#8571) Silence a bunch of CachingFileManager warnings (pydata#8584) Bump actions/download-artifact from 3 to 4 (pydata#8556) Minimize duplication in `map_blocks` task graph (pydata#8412) [pre-commit.ci] pre-commit autoupdate (pydata#8578) ignore a `DeprecationWarning` emitted by `seaborn` (pydata#8576) Fix mypy type ignore (pydata#8564) Support for the new compression arguments. (pydata#7551) FIX: reverse index output of bottleneck move_argmax/move_argmin functions (pydata#8552)
Use a dict for the arguments and update it with the encoding, so all variables are passed.
whats-new.rst