Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for the new compression arguments. #7551

Merged
merged 37 commits into from
Dec 21, 2023

Conversation

markelg
Copy link
Contributor

@markelg markelg commented Feb 23, 2023

Use a dict for the arguments and update it with the encoding, so all variables are passed.

Use a dict for the arguments and update it with the encoding, so all variables are passed.
Copy link
Collaborator

@headtr1ck headtr1ck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should also fix #7634
And future encoding updates of netcdf.

xarray/backends/netCDF4_.py Outdated Show resolved Hide resolved
xarray/backends/netCDF4_.py Show resolved Hide resolved
@djhoese
Copy link
Contributor

djhoese commented Mar 24, 2023

Just curious, what is the status of this PR? We have some conditions in our library's tests based on the version of xarray on what compression arguments to pass to xarray. We had hoped this PR would be included in 2023.3.0.

@jhamman
Copy link
Member

jhamman commented Mar 27, 2023

@djhoese - I think we're just missing tests for this. No reason it can't go out in the next release.

@sfinkens
Copy link

@markelg Thanks a lot for adding this! Do you have time to finalize it in the near future? If not, I could also take a look at the tests if you like.

@garciampred
Copy link
Contributor

garciampred commented May 24, 2023

This is currently stuck waiting until the problems with the last netcdf-c versions are fixed in a new release. See the issues (#7388).

When they are fixed I will write the tests If I have time. But of course any help and suggestions are welcomed.

@markelg
Copy link
Contributor Author

markelg commented Jun 21, 2023

I did my best to write a test. Please check it out. It would be nice to test the lossycompression (significant_digits) too but I could use some help for that as I don't have much more time to dedicate to this at the moment.

There are a lot of combinations possible with the encoding parameters that netCDF4 accepts, and they change the output file encoding in different ways. The test does not check all of them. Also blosc apparently only works with blosc_shuffle=1, although 0 and 2 are available too according to the documentation. I had to parametrize the shape of the test data in order to increase it in this test, as blosc does not work with small chunk sizes.

@zklaus
Copy link

zklaus commented Sep 28, 2023

Could someone (@rabernat?) restart the CI here, at least to get viewable logs again? Following #7388 (comment), this seems to be the next step.

@rabernat
Copy link
Contributor

I'm not seeing an option to rerun the jobs, I think due to the age of the last run.

I think the best bet would be to merge latest upstream changes from main (needed anyway to move forward) and push to this branch, which will trigger a new CI run.

@zklaus
Copy link

zklaus commented Sep 28, 2023

I think merging main was the right thing to do anyways. For the future: If you just close and reopen the PR, the CI will be restarted.

@rabernat
Copy link
Contributor

Running the tests revealed a bunch of new errors in test_combine.py. I wonder if these are really related to this PR.

@markelg - are you available to respond to the review comments and finish up this PR?

@markelg
Copy link
Contributor Author

markelg commented Sep 28, 2023

Running the tests revealed a bunch of new errors in test_combine.py. I wonder if these are really related to this PR.

@markelg - are you available to respond to the review comments and finish up this PR?

I can do this next week, I am busy until Tuesday. I think I already answered to the comments though. Edit: I see now that I ignored the first one, I will commit the suggested change.

I would be nice if someone reviewed the tests. It was hard to do as the compression arguments seem to interact between themselves in complicated ways.

@zklaus
Copy link

zklaus commented Sep 28, 2023

The good news is that with the exception of ubuntu-latest/py3.9 min-all-deps, no libnetcdf compression errors are showing up, right? That test pulls in an older version of the library, probably due to some dependency issues?

@kmuehlbauer
Copy link
Contributor

@markelg Thanks! I've now fully cleaned up my earlier changes, fixed the test-decorator and moved the whats-new entry to correct position. Added one suggestion for the dict-comparison.

The last remaining issue is with the windows builds. I've no expertise on windows, so have to rely on other opinions how to resolve.

xarray/tests/__init__.py Outdated Show resolved Hide resolved
@rabernat
Copy link
Contributor

I just wanted to thank everyone working on this PR. It seems like a pretty tricky puzzle to put together! It will be great once we have these new compression options available via Xarray.

How can we help?

@kmuehlbauer
Copy link
Contributor

@rabernat Thanks for stepping by.

I've lost track if netcdf-c/netcdf4-python (here the conda-forge builds) have the new compressions compiled in or not. The question is, can xarray do anything about the failing windows runs or not?

If not, we could just skip windows in the tests for now.

@rabernat
Copy link
Contributor

rabernat commented Dec 20, 2023

I would be in favor of xfailing the windows tests and moving forward with tested support for these new compression options in Linux and MacOS. Packaging NetCDF for windows it probably not our job as Xarray devs. We should raise upstream issues as needed to bring these problems to the attention of whoever maintains those packages.

This is a significant improvement for Xarray users and has already been a lot of work for the developers. So we should try to wrap it up asap.

@dopplershift
Copy link
Contributor

(cc @WardF @DennisHeimbigner for awareness)

@DennisHeimbigner
Copy link

Is there any specific issue that netcdf-c needs to address?

@markelg
Copy link
Contributor Author

markelg commented Dec 21, 2023

Is there any specific issue that netcdf-c needs to address?

It looks like the netCDF windows build in conda-forge does not support the new compression filters. But as others said, I think it is OK to merge this PR and close the issues linked to it. And someone can open a new issue here or upstream to report this problem in windows.

@rabernat
Copy link
Contributor

I'd love to be able to merge this. There are just two tiny mypy errors left before everything is green:

xarray/core/dataset.py:174: error: Unused "type: ignore" comment  [unused-ignore]
xarray/core/dataarray.py:83: error: Unused "type: ignore" comment  [unused-ignore]

These files were unchanged by this PR. Any idea what is going on here?

@headtr1ck
Copy link
Collaborator

I'd love to be able to merge this. There are just two tiny mypy errors left before everything is green:

xarray/core/dataset.py:174: error: Unused "type: ignore" comment  [unused-ignore]
xarray/core/dataarray.py:83: error: Unused "type: ignore" comment  [unused-ignore]

These files were unchanged by this PR. Any idea what is going on here?

These are already in main. We can merge anyway.

@dcherian dcherian merged commit a04900d into pydata:main Dec 21, 2023
25 of 27 checks passed
@dcherian
Copy link
Contributor

Thanks for your patience here @markelg!

dcherian added a commit to dcherian/xarray that referenced this pull request Dec 22, 2023
* main:
  Fix mypy type ignore (pydata#8564)
  Support for the new compression arguments. (pydata#7551)
  FIX: reverse index output of bottleneck move_argmax/move_argmin functions (pydata#8552)
  Adapt map_blocks to use new Coordinates API (pydata#8560)
  add xeofs to ecosystem.rst (pydata#8561)
  Offer a fixture for unifying DataArray & Dataset tests (pydata#8533)
  Generalize cumulative reduction (scan) to non-dask types (pydata#8019)
dcherian added a commit to dcherian/xarray that referenced this pull request Jan 4, 2024
commit 0a0f800
Merge: 33c8033 41d33f5
Author: Deepak Cherian <[email protected]>
Date:   Tue Jan 2 20:42:51 2024 -0700

    Merge branch 'main' into depr-groupby-squeeze-2

commit 33c8033
Author: Deepak Cherian <[email protected]>
Date:   Tue Jan 2 20:40:42 2024 -0700

    Don't skip for resampling

commit d7be352
Author: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Date:   Wed Jan 3 03:24:13 2024 +0000

    [pre-commit.ci] auto fixes from pre-commit.com hooks

    for more information, see https://pre-commit.ci

commit d13fa0e
Author: Deepak Cherian <[email protected]>
Date:   Tue Jan 2 20:23:43 2024 -0700

    Apply suggestions from code review

    Co-authored-by: Michael Niklas  <[email protected]>

commit dd6ea53
Author: Deepak Cherian <[email protected]>
Date:   Thu Dec 21 19:29:40 2023 -0700

    Silence more warnings

commit 44e5a41
Author: Deepak Cherian <[email protected]>
Date:   Thu Dec 21 19:21:06 2023 -0700

    minimize test mods

commit 94c1c1f
Author: Deepak Cherian <[email protected]>
Date:   Thu Dec 21 18:55:46 2023 -0700

    Add tests for pydata#8263

commit 0ab4eb6
Author: Deepak Cherian <[email protected]>
Date:   Thu Dec 21 18:47:41 2023 -0700

    Fix typing

commit a064430
Merge: d6a3f2d 03ec3cb
Author: Deepak Cherian <[email protected]>
Date:   Thu Dec 21 18:47:04 2023 -0700

    Merge branch 'main' into depr-groupby-squeeze-2

    * main:
      Fix mypy type ignore (pydata#8564)
      Support for the new compression arguments. (pydata#7551)
      FIX: reverse index output of bottleneck move_argmax/move_argmin functions (pydata#8552)
      Adapt map_blocks to use new Coordinates API (pydata#8560)
      add xeofs to ecosystem.rst (pydata#8561)
      Offer a fixture for unifying DataArray & Dataset tests (pydata#8533)
      Generalize cumulative reduction (scan) to non-dask types (pydata#8019)

commit d6a3f2d
Author: Deepak Cherian <[email protected]>
Date:   Thu Dec 21 18:46:50 2023 -0700

    Fix generator for aggregations

commit 97f1695
Author: Deepak Cherian <[email protected]>
Date:   Tue Dec 19 10:58:11 2023 -0700

    Fix docs

commit 5b33b98
Author: Deepak Cherian <[email protected]>
Date:   Sun Dec 17 20:35:53 2023 -0700

    fix whats-new

commit 80b2b36
Author: Deepak Cherian <[email protected]>
Date:   Sun Dec 17 20:26:17 2023 -0700

    Reduce more warnings

commit 5f6f4ea
Merge: a57d4ae 2971994
Author: Deepak Cherian <[email protected]>
Date:   Sat Dec 16 20:33:13 2023 -0700

    Merge branch 'main' into depr-groupby-squeeze-2

    * main: (26 commits)
      Filter null values before plotting (pydata#8535)
      Update concat.py (pydata#8538)
      Add getitem to array protocol (pydata#8406)
      Added option to specify weights in xr.corr() and xr.cov() (pydata#8527)
      Filter out doctest warning (pydata#8539)
      Bump actions/setup-python from 4 to 5 (pydata#8540)
      Point users to where in their code they should make mods for Dataset.dims (pydata#8534)
      Add Cumulative aggregation (pydata#8512)
      dev whats-new
      Whats-new for 2023.12.0 (pydata#8532)
      explicitly skip using `__array_namespace__` for `numpy.ndarray` (pydata#8526)
      Add `eval` method to Dataset (pydata#7163)
      Deprecate ds.dims returning dict (pydata#8500)
      test and fix empty xindexes repr (pydata#8521)
      Remove PR labeler bot (pydata#8525)
      Hypothesis strategy for generating Variable objects (pydata#8404)
      Use numbagg for `rolling` methods (pydata#8493)
      Bump pypa/gh-action-pypi-publish from 1.8.10 to 1.8.11 (pydata#8514)
      fix RTD docs build (pydata#8519)
      Fix type of `.assign_coords` (pydata#8495)
      ...

commit a57d4ae
Author: Deepak Cherian <[email protected]>
Date:   Fri Dec 1 21:36:04 2023 -0700

    Test one more warning

commit bf8139d
Author: Deepak Cherian <[email protected]>
Date:   Fri Dec 1 21:33:45 2023 -0700

    Update xarray/tests/test_groupby.py

commit 4e9a063
Author: Deepak Cherian <[email protected]>
Date:   Fri Dec 1 21:10:14 2023 -0700

    Set squeeze=None for Dataset too

commit c2e576e
Author: Deepak Cherian <[email protected]>
Date:   Fri Dec 1 20:54:17 2023 -0700

    Fix first, last

commit 6d8e822
Author: Deepak Cherian <[email protected]>
Date:   Fri Dec 1 20:46:21 2023 -0700

    better warning

commit 62c334b
Author: Deepak Cherian <[email protected]>
Date:   Fri Dec 1 20:45:17 2023 -0700

    silence warnings

commit b7805a8
Author: dcherian <[email protected]>
Date:   Tue Aug 15 10:54:25 2023 -0600

    Deprecate `squeeze` in GroupBy.

    Closes pydata#2157
dcherian added a commit to dcherian/xarray that referenced this pull request Jan 4, 2024
* upstream/main:
  Faster encoding functions. (pydata#8565)
  ENH: vendor SerializableLock from dask and use as default backend lock, adapt tests (pydata#8571)
  Silence a bunch of CachingFileManager warnings (pydata#8584)
  Bump actions/download-artifact from 3 to 4 (pydata#8556)
  Minimize duplication in `map_blocks` task graph (pydata#8412)
  [pre-commit.ci] pre-commit autoupdate (pydata#8578)
  ignore a `DeprecationWarning` emitted by `seaborn` (pydata#8576)
  Fix mypy type ignore (pydata#8564)
  Support for the new compression arguments. (pydata#7551)
  FIX: reverse index output of bottleneck move_argmax/move_argmin functions (pydata#8552)
@bzah bzah mentioned this pull request Jan 5, 2024
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Xarray does not support full range of netcdf-python compression options