HDF5-DIAG warnings from libnetcdf v4.9.1 #5187

Closed
trexfeathers opened this issue Mar 8, 2023 · 10 comments · Fixed by #5274

Comments

@trexfeathers
Contributor

📰 Custom Issue

Since #5177, NetCDF loading is prompting verbose HDF5-DIAG warnings in some/all cases; here is an example.

@pp-mo and I have done some testing and isolated this to the advancement of libnetcdf from v4.8.1 to v4.9.1.

As far as we can tell this is probably innocuous, a view tentatively shared in pydata/xarray#7549. However, it is likely to alarm developers and users alike, and will be a source of frustration given how verbose Iris' own warnings already are. We've already had a case where the output gets in the way of other print statements and so makes ordinary work that much harder.

We are not yet sure what the next step is, but wanted to post this to forewarn everyone else.
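
One way to keep this output from interleaving with ordinary prints is to divert it while a file loads. The sketch below is only an illustration and is not taken from Iris: it assumes the HDF5-DIAG text is written by the C library straight to the process's standard-error file descriptor (fd 2), which is why it redirects at the OS level rather than swapping `sys.stderr`. The name `capture_c_stderr` is hypothetical.

```python
import os
import sys
import tempfile
from contextlib import contextmanager


@contextmanager
def capture_c_stderr():
    """Temporarily redirect file descriptor 2 and collect what is written to it.

    Hypothetical helper: it assumes the noisy output arrives on fd 2.
    """
    captured = []         # receives one string when the block exits
    sys.stderr.flush()    # keep already-buffered Python output out of the capture
    saved_fd = os.dup(2)  # remember the real stderr
    with tempfile.TemporaryFile(mode="w+b") as tmp:
        os.dup2(tmp.fileno(), 2)  # anything written to fd 2 now lands in tmp
        try:
            yield captured
        finally:
            sys.stderr.flush()
            os.dup2(saved_fd, 2)  # restore the real stderr
            os.close(saved_fd)
            tmp.seek(0)
            captured.append(tmp.read().decode(errors="replace"))
```

In use, the load call would sit inside `with capture_c_stderr() as diag:`, after which `diag[0]` holds whatever was written and can be logged or discarded. Note that this hides every message on stderr for the duration of the block, including genuine errors, so it is a stopgap rather than a fix.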

@pp-mo
Member

pp-mo commented Mar 9, 2023

> advancement of libnetcdf from v4.8.1 to v4.9.1

Again tentatively, from pydata/xarray#7549 it looks like the actual problem may lie in changes introduced with hdf5 v1.12.2.
That version of hdf5 has already been out for months, but we suspect the libnetcdf changes may have newly "exposed" the problem.
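
Since the suspicion involves both libraries, it can help to record which versions an environment is actually using. A minimal sketch, assuming the netCDF4-python package is the route by which libnetcdf is loaded; the function name `linked_library_versions` is hypothetical, while `__netcdf4libversion__` and `__hdf5libversion__` are attributes that netCDF4-python exposes.

```python
def linked_library_versions():
    """Report the libnetcdf and HDF5 versions seen by netCDF4-python.

    Returns None when netCDF4-python is not installed.
    """
    try:
        import netCDF4
    except ImportError:
        return None
    return {
        "libnetcdf": netCDF4.__netcdf4libversion__,
        "hdf5": netCDF4.__hdf5libversion__,
    }


if __name__ == "__main__":
    print(linked_library_versions())
```

Including this output in a bug report makes it easier to tell whether a given environment has the libnetcdf 4.9.1 and hdf5 1.12.2 combination discussed here.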

@pp-mo
Member

pp-mo commented Mar 10, 2023

I had a lot of these errors in #5191, as you might expect.
See: these test runs, raw log here

To be fair, I don't think the warnings caused any of the test failures there, since the few actual failures remained the same when I pinned libnetcdf,
and pinning to libnetcdf=4.8 removed all the offending HDF5 warnings, as expected.

pp-mo added a commit to pp-mo/iris that referenced this issue Mar 10, 2023
@pp-mo
Member

pp-mo commented Mar 10, 2023

> I don't think that caused any of the test failures there ... and pinning to libnetcdf=4.8 removed all the offending HDF5 warnings

However, I do now seem to have Python 3.10 specific failures,
which I think may indicate that pinning libnetcdf is not a panacea, as it may introduce actual errors 😬

Update 2023-03-14: it is in fact not confined to Python 3.10. It now seems to work with a fix akin to that in #5095.

@pp-mo
Member

pp-mo commented Mar 27, 2023

Ping!

The latest main-branch doctests are really full of the HDF5-DIAG warning messages.
E.g.: https://pipelines.actions.githubusercontent.com/serviceHosts/85610251-844a-4a1d-8171-bd20e30f9c14/_apis/pipelines/1/runs/4455/signedlogcontent/6?urlExpires=2023-03-27T00%3A35%3A51.0247240Z&urlSigningMethod=HMACV1&urlSignature=9tcPZXK%2B%2Bov5smdX5IBtQ4z5s41u5Y%2BANwxBJcsfJOc%3D

Could this also be the basic reason we are getting benchmark regressions?
We really need to seek a solution to this.

@trexfeathers
Contributor Author

> Could this also be the basic reason we are getting benchmark regressions?

It would explain the regressions in #5182, but not the others, since libnetcdf has not advanced since then.

> We really need to seek a solution to this.

Since the core devs are assigned to other work for the next two weeks, we would need to explicitly drop something to make room for this investigation. We can discuss that this morning if you think it is sensible.

@trexfeathers
Contributor Author

Poll:

@trexfeathers
Contributor Author

Relevant: pydata/xarray#7388 (comment)

@zklaus

zklaus commented Apr 21, 2023

I resolved this in the conda-forge package for libnetcdf 4.9.2 in conda-forge/libnetcdf-feedstock#175. Would it be useful to have it for 4.9.1 as well? The backport would be easy enough, but it's always a little bit of a hassle to release older versions. Would pinning to !=4.9.1 be an option?
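
To make the proposed `!=4.9.1` pin concrete, here is a small sketch of the same rule expressed as a runtime check. It relies only on what this thread reports (warnings with 4.9.1, none with 4.8.x, and a fix in the conda-forge 4.9.2 package) and is not how conda resolves pins; `parse_version` and `excluded_by_pin` are hypothetical names.

```python
import re


def parse_version(text):
    """Turn a string such as '4.9.1' or '4.9.1-development' into (4, 9, 1)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def excluded_by_pin(version_text):
    """Return True for the one release that a '!=4.9.1' pin would reject."""
    return parse_version(version_text) == (4, 9, 1)
```

A check like this could, for instance, back a test-suite skip or a one-off notice to users, whereas the pin itself belongs in the environment requirements files.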

@trexfeathers
Contributor Author

> I resolved this in the conda-forge package for libnetcdf 4.9.2 in conda-forge/libnetcdf-feedstock#175. Would it be useful to have it for 4.9.1 as well? The backport would be easy enough, but it's always a little bit of a hassle to release older versions. Would pinning to !=4.9.1 be an option?

Thanks @zklaus!

That's definitely a sensible option.

We will take a look next week. We are getting a lot of conflicts today between different PRs updating dependencies.

lbdreyer pushed a commit that referenced this issue Apr 21, 2023
* Basic functional lazy saving.

* Simplify function signature which upsets Sphinx.

* Non-lazy saves return nothing.

* Now fixed to enable use with process/distributed scheduling.

* Remove dask.utils.SerializableLock, which I think was a mistake.

* Make DefferedSaveWrapper use _thread_safe_nc.

* Fixes for non-lazy save.

* Avoid saver error when no deferred writes.

* Reorganise locking code, ready for shareable locks.

* Remove optional usage of 'filelock' for lazy saves.

* Document dask-specific locking; implement differently for threads or distributed schedulers.

* Minor fix for unit-tests.

* Pin libnetcdf to avoid problems -- see #5187.

* Minor test fix.

* Move DeferredSaveWrapper into _thread_safe_nc; replicate the NetCDFDataProxy fix; use one lock per Saver; add extra up-scaled test

* Update lib/iris/fileformats/netcdf/saver.py

Co-authored-by: Bouwe Andela <[email protected]>

* Update lib/iris/fileformats/netcdf/_dask_locks.py

Co-authored-by: Bouwe Andela <[email protected]>

* Update lib/iris/fileformats/netcdf/saver.py

Co-authored-by: Bouwe Andela <[email protected]>

* Small rename + reformat.

* Remove Saver lazy option; all lazy saves are delayed; factor out fillvalue checks and make them delayable.

* Repurposed 'test__FillValueMaskCheckAndStoreTarget' to 'test__data_fillvalue_check', since old class is gone.

* Disable (temporary) saver debug printouts.

* Fix test problems; Saver automatically completes to preserve existing direct usage (which is public API).

* Fix docstring error.

* Fix spurious error in old saver test.

* Fix Saver docstring.

* More robust exit for NetCDFWriteProxy operation.

* Fix doctests by making the Saver example functional.

* Improve docstrings; unify terminology; simplify non-lazy save call.

* Moved netcdf cell-method handling into nc_load_rules.helpers, and various tests into more specific test folders.

* Fix lockfiles and Makefile process.

* Add unit tests for routine _fillvalue_report().

* Remove debug-only code.

* Added tests for what the save function does with the 'compute' keyword.

* Fix mock-specific problems, small tidy.

* Restructure hierarchy of tests.unit.fileformats.netcdf

* Tidy test docstrings.

* Correct test import.

* Avoid incorrect checking of byte data, and a numpy deprecation warning.

* Alter parameter names to make test reports clearer.

* Test basic behaviour of _lazy_stream_data; make 'Saver._delayed_writes' private.

* Add integration tests, and distributed dependency.

* Docstring fixes.

* Documentation section and whatsnew entry.

* Various fixes to whatsnew, docstrings and docs.

* Minor review changes, fix doctest.

* Arrange tests + results to organise by package-name alone.

* Review changes.

* Review changes.

* Enhance tests + debug.

* Support scheduler type 'single-threaded'; allow retries on delayed-save test.

* Improve test.

* Adding a whatsnew entry for 5224 (#5234)

* Adding a whatsnew entry explaining 5224

* Fixing link and format error

* Replacing numpy legacy printing with array2string and remaking results for dependent tests

* adding a whatsnew entry

* configure codecov

* remove results creation commit from blame

* fixing whatsnew entry

* Bump scitools/workflows from 2023.04.1 to 2023.04.2 (#5236)

Bumps [scitools/workflows](https://github.com/scitools/workflows) from 2023.04.1 to 2023.04.2.
- [Release notes](https://github.com/scitools/workflows/releases)
- [Commits](SciTools/workflows@2023.04.1...2023.04.2)

---
updated-dependencies:
- dependency-name: scitools/workflows
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Use real array for data of of small netCDF variables. (#5229)

* Small netCDF variable data is real.

* Various test fixes.

* More test fixing.

* Fix printout in Mesh documentation.

* Whatsnew + doctests fix.

* Tweak whatsnew.

* Handle derived coordinates correctly in `concatenate` (#5096)

* First working prototype of concatenate that handels derived coordinates correctly

* Added checks for derived coord metadata during concatenation

* Added tests

* Fixed defaults

* Added what's new entry

* Optimized test coverage

* clarity on whatsnew entry contributors (#5240)

* Modernize and simplify iris.analysis._Groupby (#5015)

* Modernize and simplify _Groupby

* Rename variable to improve readability

Co-authored-by: Martin Yeo <[email protected]>

* Add a whatsnew entry

* Add a type hint to _add_shared_coord

* Add a test for iris.analysis._Groupby.__repr__

---------

Co-authored-by: Martin Yeo <[email protected]>

* Finalises Lazy Data documentation (#5137)

* cube and io lazy data notes added

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Added comments within analysis, as well as palette and iterate, and what's new

* fixed docstrings as requested in @trexfeathers review

* reverted cube.py for time being

* fixed flake8 issue

* Lazy data second batch

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated lastest what'snew

* I almost hope this wasn't the fix, I'm such a moron

* adressed review changes

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bill Little <[email protected]>

* Fixes to _discontiguity_in_bounds (attempt 2) (#4975)

* update ci locks location (#5228)

* Updated environment lockfiles (#5211)

Co-authored-by: Lockfile bot <[email protected]>

* Increase retries.

* Change debug to show which elements failed.

* update cf standard units (#5244)

* update cf standard units

* added whatsnew entry

* Correct pull number

Co-authored-by: Martin Yeo <[email protected]>

---------

Co-authored-by: Martin Yeo <[email protected]>

* libnetcdf <4.9 pin (#5242)

* Pin libnetcdf<4.9 and update lock files.

* What's New entry.

* libnetcdf not available on PyPI.

* Fix for Pandas v2.0.

* Fix for Pandas v2.0.

* Avoid possible same-file crossover between tests.

* Ensure all-different testfiles; load all vars lazy.

* Revert changes to testing framework.

* Remove repeated line from requirements/py*.yml (?merge error), and re-fix lockfiles.

* Revert some more debug changes.

* Reorganise test for better code clarity.

* Use public 'Dataset.isopen()' instead of '._isopen'.

* Create output files in unique temporary directories.

* Tests for fileformats.netcdf._dask_locks.

* Fix attribution names.

* Fixed new py311 lockfile.

* Fix typos spotted by codespell.

* Add distributed test dep for python 3.11

* Fix lockfile for python 3.11

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Bouwe Andela <[email protected]>
Co-authored-by: Henry Wright <[email protected]>
Co-authored-by: Henry Wright <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Manuel Schlund <[email protected]>
Co-authored-by: Bill Little <[email protected]>
Co-authored-by: Bouwe Andela <[email protected]>
Co-authored-by: Martin Yeo <[email protected]>
Co-authored-by: Elias <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: stephenworsley <[email protected]>
Co-authored-by: scitools-ci[bot] <107775138+scitools-ci[bot]@users.noreply.github.com>
Co-authored-by: Lockfile bot <[email protected]>
@trexfeathers
Contributor Author

I've tested this locally and it's looking good! In a meeting now so will push up in an hour or so.
