Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX: h5py>=3 string decoding #4893

Merged
merged 8 commits into from
Feb 17, 2021
Merged

FIX: h5py>=3 string decoding #4893

merged 8 commits into from
Feb 17, 2021

Conversation

kmuehlbauer
Copy link
Contributor

@kmuehlbauer kmuehlbauer commented Feb 11, 2021

  • set decode_vlen_strings=True for h5netcdf backend
  • unpin h5py

This is an attempt to align with h5py=>3.0.0 string decoding changes. Now all strings are read as bytes objects for variable-length strings, or numpy bytes arrays ('S' dtypes) for fixed-length strings [1]. From h5netcdf=0.10.0 kwarg decode_vlen_strings is available. This PR makes use of this to keep backwards compatibility with h5py=2 and conformance with netcdf4-python.

[1] https://docs.h5py.org/en/stable/strings.html

…tring to byte string if necessary, unpin h5py
@kmuehlbauer kmuehlbauer changed the title FIX: set decode_strings=True for h5netcdf backend, convert object s… FIX: h5py string decoding Feb 11, 2021
@kmuehlbauer
Copy link
Contributor Author

Nice, I didn't expect to get all green at the first hit.

Nevertheless, please have a thorough look at this. I would also like to know how to document this properly in whats-new.txt. Is this a bug fix or internal change or both?

@dcherian dcherian mentioned this pull request Feb 11, 2021
6 tasks
Comment on lines 196 to 197
if arr.dtype.kind == "O":
arr = arr.astype("S1")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. We should probably just fix this newly introduced issue over in h5netcdf first :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shoyer Sure, I'll ping you over there.

@kmuehlbauer kmuehlbauer changed the title FIX: h5py string decoding WIP: FIX: h5py string decoding Feb 11, 2021
@kmuehlbauer
Copy link
Contributor Author

So it was decided to fix this at h5netcdf. I'll update the PR tomorrow.

@kmuehlbauer
Copy link
Contributor Author

All fixes in place, but we need to wait until h5netcdf=0.10.0 is available on conda-forge.

@kmuehlbauer kmuehlbauer changed the title WIP: FIX: h5py string decoding FIX: h5py>=3 string decoding Feb 12, 2021
@kmuehlbauer
Copy link
Contributor Author

This is good to go from my end. Thanks @shoyer for your help to not get lost in this string encoding/decoding maze.

@mathause
Copy link
Collaborator

Could you suppress the warning in the test suite? https://github.com/pydata/xarray/pull/4893/checks?check_run_id=1885301619#step:11:278

So the plan is to read strings as bytes in the future?

@kmuehlbauer
Copy link
Contributor Author

So the plan is to read strings as bytes in the future?

That's what h5py does. h5netcdf just added backwards compatibility using that decode_vlen_strings kwarg. Means downstream users can use decode_vlen_strings=True and have same experience for h5py>=3.0.0 and h5py=2.

…ning tests with `decode_vlen_strings=True`
@kmuehlbauer
Copy link
Contributor Author

Could you suppress the warning in the test suite? https://github.com/pydata/xarray/pull/4893/checks?check_run_id=1885301619#step:11:278

@mathause The FutureWarnings should be silenced by now. I've set decode_vlen_strings=True in all places. So this should also work after deprecated in h5netcdf.

Copy link
Collaborator

@mathause mathause left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@mathause mathause merged commit a8ed7ed into pydata:master Feb 17, 2021
@mathause
Copy link
Collaborator

Thanks a lot @kmuehlbauer!

dcherian added a commit to dcherian/xarray that referenced this pull request Feb 17, 2021
* upstream/master:
  FIX: h5py>=3 string decoding (pydata#4893)
  Update matplotlib's canonical (pydata#4919)
  Adding vectorized indexing docs (pydata#4711)
  Allow fsspec URLs in open_(mf)dataset (pydata#4823)
  Fix typos in example notebooks (pydata#4908)
  pre-commit autoupdate CI (pydata#4906)
  replace the ci-trigger action with a external one (pydata#4905)
  Update area_weighted_temperature.ipynb (pydata#4903)
  hide the decorator from the test traceback (pydata#4900)
  Sort backends (pydata#4886)
  Compatibility with dask 2021.02.0 (pydata#4884)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

fix compatibility with h5py version 3 and unpin tests
3 participants