FIX: h5py>=3 string decoding #4893
`decode_strings=True` for h5netcdf backend, convert object string to byte string if necessary, unpin h5py
Nice, I didn't expect to get all green on the first try. Nevertheless, please have a thorough look at this. I would also like to know how to document this properly in …
xarray/coding/strings.py
Outdated
    if arr.dtype.kind == "O":
        arr = arr.astype("S1")
Hmm. We should probably just fix this newly introduced issue over in h5netcdf first :)
@shoyer Sure, I'll ping you over there.
So it was decided to fix this at h5netcdf. I'll update the PR tomorrow.
All fixes in place, but we need to wait until …
This is good to go from my end. Thanks @shoyer for your help in not getting lost in this string encoding/decoding maze.
Could you suppress the warning in the test suite? https://github.com/pydata/xarray/pull/4893/checks?check_run_id=1885301619#step:11:278
So the plan is to read strings as …
That's what …
…ning tests with `decode_vlen_strings=True`
@mathause The …
LGTM, thanks!
Thanks a lot @kmuehlbauer!
* upstream/master:
  FIX: h5py>=3 string decoding (pydata#4893)
  Update matplotlib's canonical (pydata#4919)
  Adding vectorized indexing docs (pydata#4711)
  Allow fsspec URLs in open_(mf)dataset (pydata#4823)
  Fix typos in example notebooks (pydata#4908)
  pre-commit autoupdate CI (pydata#4906)
  replace the ci-trigger action with a external one (pydata#4905)
  Update area_weighted_temperature.ipynb (pydata#4903)
  hide the decorator from the test traceback (pydata#4900)
  Sort backends (pydata#4886)
  Compatibility with dask 2021.02.0 (pydata#4884)
`decode_vlen_strings=True` for h5netcdf backend

This is an attempt to align with the `h5py>=3.0.0` string decoding changes. With h5py 3, all strings are read as `bytes` objects for variable-length strings, or as numpy bytes arrays ('S' dtypes) for fixed-length strings [1]. As of `h5netcdf=0.10.0`, the kwarg `decode_vlen_strings` is available. This PR makes use of it to keep backwards compatibility with `h5py=2` and conformance with `netcdf4-python`.

[1] https://docs.h5py.org/en/stable/strings.html
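To make the behaviour change described above concrete, here is a minimal sketch in plain Python. It does not call h5py or h5netcdf; the helper `decode_vlen_items` and the sample values are hypothetical and only imitate the effect a flag such as `decode_vlen_strings=True` is meant to have, namely turning variable-length `bytes` values back into `str` so downstream code sees the same types as it did with `h5py=2`.

```python
def decode_vlen_items(values, encoding="utf-8"):
    """Decode every bytes item to str; leave items that are already str untouched.

    Hypothetical helper for illustration only, not part of xarray or h5netcdf.
    """
    return [v.decode(encoding) if isinstance(v, bytes) else v for v in values]


# Assumed shape of variable-length string data as returned by h5py>=3 (bytes objects).
raw_h5py3 = [b"temperature", b"K\xc3\xa4lte"]

# Assumed shape of the same data as returned by h5py 2 (str objects).
raw_h5py2 = ["temperature", "Kälte"]

decoded = decode_vlen_items(raw_h5py3)

# Decoding is idempotent: values that are already str pass through unchanged.
already_decoded = decode_vlen_items(raw_h5py2)
```

Decoding once at the backend boundary, rather than at each call site, is one way to keep the rest of the code independent of the installed h5py version.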
pre-commit run --all-files
whats-new.rst
api.rst