netCDF4: support byte strings as attribute values #7186
Comments
The reason for this behavior is that the Line 175 in 6cb97f6
In the long term, I would suggest adding bytes as a supported type in the list above on xarray's side.
A quick workaround for you might be to encode the string as

    ds["x"].attrs["third_str"] = np.array("hää".encode("utf-8"))
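A minimal sketch of what that workaround produces on the Python side, using only NumPy. The variable names `text`, `encoded`, and `attr_value` are illustrative and not part of the thread; how the value ends up typed in the file depends on the backend and is not shown here.

```python
import numpy as np

text = "hää"

# Encode to UTF-8 first; each "ä" takes two bytes, so the result is 5 bytes long.
encoded = text.encode("utf-8")

# Wrapping the bytes in np.array gives a 0-dimensional fixed-width byte-string array.
attr_value = np.array(encoded)

print(attr_value.dtype.kind)      # "S" marks a byte-string dtype
print(attr_value.dtype.itemsize)  # width in bytes
print(attr_value.item().decode("utf-8"))
```

The array, rather than the bare `bytes` object, is what would be assigned to `ds["x"].attrs["third_str"]` in the workaround above.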
I plan to implement this in the next couple days.
That would be much appreciated. Thanks!
A comment about the proposed solution. Allowing

    import numpy as np
    import xarray as xr

    all_bytes = bytes(range(256))
    good_bytes = "bar°".encode("UTF-8")
    bad_byte = b'\x00'
    not_bytes = "hää"

    data = np.ones([1])
    ds = xr.Dataset({"data": (["x"], data)}, coords={"x": np.arange(1)})
    # ds["x"].attrs["first_str"] = all_bytes
    ds["x"].attrs["second_str"] = bad_byte
    ds["x"].attrs["third_str"] = good_bytes
    ds["x"].attrs["fourth_str"] = not_bytes
    # ds.to_netcdf("testds.nc", engine = "netcdf4")
    # ds.to_netcdf("testds.nc", engine = "scipy")
    ds.to_netcdf("testds.nc", engine = "h5netcdf")
    !ncdump -h testds.nc

I propose adding a check to

    def check_attr(name, value, valid_types):
        ...
        if isinstance(value, bytes):
            try:
                value.decode('utf-8')
            except UnicodeDecodeError as e:
                raise ValueError(
                    f"Invalid value provided for attribute '{name!r}': {value!r}. "
                    "Only binary data derived from UTF-8 encoded strings is allowed."
                ) from e
            if b'\x00' in value:
                raise ValueError(
                    f"Invalid value provided for attribute '{name!r}': {value!r}. "
                    "Null characters are not permitted."
                )
If it's only unsupported by h5netcdf, it would be good to specialize the check to just that engine.
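One way to read that suggestion is sketched below. It assumes, as the comment supposes, that only the h5netcdf engine needs the restriction. The function `check_bytes_attr`, its `engine` parameter, and the helper `is_rejected` are hypothetical names for illustration, not xarray's actual API.

```python
def check_bytes_attr(name, value, engine):
    """Hypothetical engine-aware variant of the proposed bytes check."""
    # Assumption: other engines accept arbitrary bytes, so skip the check for them.
    if engine != "h5netcdf" or not isinstance(value, bytes):
        return
    try:
        value.decode("utf-8")
    except UnicodeDecodeError as e:
        raise ValueError(
            f"Invalid value provided for attribute {name!r}: {value!r}. "
            "Only binary data derived from UTF-8 encoded strings is allowed."
        ) from e
    if b"\x00" in value:
        raise ValueError(
            f"Invalid value provided for attribute {name!r}: {value!r}. "
            "Null characters are not permitted."
        )


def is_rejected(value, engine):
    """Return True if the check raises for this value and engine."""
    try:
        check_bytes_attr("attr", value, engine)
    except ValueError:
        return True
    return False


# The sample values from the script earlier in the thread.
samples = {
    "all_bytes": bytes(range(256)),       # contains bytes that are not valid UTF-8
    "good_bytes": "bar°".encode("UTF-8"),
    "bad_byte": b"\x00",                  # valid UTF-8, but a null character
}

for label, value in samples.items():
    print(label, is_rejected(value, "h5netcdf"), is_rejected(value, "netcdf4"))
```

With this shape, the same attribute dictionary could be validated once per target engine, and only the h5netcdf path would raise for `all_bytes` and `bad_byte`.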
What is your issue?
When I have a string attribute with special characters like '°' or German umlauts (Ä, Ü, etc.), it is written to the file as type NC_STRING. Other string attributes that contain no special characters are saved as NC_CHAR.
This leads to problems when I subsequently want to open this file with NetCDF-Fortran, because it does not fully support NC_STRING.
So my question is:
Is there a way to force xarray to write the string attribute as NC_CHAR?
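The behavior described above can be approximated with a small predicate. This is only a sketch of the reported pattern, assuming that "special characters" means non-ASCII characters; `expected_attr_type` is a hypothetical helper and not part of xarray or netCDF4.

```python
def expected_attr_type(value: str) -> str:
    """Guess the on-disk type of a str attribute, per the behavior reported here."""
    # Assumption: pure-ASCII strings are stored as NC_CHAR, anything else as NC_STRING.
    return "NC_CHAR" if value.isascii() else "NC_STRING"


for attr in ["degrees", "bar°", "hää"]:
    print(repr(attr), expected_attr_type(attr))
```

Running such a check over a dataset's attributes before writing would flag the ones likely to trip up readers that lack full NC_STRING support.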
Example
The output of ncdump -h looks like this, which shows the different data type of the second and third attribute: