WRF output : cannot serialize variable #1809

Closed
gbromley opened this issue Jan 4, 2018 · 11 comments · Fixed by #8195

Comments

@gbromley

gbromley commented Jan 4, 2018

Code Sample, a copy-pastable example if possible

md.to_netcdf('modified_wrf_input.nc')

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-55-ccba003701f9> in <module>()
----> 1 md.to_netcdf('modified_wrf_input.nc')

/Users/gbromley/anaconda/lib/python3.5/site-packages/xarray/core/dataset.py in to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims)
    981         return to_netcdf(self, path, mode, format=format, group=group,
    982                          engine=engine, encoding=encoding,
--> 983                          unlimited_dims=unlimited_dims)
    984 
    985     def __unicode__(self):

/Users/gbromley/anaconda/lib/python3.5/site-packages/xarray/backends/api.py in to_netcdf(dataset, path_or_file, mode, format, group, engine, writer, encoding, unlimited_dims)
    581     try:
    582         dataset.dump_to_store(store, sync=sync, encoding=encoding,
--> 583                               unlimited_dims=unlimited_dims)
    584         if path_or_file is None:
    585             return target.getvalue()

/Users/gbromley/anaconda/lib/python3.5/site-packages/xarray/core/dataset.py in dump_to_store(self, store, encoder, sync, encoding, unlimited_dims)
    907         if encoding is None:
    908             encoding = {}
--> 909         variables, attrs = conventions.encode_dataset_coordinates(self)
    910 
    911         check_encoding = set()

/Users/gbromley/anaconda/lib/python3.5/site-packages/xarray/conventions.py in encode_dataset_coordinates(dataset)
   1052     non_dim_coord_names = set(dataset.coords) - set(dataset.dims)
   1053     return _encode_coordinates(dataset._variables, dataset.attrs,
-> 1054                                non_dim_coord_names=non_dim_coord_names)
   1055 
   1056 

/Users/gbromley/anaconda/lib/python3.5/site-packages/xarray/conventions.py in _encode_coordinates(variables, attributes, non_dim_coord_names)
   1015             raise ValueError('cannot serialize coordinates because variable '
   1016                              "%s already has an attribute 'coordinates'"
-> 1017                              % var_name)
   1018         attrs['coordinates'] = ' '.join(map(str, coord_names))
   1019 

ValueError: cannot serialize coordinates because variable FSA already has an attribute 'coordinates'

Problem description

This dataset is the wrfinput_d01 file for use with the WRF model. Reading and modifying the variables worked, but I cannot figure out how to write the changes out to the file. I get the above error. I saw another post about the same issue, but wasn't sure how it was resolved.

@raybellwaves
Contributor

Would you mind printing md first to show what it looks like?

@fmaussion
Member

It looks like your variable has an attribute coordinates: try deleting it before writing the data.

@charlie-becker

Did you ever find a solution to this?

@shoyer
Member

shoyer commented May 22, 2018

See https://stackoverflow.com/questions/50475453/xarray-cannot-serialize-coordinates/50475925.

It seems like WRF may often produce netCDF files that lead to this issue. If the problem can be reproduced by simply opening and resaving a netCDF file then we may want to revisit our logic in xarray, because we always want xarray.open_dataset(input_path).to_netcdf(output_path) to work. If this is the case, it would be great if someone could share metadata for such a problematic netCDF file (e.g., from ncdump -h).

@charlie-becker

Thanks for the solution (that was my SO post as well). Deleting attrs['coordinates'] was a clean workaround. Here's the requested WRF metadata. The file was concatenated with NCO, but the metadata should be consistent.

WRF_meta.txt

@gbromley
Author

Hello, I realized I never followed up with this but am again running into issues. The real problem for me is that deleting coordinates on 30+ variables isn't really feasible. Is it possible to delete the coordinates attribute for all variables?

@shoyer
Member

shoyer commented Dec 10, 2018

@gbromley The best option is probably a loop, e.g.,

def remove_problematic_attrs(ds):
    for variable in ds.variables.values():
        if 'coordinates' in variable.attrs:
            del variable.attrs['coordinates']
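
A quick way to try the loop above without a WRF file is an in-memory dataset. This is only a sketch: the variable names, dimension names, shapes, and values below are made up for illustration, and only the 'coordinates' string mimics what the thread describes.

```python
import numpy as np
import xarray as xr


def remove_problematic_attrs(ds):
    # Same loop as above: drop any pre-existing 'coordinates' attribute in place.
    for variable in ds.variables.values():
        if "coordinates" in variable.attrs:
            del variable.attrs["coordinates"]


# Made-up stand-in for a WRF file (assumption: names and values are illustrative).
dims = ("south_north", "west_east")
ds = xr.Dataset(
    {
        "FSA": (dims, np.zeros((2, 3)), {"coordinates": "XLONG XLAT XTIME"}),
        "T2": (dims, np.ones((2, 3)), {"coordinates": "XLONG XLAT XTIME", "units": "K"}),
    }
)

remove_problematic_attrs(ds)

# Collect any variables that still carry the attribute; other attrs are untouched.
leftover = [name for name, var in ds.variables.items() if "coordinates" in var.attrs]
print(leftover)
print(ds["T2"].attrs)
```

Because the function mutates the dataset in place, it needs to be called once before to_netcdf; all other attributes (such as units here) are left alone.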

@gbromley
Author

That's perfect. Thank you!

@milancurcic

milancurcic commented Apr 21, 2019

I ran into this issue trying to roundtrip a WRF output file. It looks like xarray raises an error for any NetCDF file that has variables with a coordinates attribute:

    # These coordinates are saved according to CF conventions
    for var_name, coord_names in variable_coordinates.items():
        attrs = variables[var_name].attrs
        if 'coordinates' in attrs:
            raise ValueError('cannot serialize coordinates because variable '
                             "%s already has an attribute 'coordinates'"
                             % var_name)
        attrs['coordinates'] = ' '.join(map(str, coord_names))

I don't understand either this choice or the proposed solution in this issue (deleting all coordinates attributes). Variables with a coordinates attribute are CF conforming, so xarray should be able to play along with this.

The solution that makes more sense to me is to raise a warning and either overwrite or ignore the coordinates attribute if it is already present. A later step of the fix could even be a keyword argument that lets the user choose whether to overwrite or ignore "conflicting" attributes.

Or perhaps I'm missing something obvious here... Let me know either way. I'd be happy to make a PR to patch this.
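
To make the proposal concrete, here is a minimal sketch of the encoding step with a conflict policy. It is not xarray's implementation: it works on plain dicts instead of xarray variables, and the function name encode_coordinates_attr and the on_conflict keyword are hypothetical names introduced only for this illustration.

```python
import warnings


def encode_coordinates_attr(variables, variable_coordinates, on_conflict="raise"):
    """Write a CF 'coordinates' attribute into each variable's attrs dict.

    variables: dict mapping variable name -> attrs dict (plain-dict stand-in).
    variable_coordinates: dict mapping variable name -> list of coordinate names.
    on_conflict: hypothetical keyword; one of 'raise', 'overwrite', 'ignore'.
    """
    if on_conflict not in ("raise", "overwrite", "ignore"):
        raise ValueError(f"unknown on_conflict policy: {on_conflict!r}")

    for var_name, coord_names in variable_coordinates.items():
        attrs = variables[var_name]
        new_value = " ".join(map(str, coord_names))
        if "coordinates" in attrs:
            if on_conflict == "raise":
                # Behaviour shown in the snippet quoted above.
                raise ValueError(
                    "cannot serialize coordinates because variable "
                    f"{var_name} already has an attribute 'coordinates'"
                )
            warnings.warn(
                f"variable {var_name} already has a 'coordinates' attribute "
                f"({attrs['coordinates']!r}); policy: {on_conflict}"
            )
            if on_conflict == "ignore":
                continue  # keep the attribute that was already there
        attrs["coordinates"] = new_value
    return variables


# Usage: the existing attribute names XTIME, the new one does not.
variables = {"FSA": {"coordinates": "XLONG XLAT XTIME"}}
encode_coordinates_attr(variables, {"FSA": ["XLONG", "XLAT"]}, on_conflict="overwrite")
print(variables["FSA"]["coordinates"])  # XLONG XLAT
```

With on_conflict="ignore" the original string is kept unchanged, and the default "raise" reproduces the error from this issue.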

@milancurcic

milancurcic commented Apr 24, 2019

I can't seem to recreate this with a minimal example; xarray roundtrips a NetCDF file with a coordinates attribute correctly:

from netCDF4 import Dataset
import xarray as xr

with Dataset('test.nc', format='NETCDF4', mode='w') as nc:
    nc.createDimension('dim1', size=0)
    var = nc.createVariable('var1', 'f8', dimensions=('dim1',))
    var[:] = [1., 2., 3.]
    var.setncattr('coordinates', 'dim1')

xr.open_dataset('test.nc').to_netcdf('test2.nc')

There is something peculiar about how WRF handles the coordinates attribute, but I can't see anything off about it yet.

Interestingly, I can work around the WRF coordinates issue by setting decode_coords=False in xarray.open_dataset(). For example, this works:

xr.open_dataset('wrfout_d01_2019-04-16_15_00_00', decode_coords=False).to_netcdf('test.nc')

while this doesn't:

xr.open_dataset('wrfout_d01_2019-04-16_15_00_00').to_netcdf('test.nc') 

@dcherian changed the title from "cannot serialize variable" to "WRF output : cannot serialize variable" on Apr 9, 2022
@kmuehlbauer
Contributor

Hi, I'm not sure whether WRF has fixed this inconsistency since then, but I'm trying to add some more insight to this issue so that it can hopefully be fixed in some way.

This is where the coordinates are decoded:

if "coordinates" in var_attrs:
    coord_str = var_attrs["coordinates"]
    var_coord_names = coord_str.split()
    if all(k in variables for k in var_coord_names):
        new_vars[k].encoding["coordinates"] = coord_str
        del var_attrs["coordinates"]
        coord_names.update(var_coord_names)

The main issue is that the coordinates are only propagated to .encoding if every name in the coordinates string is a variable in the dataset. Because XTIME is not a variable in this dataset (see dump above), the decoding step is silently skipped for XLONG and XLAT as well, and the original coordinates attribute is left in place.

We could at least warn here and remove XTIME from the decoded coordinates, so that the dataset is read with XLONG/XLAT as coordinates. We could also raise a meaningful error; for that, decode_coords could be extended with a "strict" option.
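
The all-or-nothing behaviour described above, and the "warn and drop the missing name" alternative, can be sketched with plain Python containers. This is an illustration under stated assumptions, not xarray code: decode_coordinates_attr and its lenient flag are hypothetical names, and storing only the surviving names in the encoding is a design choice made here for the example.

```python
import warnings


def decode_coordinates_attr(var_attrs, available, lenient=False):
    """Decode one variable's 'coordinates' attribute.

    var_attrs: attrs dict of a single variable (modified in place).
    available: set of variable names present in the file.
    lenient: hypothetical flag; False mimics the all-or-nothing check quoted
             above, True keeps the names that exist and warns about the rest.
    Returns (coord_names, encoding).
    """
    coord_names, encoding = [], {}
    if "coordinates" not in var_attrs:
        return coord_names, encoding

    names = var_attrs["coordinates"].split()
    missing = [n for n in names if n not in available]

    if missing and not lenient:
        # Silent skip: nothing is decoded and the attribute stays in attrs,
        # which is what later makes the write step fail.
        return coord_names, encoding

    if missing:
        warnings.warn(f"coordinates not found in dataset, dropping: {missing}")

    coord_names = [n for n in names if n in available]
    encoding["coordinates"] = " ".join(coord_names)
    del var_attrs["coordinates"]
    return coord_names, encoding


# XTIME is absent from the file, as in the WRF case discussed here.
available = {"FSA", "XLONG", "XLAT"}

attrs = {"coordinates": "XLONG XLAT XTIME"}
print(decode_coordinates_attr(attrs, available))  # ([], {})
print(attrs)  # {'coordinates': 'XLONG XLAT XTIME'}

attrs = {"coordinates": "XLONG XLAT XTIME"}
print(decode_coordinates_attr(attrs, available, lenient=True))
print(attrs)  # {}
```

In the default mode the attribute survives decoding untouched, so a later encoding step finds it and raises; in the lenient mode XLONG and XLAT become coordinates and the attribute is moved out of attrs.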

8 participants