-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Let's list all the netCDF files that xarray can't open #2368
Comments
I believe the last one in the notebook below is already fixed and the first two are mentioned above but here is a data point: http://nbviewer.jupyter.org/gist/ocefpaf/1bf3b86359c459c89d44a81d3129f967 |
import xarray as xr
xr.open_dataset('http://thredds.ucar.edu/thredds/dodsC/grib/NCEP/GFS/Global_0p5deg/TwoD') ---------------------------------------------------------------------------
MissingDimensionsError Traceback (most recent call last)
<ipython-input-6-e2a87d803d99> in <module>()
----> 1 xr.open_dataset(gfs_cat.datasets[0].access_urls['OPENDAP'])
~/miniconda3/envs/py36/lib/python3.6/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, group, decode_cf, mask_and_scale, decode_times, autoclose, concat_characters, decode_coords, engine, chunks, lock, cache, drop_variables, backend_kwargs)
344 lock = _default_lock(filename_or_obj, engine)
345 with close_on_error(store):
--> 346 return maybe_decode_store(store, lock)
347 else:
348 if engine is not None and engine != 'scipy':
~/miniconda3/envs/py36/lib/python3.6/site-packages/xarray/backends/api.py in maybe_decode_store(store, lock)
256 store, mask_and_scale=mask_and_scale, decode_times=decode_times,
257 concat_characters=concat_characters, decode_coords=decode_coords,
--> 258 drop_variables=drop_variables)
259
260 _protect_dataset_variables_inplace(ds, cache)
~/miniconda3/envs/py36/lib/python3.6/site-packages/xarray/conventions.py in decode_cf(obj, concat_characters, mask_and_scale, decode_times, decode_coords, drop_variables)
428 vars, attrs, concat_characters, mask_and_scale, decode_times,
429 decode_coords, drop_variables=drop_variables)
--> 430 ds = Dataset(vars, attrs=attrs)
431 ds = ds.set_coords(coord_names.union(extra_coords).intersection(vars))
432 ds._file_obj = file_obj
~/miniconda3/envs/py36/lib/python3.6/site-packages/xarray/core/dataset.py in __init__(self, data_vars, coords, attrs, compat)
363 coords = {}
364 if data_vars is not None or coords is not None:
--> 365 self._set_init_vars_and_dims(data_vars, coords, compat)
366 if attrs is not None:
367 self.attrs = attrs
~/miniconda3/envs/py36/lib/python3.6/site-packages/xarray/core/dataset.py in _set_init_vars_and_dims(self, data_vars, coords, compat)
381
382 variables, coord_names, dims = merge_data_and_coords(
--> 383 data_vars, coords, compat=compat)
384
385 self._variables = variables
~/miniconda3/envs/py36/lib/python3.6/site-packages/xarray/core/merge.py in merge_data_and_coords(data, coords, compat, join)
363 indexes = dict(extract_indexes(coords))
364 return merge_core(objs, compat, join, explicit_coords=explicit_coords,
--> 365 indexes=indexes)
366
367
~/miniconda3/envs/py36/lib/python3.6/site-packages/xarray/core/merge.py in merge_core(objs, compat, join, priority_arg, explicit_coords, indexes)
433 coerced = coerce_pandas_values(objs)
434 aligned = deep_align(coerced, join=join, copy=False, indexes=indexes)
--> 435 expanded = expand_variable_dicts(aligned)
436
437 coord_names, noncoord_names = determine_coords(coerced)
~/miniconda3/envs/py36/lib/python3.6/site-packages/xarray/core/merge.py in expand_variable_dicts(list_of_variable_dicts)
209 var_dicts.append(coords)
210
--> 211 var = as_variable(var, name=name)
212 sanitized_vars[name] = var
213
~/miniconda3/envs/py36/lib/python3.6/site-packages/xarray/core/variable.py in as_variable(obj, name)
112 'dimensions %r. xarray disallows such variables because they '
113 'conflict with the coordinates used to label '
--> 114 'dimensions.' % (name, obj.dims))
115 obj = obj.to_index_variable()
116
MissingDimensionsError: 'time' has more than 1-dimension and the same name as one of its dimensions ('reftime', 'time'). xarray disallows such variables because they conflict with the coordinates used to label dimensions. |
Here's a sample CDL for a file:
which gives: ---------------------------------------------------------------------------
MissingDimensionsError Traceback (most recent call last)
<ipython-input-10-d6f8d8651b9f> in <module>()
4 query.add_lonlat().accept('netcdf4')
5 nc = ncss.get_data(query)
----> 6 xr.open_dataset(NetCDF4DataStore(nc))
~/miniconda3/envs/py36/lib/python3.6/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, group, decode_cf, mask_and_scale, decode_times, autoclose, concat_characters, decode_coords, engine, chunks, lock, cache, drop_variables, backend_kwargs)
352 store = backends.ScipyDataStore(filename_or_obj)
353
--> 354 return maybe_decode_store(store)
355
356
~/miniconda3/envs/py36/lib/python3.6/site-packages/xarray/backends/api.py in maybe_decode_store(store, lock)
256 store, mask_and_scale=mask_and_scale, decode_times=decode_times,
257 concat_characters=concat_characters, decode_coords=decode_coords,
--> 258 drop_variables=drop_variables)
259
260 _protect_dataset_variables_inplace(ds, cache)
~/miniconda3/envs/py36/lib/python3.6/site-packages/xarray/conventions.py in decode_cf(obj, concat_characters, mask_and_scale, decode_times, decode_coords, drop_variables)
428 vars, attrs, concat_characters, mask_and_scale, decode_times,
429 decode_coords, drop_variables=drop_variables)
--> 430 ds = Dataset(vars, attrs=attrs)
431 ds = ds.set_coords(coord_names.union(extra_coords).intersection(vars))
432 ds._file_obj = file_obj
~/miniconda3/envs/py36/lib/python3.6/site-packages/xarray/core/dataset.py in __init__(self, data_vars, coords, attrs, compat)
363 coords = {}
364 if data_vars is not None or coords is not None:
--> 365 self._set_init_vars_and_dims(data_vars, coords, compat)
366 if attrs is not None:
367 self.attrs = attrs
~/miniconda3/envs/py36/lib/python3.6/site-packages/xarray/core/dataset.py in _set_init_vars_and_dims(self, data_vars, coords, compat)
381
382 variables, coord_names, dims = merge_data_and_coords(
--> 383 data_vars, coords, compat=compat)
384
385 self._variables = variables
~/miniconda3/envs/py36/lib/python3.6/site-packages/xarray/core/merge.py in merge_data_and_coords(data, coords, compat, join)
363 indexes = dict(extract_indexes(coords))
364 return merge_core(objs, compat, join, explicit_coords=explicit_coords,
--> 365 indexes=indexes)
366
367
~/miniconda3/envs/py36/lib/python3.6/site-packages/xarray/core/merge.py in merge_core(objs, compat, join, priority_arg, explicit_coords, indexes)
433 coerced = coerce_pandas_values(objs)
434 aligned = deep_align(coerced, join=join, copy=False, indexes=indexes)
--> 435 expanded = expand_variable_dicts(aligned)
436
437 coord_names, noncoord_names = determine_coords(coerced)
~/miniconda3/envs/py36/lib/python3.6/site-packages/xarray/core/merge.py in expand_variable_dicts(list_of_variable_dicts)
209 var_dicts.append(coords)
210
--> 211 var = as_variable(var, name=name)
212 sanitized_vars[name] = var
213
~/miniconda3/envs/py36/lib/python3.6/site-packages/xarray/core/variable.py in as_variable(obj, name)
112 'dimensions %r. xarray disallows such variables because they '
113 'conflict with the coordinates used to label '
--> 114 'dimensions.' % (name, obj.dims))
115 obj = obj.to_index_variable()
116
MissingDimensionsError: 'isobaric' has more than 1-dimension and the same name as one of its dimensions ('station', 'profile', 'isobaric'). xarray disallows such variables because they conflict with the coordinates used to label dimensions. |
The two examples by @dopplershift are the same problem as in #2233 |
This is mentioned elsewhere (can't find the issue right now) and may be out of scope for this issue but I'm going to say it anyway: opening a NetCDF file with groups was not as easy as I wanted it to be when first starting out with xarray. |
I found this problem too long ago (see #457). Back then the workaround we implemented is to exclude the offending variable ("siglay" or "isobaric" in the examples above) with the "drop_variables" optional argument. Of course this is not great if you want to actually use the values in the variable you are dropping. I personally don't like the notion of a "two dimensional coordinate", I find it confusing. However this kind of netCDFs are common, so fully supporting them in xarray would be nice. But I don't know how. Maybe just renaming the variable instead of dropping it with a "rename_variables"? This is the only thing that comes to my mind. |
@djhoese - it would be great if you could track down a more specific example of the issue you are referring to. Excluding this possible problem with groups, my assessment of the feedback above is that, actually, the only problem is #2233: we can't have multidimensional variables that are also their own dimensions. This is a good thing. It means we have a specific problem to fix. Right now this is ok:
But this is not
Personally I find this to be an incredibly confusing, recursive use of the concept of "dimensions". For me, dimensions should be orthogonal. In the second example, So the question is: what can we do about it? I propose the following general outline:
Finally, we might want to raise this upstream with netCDF or CF conventions to try to understand better why this sort of schema is being encouraged. |
@rabernat For the groups NetCDF files I had in mind the NASA L1B data files for the satellite instrument VIIRS onboard Suomi-NPP and NOAA-20 satellites. You can see an example file here. The summary of the ncdump is:
When I first started out with xarray I assumed I would be able to do something like:
Which I can't do, but can do with the python netcdf4 library:
I understand that I can provide the |
@rabernat While I agree that they're (somewhat) confusing files, I think you're missing two things:
IMO, xarray is being overly pedantic here. XArray states that it adopts the Common Data Model (CDM); netCDF-java and the CDM were the tools used to generate the failing examples above. |
@dopplershift - thanks for the clarifications! I agree that it's good for netCDF to be as open-ended as possible. So I guess my quarrel is with the CDM. This is what it says about variables and dimensions:
then later
I have a very hard time understanding what all of this means. Can the same variable be a "Dimension" and a "CoordinateAxis" in CDM? It seems much simpler to me to use the CF approach to describe the physical coordinates of the data using "auxiliary coordinate variables" and to keep the dimensions as purely 1D "coordinate variables".
What would you like xarray to do with these datasets, given the fact that orthogonality of dimensions is central to its data model? |
Perhaps part of the confusion is simply that |
Currently, xarray requires that variables with a name matching a dimension are 1D variables along that dimension, e.g., for dim in dataset.dims:
if dim in dataset.variables:
assert dataset.variables[dim].dims == (dim,) I agree that this unnecessarily complicates our data model. There's no particular advantage to this invariant, besides removing the need to check the dimensions of variables used for indexing lookups. I'm sure there are some cases internally where we currently rely on this assumption, but it should be relatively easy to relax. |
It seems like this relaxation is compatible with the refactoring of indexes. Right now, we automatically create 1D indexes for all coordinate variables. The problem with 2D dimensions is that such indexes don't make sense:
But maybe we could turn multi-dimensional coordinate variables into multi-indexes? Or no index at all? In any case, we could still do
i.e. logical indexing on the dimension axis. |
This would be my inclination (for the default behavior). It would mean that you could not longer count on anyways being able to do labeled indexing along each dimension, but in the broader scheme of things I don't think that's a big deal. |
That sounds reasonable to me. I don't necessarily expect all of the xarray goodness to work with those files, but I do expect them to open without error. |
Just adding that netCDF files produced as output from the GOTM turbulence model cannot be opened by xarray. I believe the reason is self-referential multidimensional coordinates. |
@nordam , can you provide an example? |
Indeed. An example file (1.1 MB) can be found here: http://folk.ntnu.no/nordam/entrainment.nc And the error message I get on trying to open this file is:
|
We are working on fixing this in #2405. That PR (mine) has most of the basic functionality there, but it still needs more testing. Unfortunately, I don't have bandwidth right now to complete the required work. If anyone here needs this fixed urgently and actually has time to work on it, I encourage you to pick up that PR and try to finish it off. We will be happy to provide help and support along the way. |
Adding another example. While working through the Model Evaluation Tool (MET) tutorial, I created a NetCDF file with the tool, and wasn't able to open the file it created.
Sounds to me like the same error caused by #2233 Below is the .nc file contents with
|
Found one! https://www.ncei.noaa.gov/data/oceans/ncei/ocads/data/0191304/ The dataset published in Bushinsky et al. (2019), which is basically the Landshutzer et al. (2014) climatology plus SOCCOM Float-based pCO2 data, and updated through 2018. I've only tried the first file in the list (https://www.ncei.noaa.gov/data/oceans/ncei/ocads/data/0191304/MPI-SOM_FFN_SOCCOMv2018.nc), but suspect the others will have the same issue. Here's the error (sounds like you all have discussed before, but I can't see an easy answer):
|
Clearly we can detect this failure, so shall we rename the |
@dcherian Thanks for your reply. I think I understand the issue. What, specifically, do you suggest to fix this issue in my own code considering this is not a dataset I generated? |
for dim in dataset.dims:
if dim in dataset.variables:
assert dataset.variables[dim].dims == (dim,)
@benbovy will the explicit indexes refactor fix this case?
@djhoese For anything to do with opening netCDF files with groups see #4118 and the linked issues from there. If people have example of other weird cases involving groups (like groups within themselves or anything like that) then I would be interested to have those files to test with! |
@TomNicholas yes with the explicit index refactor we should be able to relax the 1D coordinate / dimension matching name constraint in the Xarray data model.
I also initially thought it would be easy to relax, but I'm not so sure anymore. I don't think it is a hard task, but it might still require some fair amount of work. I've already refactored a bunch of such internal cases in #5692, but there's a good chance that some (not sure how many) cases will still need a fix. |
Found another example from ICON NWP model. Files open with netCDF4 library but not with xarray.
Error:
|
@maxaragon, i'm curious. what version of xarray/netcdf4 are you using? i'm asking because this appears to be working fine on my end In [1]: import xarray as xr
In [2]: ds = xr.open_dataset("20200825_hyytiala_icon-iglo-12-23.nc")
In [3]: ds
Out[3]:
<xarray.Dataset>
Dimensions: (time: 25, level: 90, flux_level: 91,
frequency: 2, soil_level: 9)
Coordinates:
* time (time) datetime64[ns] 2020-08-25 ... 2020-0...
* level (level) float32 90.0 89.0 88.0 ... 3.0 2.0 1.0
* flux_level (flux_level) float32 91.0 90.0 ... 2.0 1.0
* frequency (frequency) float32 34.96 94.0
Dimensions without coordinates: soil_level
Data variables: (12/62)
latitude float32 ...
longitude float32 ...
altitude float32 ...
horizontal_resolution float32 ...
forecast_time (time) timedelta64[ns] ...
height (time, level) float32 ...
... ...
gas_atten (frequency, time, level) float32 ...
specific_gas_atten (frequency, time, level) float32 ...
specific_saturated_gas_atten (frequency, time, level) float32 ...
specific_dry_gas_atten (frequency, time, level) float32 ...
K2 (frequency, time, level) float32 ...
specific_liquid_atten (frequency, time, level) float32 ...
Attributes: (12/13)
institution: Max Planck Institute for Meteorology/Deutscher Wette...
references: see MPIM/DWD publications
source: svn://xceh.dwd.de/for0adm/SVN_icon/tags/icon-2.6.0-n...
Conventions: CF-1.7
location: hyytiala
file_uuid: ace15f8ba477497c8d1dd0833b5ac674
... ...
year: 2020
month: 08
day: 25
history: 2021-01-25 08:24:29 - File content harmonized by the...
title: Model file from Hyytiala
pid: https://hdl.handle.net/21.12132/1.ace15f8ba477497c here are the versions i'm using In [4]: xr.show_versions()
/Users/andersy005/mambaforge/envs/playground/lib/python3.10/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
warnings.warn("Setuptools is replacing distutils.")
INSTALLED VERSIONS
------------------
commit: None
python: 3.10.6 | packaged by conda-forge | (main, Aug 22 2022, 20:41:22) [Clang 13.0.1 ]
python-bits: 64
OS: Darwin
OS-release: 22.1.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.8.1
xarray: 2022.10.0
pandas: 1.5.1
numpy: 1.23.4
scipy: 1.9.3
netCDF4: 1.6.1
pydap: installed
h5netcdf: 1.0.2
h5py: 3.7.0
Nio: None
zarr: 2.13.3
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2022.10.2
distributed: 2022.10.2
matplotlib: 3.6.1
cartopy: None
seaborn: 0.12.0
numbagg: None
fsspec: 2022.10.0
cupy: None
pint: 0.20.1
sparse: None
flox: None
numpy_groupies: None
setuptools: 65.5.0
pip: 22.3
conda: None
pytest: None
IPython: 8.6.0
sphinx: None |
@andersy005 indeed, I have updated xarray and works now, previous version was:
|
found this one, The dataset was given based on request. That's why...
Output.
|
@ronygolderku thanks for your example. Looks like it fails for the same reason as was mentioned for some of the other examples above. |
Is there any solution? |
Closing since the most common case in this issue was fixed by #7989 and #8126. The request for supporting groups is handled by https://github.com/xarray-contrib/datatree |
At the Pangeo developers meetings, I am hearing lots of reports from folks like @dopplershift and @rsignell-usgs about netCDF datasets that xarray can't open.
My expectation is that xarray doesn't have strong requirements on the contents of datasets. (It doesn't "enforce" cf compatibility for example; that's optional.) Anything that can be written to netCDF should be readable by xarray.
I would like to collect examples of places where xarray fails. So far, I am only aware of one:
siglay(siglay, node)
. Onlysiglay(siglay)
would work.Are there other distinct cases?
Please provide links / sample code of netCDF datasets that xarray can't read. Even better would be short code snippets to create such datasets in python using the netcdf4 interface.
The text was updated successfully, but these errors were encountered: