Non-HTTPS remote URLs no longer work as input for open_zarr #4691

Closed
charlesbluca opened this issue Dec 14, 2020 · 5 comments

@charlesbluca

What happened:

On 0.16.2 and later, passing a non-HTTPS remote URL path (e.g. gs://...) as input to open_zarr() results in a KeyError or GroupNotFoundError:

>>> import xarray as xr
>>> xr.open_zarr("gs://cmip6/AerChemMIP/AS-RCEC/TaiESM1/histSST/r1i1p1f1/AERmon/od550aer/gn/", consolidated=True)
KeyError: '.zmetadata'
>>> xr.open_zarr("gs://cmip6/AerChemMIP/AS-RCEC/TaiESM1/histSST/r1i1p1f1/AERmon/od550aer/gn/", consolidated=False)
GroupNotFoundError: group not found at path ''

What you expected to happen:

With versions 0.16.1 and earlier, passing a non-HTTPS remote URL path to open_zarr() as input would successfully open the remote store, provided that a package to handle the specific filesystem was available in the environment and the proper storage options were supplied.

Minimal Complete Verifiable Example:

Same as above, but with decode_times=False to circumvent a cftime dependency:

import xarray as xr

xr.open_zarr(
    "gs://cmip6/AerChemMIP/AS-RCEC/TaiESM1/histSST/r1i1p1f1/AERmon/od550aer/gn/",
    consolidated=True,
    decode_times=False,
)

Anything else we need to know?:

From a brief debug of the code, it looks like this error is a result of open_zarr() now calling open_dataset(engine="zarr") to open the Zarr store.

In this function, the remote URL path is now passed through _normalize_path(), where it is not recognized as a remote URL: this check is done by is_remote_uri(), which only matches http:// and https:// URLs. The path is instead interpreted as a relative path in the local filesystem, where it does not exist.
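
The failure mode described above can be illustrated with a small standalone sketch. This is not xarray's actual code: the function names is_remote_uri_http_only() and normalize_path_sketch() are hypothetical stand-ins, and the regex is an assumption about what an HTTP(S)-only check looks like.

```python
import os
import re


def is_remote_uri_http_only(path: str) -> bool:
    """Hypothetical stand-in for a check that only knows http:// and https://."""
    return bool(re.search(r"^https?\://", path))


def normalize_path_sketch(path: str) -> str:
    """Leave recognized remote URLs alone; treat everything else as a local path."""
    if is_remote_uri_http_only(path):
        return path
    # Anything not recognized as remote is expanded against the local filesystem.
    return os.path.abspath(os.path.expanduser(path))


url = "gs://cmip6/AerChemMIP/AS-RCEC/TaiESM1/histSST/r1i1p1f1/AERmon/od550aer/gn/"
normalized = normalize_path_sketch(url)

print(is_remote_uri_http_only(url))  # False: the gs:// scheme is not matched
print(normalized)                    # a path under the current working directory
```

Because the normalized value is a local path that does not exist, the Zarr store opened from it is empty, which would be consistent with both the missing '.zmetadata' key and the missing root group.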

I'm not sure if this is meant to be expected behavior, as the documentation on reading datasets in the cloud does not show an example using a URL path as input, and only suggests using a MutableMapping. However, this is a use case that worked before 0.16.2 and no longer works.

I think this could be resolved by expanding is_remote_uri() to check for other common remote URIs (e.g. gs:, s3:, etc.).
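
As a rough sketch of that suggestion, the check could match any URL scheme rather than a fixed list. The function name is_remote_uri_any_scheme() and the regex are assumptions made for illustration; this is not necessarily how the eventual fix was implemented.

```python
import re

# Any "scheme://" prefix: a letter followed by letters, digits, "+", "." or "-".
_SCHEME_PREFIX = re.compile(r"^[a-z][a-z0-9+.\-]*\://", re.IGNORECASE)


def is_remote_uri_any_scheme(path: str) -> bool:
    """Hypothetical broader check: treat any scheme-prefixed string as remote."""
    return bool(_SCHEME_PREFIX.match(path))


for candidate in ("gs://bucket/store", "s3://bucket/store", "https://host/store",
                  "relative/store.zarr", "C:/data/store.zarr"):
    print(candidate, is_remote_uri_any_scheme(candidate))
```

One trade-off of this design is that file:// paths would also count as remote, so a real implementation would need to decide how to treat them.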

Environment:

Output of xr.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.9.1 | packaged by conda-forge | (default, Dec  9 2020, 01:07:06) [MSC v.1916 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 142 Stepping 10, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: English_United States.1252
libhdf5: None
libnetcdf: None

xarray: 0.16.2
pandas: 1.1.5
numpy: 1.19.4
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.6.1
cftime: 1.3.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 51.0.0.post20201207
pip: 20.3.1
conda: None
pytest: None
IPython: 7.19.0
sphinx: None

@dcherian added the topic-zarr (Related to zarr storage library) label on Dec 14, 2020

@rabernat
Contributor

cc @martindurant for fsspec issue

@martindurant
Contributor

I believe #4461 fixes this

Note that you can still use the "old" method of opening the mapper (e.g., with fsspec.get_mapper) beforehand and passing that to open_zarr().
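
For readers hitting this on 0.16.2, the mapper-based workaround might look roughly like the sketch below. The helper name open_remote_zarr() is hypothetical, and it assumes that fsspec, xarray, zarr, and the filesystem package for the URL scheme (gcsfs for gs://) are installed.

```python
def open_remote_zarr(url, **storage_options):
    """Open a remote Zarr store by building the key-value mapper explicitly."""
    # Imported lazily so the sketch can be loaded without the optional
    # dependencies; a real script would import these at module level.
    import fsspec
    import xarray as xr

    # get_mapper() selects the filesystem implementation from the URL scheme
    # and forwards storage_options to it.
    mapper = fsspec.get_mapper(url, **storage_options)

    # Passing a mapping instead of a string bypasses the path normalization.
    return xr.open_zarr(mapper, consolidated=True, decode_times=False)
```

Calling open_remote_zarr() with the gs:// URL from the example above, plus whatever storage options the bucket needs (for instance token="anon" with gcsfs, assuming the bucket allows anonymous reads), should then return the dataset.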

@charlesbluca
Author

Good to know - I figured this issue would be covered in that PR.

In general, I tend to use fsspec.get_mapper() in all cases where I need to access a remote store, but figured this would be good to document, as it could result in xarray workflows breaking with the version bump. For example, this PR in cmip6_preprocessing was the reason I opened this issue: it showcases a Zarr testing suite that worked on 0.16.1 and stopped working on 0.16.2.

@martindurant
Contributor

#4823 is working on this. Please try it and comment.

@charlesbluca
Author

With @martindurant's PR merged, this should be fixed now - thanks!
