ds = xr.tutorial.load_dataset("air_temperature") with 0.18 needs engine argument #5291

Closed
keewis opened this issue May 11, 2021 · 15 comments · Fixed by #5300

@keewis
Collaborator

keewis commented May 11, 2021

From xarray-contrib/xarray-tutorial#43 by @scottyhq:

Many notebooks out there start with the line ds = xr.tutorial.load_dataset("air_temperature"). That now gives an error traceback with xarray>=0.18:

Traceback (most recent call last):
  File "/Users/scott/GitHub/zarrdata/./create_zarr.py", line 6, in <module>
    ds = xr.tutorial.load_dataset("air_temperature")
  File "/Users/scott/miniconda3/envs/zarrdata/lib/python3.9/site-packages/xarray/tutorial.py", line 179, in load_dataset
    with open_dataset(*args, **kwargs) as ds:
  File "/Users/scott/miniconda3/envs/zarrdata/lib/python3.9/site-packages/xarray/tutorial.py", line 100, in open_dataset
    ds = _open_dataset(filepath, **kws)
  File "/Users/scott/miniconda3/envs/zarrdata/lib/python3.9/site-packages/xarray/backends/api.py", line 485, in open_dataset
    engine = plugins.guess_engine(filename_or_obj)
  File "/Users/scott/miniconda3/envs/zarrdata/lib/python3.9/site-packages/xarray/backends/plugins.py", line 112, in guess_engine
    raise ValueError("cannot guess the engine, try passing one explicitly")
ValueError: cannot guess the engine, try passing one explicitly

It's an easy fix: just use ds = xr.tutorial.load_dataset("air_temperature", engine="netcdf4"). New users might be thrown by that, though. Also note that unless the netcdf4 library is explicitly included in the software environment, even adding engine="netcdf4" can result in an error: "ValueError: unrecognized engine netcdf4 must be one of: ['store', 'zarr']". So I think a minimal environment definition to run would be:

name: xarray-tutorial
channels:
  - conda-forge
dependencies:
  - xarray=0.18
  - pooch=1.3
  - netcdf4=1.5
  - zarr=2.8
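The workaround above requires knowing which engine is actually installed. The following is a minimal sketch, not xarray's own logic, that picks an explicit engine from whichever netCDF library is importable. The helper name `pick_netcdf_engine` and the preference order are assumptions for illustration. The engine names `"netcdf4"` and `"scipy"` are the ones xarray uses for those backends.

```python
import importlib.util
from typing import Iterable, Optional, Tuple

# (module to look for, xarray engine name), in an assumed order of preference
DEFAULT_CANDIDATES = (("netCDF4", "netcdf4"), ("scipy", "scipy"))


def pick_netcdf_engine(
    candidates: Iterable[Tuple[str, str]] = DEFAULT_CANDIDATES,
) -> Optional[str]:
    """Return the engine name for the first importable module, or None."""
    for module_name, engine_name in candidates:
        # find_spec returns None when a top-level module is not installed
        if importlib.util.find_spec(module_name) is not None:
            return engine_name
    return None


if __name__ == "__main__":
    engine = pick_netcdf_engine()
    if engine is None:
        print("install netCDF4 or scipy before loading the tutorial data")
    else:
        # e.g. xr.tutorial.load_dataset("air_temperature", engine=engine)
        print(f"pass engine={engine!r} to xr.tutorial.load_dataset")
```

If neither library is installed, passing an engine cannot help. The user has to install one first.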
@keewis
Collaborator Author

keewis commented May 12, 2021

I think there's something wrong with your environment: I can't reproduce in an environment created by

mamba create -n test python=3.9 mamba xarray=0.18 pooch=1.3 netcdf4=1.5 zarr=2.8

Edit: you could also try clearing ~/.xarray_tutorial_data or ~/.cache/xarray_tutorial_data.

@keewis
Collaborator Author

keewis commented May 12, 2021

we could definitely improve the error message, though. Something like "unknown engine {engine}, please choose one of the installed engines: {engines}", maybe?
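Here is a small sketch of the proposed message. The function name `check_engine` is hypothetical. The list of installed engines is passed in rather than discovered, because this is not xarray's plugin API.

```python
from typing import Sequence


def check_engine(engine: str, installed: Sequence[str]) -> str:
    """Return `engine` if it is installed, else raise a ValueError naming the options."""
    if engine not in installed:
        # Wording follows the suggestion above
        engines = ", ".join(sorted(installed)) or "none"
        raise ValueError(
            f"unknown engine {engine}, please choose one of the installed "
            f"engines: {engines}"
        )
    return engine
```

With installed engines `["store", "zarr"]`, requesting `"netcdf4"` would produce: unknown engine netcdf4, please choose one of the installed engines: store, zarr.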

@scottyhq
Contributor

Thanks @keewis, I should have been clearer about the environment. I was recently going over the tutorial with someone and started with:

  1. conda create -n xarray-tutorial xarray; running ds = xr.tutorial.load_dataset("air_temperature") then raises ImportError: using the tutorial data requires pooch
  2. We install pooch and then hit: ValueError: cannot guess the engine, try passing one explicitly
  3. After consulting the docstring, we try ds = xr.tutorial.load_dataset("air_temperature", engine="netcdf4") and hit ValueError: unrecognized engine netcdf4 must be one of: ['store']
  4. Being familiar with xarray, we then installed netcdf4 into our environment and all is well.

I do think the fixes for these errors are not obvious to new xarray users trying out the tutorial (especially #3 above).

we could definitely improve the error message, though. Something like "unknown engine {engine}, please choose one of the installed engines: {engines}", maybe?

Yes. Perhaps with a link to docs with a list of engines? For the tutorial case specifically, we could also update the ImportError message to read: ImportError: please install 'pooch' and 'netcdf4' to use xarray tutorial data.
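One way to get the single, combined ImportError suggested above is to check all tutorial dependencies up front and report every missing one at once. This is a minimal sketch under assumptions. `require_tutorial_dependencies` and `missing_modules` are hypothetical helpers, and the message uses import names (`netCDF4`), which are case-sensitive and differ from the pip/conda package name `netcdf4`.

```python
import importlib.util
from typing import Iterable, List

# Packages the tutorial needs, per the discussion: pooch downloads the files,
# netCDF4 reads them. These are import names, not install names.
TUTORIAL_MODULES = ("pooch", "netCDF4")


def missing_modules(modules: Iterable[str]) -> List[str]:
    """Return the subset of `modules` that cannot be found."""
    return [m for m in modules if importlib.util.find_spec(m) is None]


def require_tutorial_dependencies(modules: Iterable[str] = TUTORIAL_MODULES) -> None:
    """Raise one ImportError listing every missing package, instead of one at a time."""
    missing = missing_modules(modules)
    if missing:
        names = " and ".join(repr(m) for m in missing)
        raise ImportError(f"please install {names} to use xarray tutorial data")
```

Reporting all missing packages together avoids the install-one, hit-the-next-error loop described in steps 1 to 4.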

@shoyer
Member

shoyer commented May 12, 2021

This is indeed unfortunate, thanks for the report!

I believe xarray.tutorial.load_dataset("air_temperature") works -- at least in my testing -- if either scipy or netCDF4 is installed (along with Pooch).

But if you don't have either scipy or netCDF4 installed, you get the very unhelpful error message ValueError: cannot guess the engine, try passing one explicitly

At the very least, we should report which engines are installed -- and ideally make a specific recommendation like installing scipy or netCDF4.
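Reporting which engines are installed, plus a concrete recommendation when none are, could look roughly like this sketch. The engine-to-module table covers only the two netCDF-capable options mentioned here. The function names and message wording are hypothetical, not xarray's actual implementation.

```python
import importlib.util
from typing import Dict, List

# xarray engine name -> module that provides it (illustrative subset)
NETCDF_ENGINES: Dict[str, str] = {"netcdf4": "netCDF4", "scipy": "scipy"}


def installed_engines(engines: Dict[str, str] = NETCDF_ENGINES) -> List[str]:
    """Return the engine names whose backing module can be found."""
    return [
        name
        for name, module in engines.items()
        if importlib.util.find_spec(module) is not None
    ]


def guess_failed_message(installed: List[str]) -> str:
    """Build an error message for when no engine could be guessed."""
    if installed:
        return "cannot guess the engine; installed engines are: " + ", ".join(installed)
    # No backend at all: passing an engine cannot help, so recommend installs
    return (
        "cannot guess the engine because no IO backends are installed; "
        "try installing scipy or netCDF4"
    )
```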

@max-sixty
Collaborator

But if you don't have either scipy or netCDF4 installed, you get the very unhelpful error message ValueError: cannot guess the engine, try passing one explicitly

At the very least, we should report which engines are installed -- and ideally make a specific recommendation like installing scipy or netCDF4.

Yes. And is this a problem that passing an engine would fix? Would doing what the message says ("passing one explicitly") make it work?

@shoyer
Member

shoyer commented May 12, 2021

Good point -- in this case specifying an engine explicitly cannot help, because the required engine is not installed.

@shoyer
Member

shoyer commented May 13, 2021

CC @alexamici @aurghs

I would suggest three levels of fixes:

  1. Better fall-back message when engine guessing fails: Better error message when no backend engine is found. #5300
  2. Suggesting specific dependencies to install (e.g., netCDF4) based on the provided file, from inside open_dataset.
  3. Suggesting specific dependencies to install when using tutorial datasets.

If we do a nice job of (2), then maybe (3) will not be necessary. I'll open a new issue to discuss my proposed solution for (2).
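Level (2) amounts to mapping the file being opened to packages that could read it. Below is a rough sketch keyed on file suffix. The `SUGGESTIONS` table and `suggest_install` are hypothetical and illustrative only. xarray's real plugin system may detect formats differently, for example by inspecting file contents rather than suffixes.

```python
from pathlib import PurePath
from typing import Dict, Tuple

# Assumed mapping from file suffix to packages providing a matching xarray backend
SUGGESTIONS: Dict[str, Tuple[str, ...]] = {
    ".nc": ("netCDF4", "scipy", "h5netcdf"),
    ".zarr": ("zarr",),
    ".grib": ("cfgrib",),
}


def suggest_install(path: str) -> str:
    """Return a human-readable install suggestion for the given file path."""
    suffix = PurePath(path).suffix.lower()
    packages = SUGGESTIONS.get(suffix)
    if not packages:
        return f"no install suggestion for {path!r}"
    return f"to open {path!r}, try installing one of: {', '.join(packages)}"
```

For example, `suggest_install("air_temperature.nc")` would name netCDF4, scipy and h5netcdf. A suffix-based table is the simplest option, but it cannot help with paths that have no extension.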

@shoyer
Member

shoyer commented May 13, 2021

I've added a temporary work-around to provide better errors for the tutorial datasets into #5300, i.e., solution (3) from my previous comment.

This is challenging to unit-test, so I'll just share my IPython session(s) from a new environment to show how it works:

(xarray-min)   ~/dev/xarray better-no-plugins-errors $ ipython
Python 3.9.4 (default, Apr  9 2021, 09:32:38)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.22.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import xarray

In [2]: xarray.tutorial.open_dataset('air_temperature')
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
~/dev/xarray/xarray/tutorial.py in open_dataset(name, cache, cache_dir, engine, **kws)
    113     try:
--> 114         import pooch
    115     except ImportError:

ModuleNotFoundError: No module named 'pooch'

During handling of the above exception, another exception occurred:

ImportError                               Traceback (most recent call last)
<ipython-input-2-e970a0b41221> in <module>
----> 1 xarray.tutorial.open_dataset('air_temperature')

~/dev/xarray/xarray/tutorial.py in open_dataset(name, cache, cache_dir, engine, **kws)
    114         import pooch
    115     except ImportError:
--> 116         raise ImportError("using the tutorial data requires pooch")
    117
    118     logger = pooch.get_logger()

ImportError: using the tutorial data requires pooch

In [3]: ! pip install pooch
Collecting pooch
  Using cached pooch-1.3.0-py3-none-any.whl (51 kB)
Requirement already satisfied: requests in /Users/shoyer/miniconda3/envs/xarray-min/lib/python3.9/site-packages (from pooch) (2.25.1)
Requirement already satisfied: appdirs in /Users/shoyer/miniconda3/envs/xarray-min/lib/python3.9/site-packages (from pooch) (1.4.4)
Requirement already satisfied: packaging in /Users/shoyer/miniconda3/envs/xarray-min/lib/python3.9/site-packages (from pooch) (20.9)
Requirement already satisfied: pyparsing>=2.0.2 in /Users/shoyer/miniconda3/envs/xarray-min/lib/python3.9/site-packages (from packaging->pooch) (2.4.7)
Requirement already satisfied: certifi>=2017.4.17 in /Users/shoyer/miniconda3/envs/xarray-min/lib/python3.9/site-packages (from requests->pooch) (2020.12.5)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /Users/shoyer/miniconda3/envs/xarray-min/lib/python3.9/site-packages (from requests->pooch) (1.26.4)
Requirement already satisfied: idna<3,>=2.5 in /Users/shoyer/miniconda3/envs/xarray-min/lib/python3.9/site-packages (from requests->pooch) (2.10)
Requirement already satisfied: chardet<5,>=3.0.2 in /Users/shoyer/miniconda3/envs/xarray-min/lib/python3.9/site-packages (from requests->pooch) (4.0.0)
Installing collected packages: pooch
Successfully installed pooch-1.3.0

In [4]: xarray.tutorial.open_dataset('air_temperature')
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
~/dev/xarray/xarray/tutorial.py in _check_netcdf_engine_installed(name)
     51         try:
---> 52             import scipy
     53         except ImportError:

ModuleNotFoundError: No module named 'scipy'

During handling of the above exception, another exception occurred:

ModuleNotFoundError                       Traceback (most recent call last)
~/dev/xarray/xarray/tutorial.py in _check_netcdf_engine_installed(name)
     54             try:
---> 55                 import netCDF4
     56             except ImportError:

ModuleNotFoundError: No module named 'netCDF4'

During handling of the above exception, another exception occurred:

ImportError                               Traceback (most recent call last)
<ipython-input-4-e970a0b41221> in <module>
----> 1 xarray.tutorial.open_dataset('air_temperature')

~/dev/xarray/xarray/tutorial.py in open_dataset(name, cache, cache_dir, engine, **kws)
    128             default_extension = ".nc"
    129             if engine is None:
--> 130                 _check_netcdf_engine_installed(name)
    131             path = path.with_suffix(default_extension)
    132         elif path.suffix == ".grib":

~/dev/xarray/xarray/tutorial.py in _check_netcdf_engine_installed(name)
     55                 import netCDF4
     56             except ImportError:
---> 57                 raise ImportError(
     58                     f"opening tutorial dataset {name} requires either scipy or "
     59                     "netCDF4 to be installed."

ImportError: opening tutorial dataset air_temperature requires either scipy or netCDF4 to be installed.

In [5]: ! pip install netcdf4
Collecting netcdf4
  Using cached netCDF4-1.5.6-cp39-cp39-macosx_10_9_x86_64.whl (4.0 MB)
Requirement already satisfied: numpy>=1.9 in /Users/shoyer/miniconda3/envs/xarray-min/lib/python3.9/site-packages (from netcdf4) (1.20.1)
Requirement already satisfied: cftime in /Users/shoyer/miniconda3/envs/xarray-min/lib/python3.9/site-packages (from netcdf4) (1.4.1)
Installing collected packages: netcdf4
Successfully installed netcdf4-1.5.6

In [6]: xarray.tutorial.open_dataset('air_temperature')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-e970a0b41221> in <module>
----> 1 xarray.tutorial.open_dataset('air_temperature')

~/dev/xarray/xarray/tutorial.py in open_dataset(name, cache, cache_dir, engine, **kws)
    138     # retrieve the file
    139     filepath = pooch.retrieve(url=url, known_hash=None, path=cache_dir)
--> 140     ds = _open_dataset(filepath, engine=engine, **kws)
    141     if not cache:
    142         ds = ds.load()

~/dev/xarray/xarray/backends/api.py in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, backend_kwargs, *args, **kwargs)
    478
    479     if engine is None:
--> 480         engine = plugins.guess_engine(filename_or_obj)
    481
    482     backend = plugins.get_backend(engine)

~/dev/xarray/xarray/backends/plugins.py in guess_engine(store_spec)
    118         )
    119     else:
--> 120         raise ValueError(
    121             "xarray is unable to open this file because it has no currently "
    122             "installed IO backends. Xarray's read/write support requires "

ValueError: xarray is unable to open this file because it has no currently installed IO backends. Xarray's read/write support requires installing optional dependencies:
http://xarray.pydata.org/en/stable/getting-started-guide/installing.html
http://xarray.pydata.org/en/stable/user-guide/io.html

In [7]:
Do you really want to exit ([y]/n)? y
(xarray-min)   ~/dev/xarray better-no-plugins-errors $ ipython
Python 3.9.4 (default, Apr  9 2021, 09:32:38)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.22.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import xarray

In [2]: xarray.tutorial.open_dataset('air_temperature')
Out[2]:
<xarray.Dataset>
Dimensions:  (lat: 25, lon: 53, time: 2920)
Coordinates:
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
  * time     (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
Data variables:
    air      (time, lat, lon) float32 ...
Attributes:
    Conventions:  COARDS
    title:        4x daily NMC reanalysis (1948)
    description:  Data is from NMC initialized reanalysis\n(4x/day).  These a...
    platform:     Model
    references:   http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...

This isn't quite perfect -- the user needs to restart their session to actually load the data -- but hopefully this is much more obvious and roughly within user expectations.

@max-sixty
Collaborator

Thanks a lot @shoyer .

Is there a way of having pooch as a default but not required dependency? I'm not sure how python works with this — can we specify a minimal / required dependency set, but have pooch installed by default?

It's a pretty light dependency, and it's unfortunate to add any sand into the gears of getting someone to the tutorial. (though an order of magnitude better since the change above)

@keewis
Collaborator Author

keewis commented May 14, 2021

It is pretty light but pulls in a few additional dependencies. For normal Python package managers (i.e. pip / poetry / etc.) we can probably take advantage of the options.extras_require setting.

We could also change the current message ("using the tutorial data requires pooch") to something like

the tutorial.open_* functions depend on pooch to download and manage datasets. To proceed please install it from PyPI or conda / conda-forge.

@shoyer
Member

shoyer commented May 14, 2021

Is there a way of having pooch as a default but not required dependency? I'm not sure how python works with this — can we specify a minimal / required dependency set, but have pooch installed by default?

pip install xarray is always the minimal set of dependencies, but we can make something like pip install xarray[tutorial] work. Actually, pooch should be (but currently isn't) included in xarray[io]:

[options.extras_require]

@max-sixty
Collaborator

pip install xarray is always the minimal set of dependencies, but we can make something like pip install xarray[tutorial] work.

My question is whether we can make xarray by default pull the equivalent of xarray[tutorial], and a separate xarray[minimal] for anyone who really wants to avoid deps...

@shoyer
Member

shoyer commented May 14, 2021

My question is whether we can make xarray by default pull the equivalent of xarray[tutorial], and a separate xarray[minimal] for anyone who really wants to avoid deps...

I don't think this is possible currently with pip: pypa/setuptools#1139

I think it can be done with conda, but typically by defining two separate packages, e.g., dask-core and dask. This seems a little annoying to set up, and possibly surprising for users who expect names to match pip, though.

@dopplershift
Contributor

On conda it's usually only done to avoid problematic/heavyweight dependencies (e.g. avoiding the pyqt dependency with matplotlib-base). I'm not sure it's worth doing for pooch.

@max-sixty
Collaborator

Thanks. Agree it's not worth that lift.


5 participants