Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bad conda solve with pandas 2 #7716

Closed
4 tasks
dcherian opened this issue Apr 4, 2023 · 18 comments
Closed
4 tasks

bad conda solve with pandas 2 #7716

dcherian opened this issue Apr 4, 2023 · 18 comments
Labels
Release Planning and tracking progress of releases

Comments

@dcherian
Copy link
Contributor

dcherian commented Apr 4, 2023

What happened?

Pandas 2 is out.

We have a pandas<2 pin for our latest release, but mamba is now returning xarray=2023.1.0 and pandas=2.0 which is making cf-xarray and flox tests fail.

It looks like any project that tests resample without pinning pandas will fail.

I opened the issue here for visibility. It seems we might need a repodata patch to disallow pandas<2?

cc @ocefpaf

What did you expect to happen?

No response

Minimal Complete Verifiable Example

No response

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

No response

Environment

@dcherian dcherian added the Release Planning and tracking progress of releases label Apr 4, 2023
@ocefpaf
Copy link
Contributor

ocefpaf commented Apr 4, 2023

We need to do a repodata patch for the current xarray. I'll get to it soon.

@ocefpaf
Copy link
Contributor

ocefpaf commented Apr 4, 2023

@dcherian do you mind taking a look at conda-forge/conda-forge-repodata-patches-feedstock#426? Please check the versions patched and the applied patch! Thanks!

@rabernat
Copy link
Contributor

rabernat commented Apr 5, 2023

Do we have a plan to support pandas 2?

@keewis
Copy link
Collaborator

keewis commented Apr 5, 2023

with the merge of #7441 we should already support pandas=2.0 on main. I think we can try unpinning in a PR to see how many of the errors from #7707 are related to pandas (none, hopefully, but I'm not sure).

@spencerkclark
Copy link
Member

The cftime-related ones are still pandas related, but hopefully should be straightforward to address.

@keewis
Copy link
Collaborator

keewis commented Apr 5, 2023

CI says these are the tests we'd need to fix:

FAILED xarray/tests/test_coding_times.py::test_should_cftime_be_used_source_outside_range - Failed: DID NOT RAISE <class 'ValueError'>
FAILED xarray/tests/test_cftimeindex.py::test_to_datetimeindex_out_of_range[360_day] - Failed: DID NOT RAISE <class 'ValueError'>
FAILED xarray/tests/test_cftimeindex.py::test_to_datetimeindex_out_of_range[365_day] - Failed: DID NOT RAISE <class 'ValueError'>
FAILED xarray/tests/test_cftimeindex.py::test_to_datetimeindex_out_of_range[366_day] - Failed: DID NOT RAISE <class 'ValueError'>
FAILED xarray/tests/test_cftimeindex.py::test_to_datetimeindex_out_of_range[all_leap] - Failed: DID NOT RAISE <class 'ValueError'>
FAILED xarray/tests/test_cftimeindex.py::test_to_datetimeindex_out_of_range[gregorian] - Failed: DID NOT RAISE <class 'ValueError'>
FAILED xarray/tests/test_cftimeindex.py::test_to_datetimeindex_out_of_range[julian] - Failed: DID NOT RAISE <class 'ValueError'>
FAILED xarray/tests/test_cftimeindex.py::test_to_datetimeindex_out_of_range[noleap] - Failed: DID NOT RAISE <class 'ValueError'>
FAILED xarray/tests/test_cftimeindex.py::test_to_datetimeindex_out_of_range[proleptic_gregorian] - Failed: DID NOT RAISE <class 'ValueError'>
FAILED xarray/tests/test_cftimeindex.py::test_to_datetimeindex_out_of_range[standard] - Failed: DID NOT RAISE <class 'ValueError'>
FAILED xarray/tests/test_dataarray.py::TestDataArray::test_sel_float - NotImplementedError: float16 indexes are not supported

Edit: one more on windows:

FAILED xarray/tests/test_utils.py::test_maybe_coerce_to_str[1-2-expected1] - AssertionError: assert dtype('int64') == dtype('int32')
 +  where dtype('int64') = Index([1, 2], dtype='int64').dtype
 +  and   dtype('int32') = Index([1, 2], dtype='int32').dtype

Edit2: the doctests also fail:

FAILED xarray/core/accessor_dt.py::xarray.core.accessor_dt.DatetimeAccessor
FAILED xarray/core/accessor_dt.py::xarray.core.accessor_dt.TimedeltaAccessor
FAILED xarray/core/dataarray.py::xarray.core.dataarray.DataArray.groupby

@CommonClimate
Copy link

Eagerly awaiting a resolution to this, as the newest Pyleoclim release will be requiring pandas >=2.0, which will make it incompatible with xarray as things stand.

@mroeschke
Copy link
Contributor

CI says these are the tests we'd need to fix:

Chiming in from the pandas side on those failures, I think they are all expected https://pandas.pydata.org/docs/whatsnew/v2.0.0.html namely

  • Since pandas supports s, ms and us numpy datetime resolutions now, I'm guessing that's why test_to_datetimeindex_out_of_range and test_should_cftime_be_used_source_outside_range are failing
  • For test_sel_float, pd.Index intentionally does not support np.float16 as dtype anymore (we never had an indexing engine for this dtype)
  • For test_maybe_coerce_to_str, this might be expected too as Index dtypes can hold np.int32 types now

@jsignell
Copy link
Contributor

jsignell commented Apr 5, 2023

In that case it could be reasonable to mimic the pattern in test_groupby and mark the failing tests with a @pytest.mark.skipif(not has_pandas_version_two, reason="Tests a scenario that only raises when pandas <= 2")

@dcherian
Copy link
Contributor Author

dcherian commented Apr 5, 2023

I think they are all expected pandas.pydata.org/docs/whatsnew/v2.0.0.html namely

Yes they are. We just haven't had the time to fix things.

@keewis
Copy link
Collaborator

keewis commented Apr 5, 2023

For test_should_cftime_be_used_source_outside_range and test_to_datetimeindex_out_of_range I'd probably use a date that is outside the s resolution range (not sure if that actually makes sense, though). What do you think, @spencerkclark?

For test_maybe_coerce_to_str I think the reason is that we use np.array to cast a python int to an array, but the default resolution is different on windows. Apparently, pandas still uses int64 if constructed directly from python ints, but numpy uses int32 on windows, and as you say pandas does not insist on int64 anymore. The fix would be to explicitly specify the dtype in the array calls.

And finally, I'm not sure what to do with test_sel_float. Maybe we can split the monolithic test into one parametrized by dtype and skip the float16 test variant for pandas>=2.0?

@spencerkclark
Copy link
Member

Thanks @keewis -- I went ahead and added some more detailed thoughts about how to address the cftime-related failures in #7707. I can try to address those soon.

@dcherian
Copy link
Contributor Author

Should be fixed with the various repodata patches

@mrocklin
Copy link
Contributor

I'm still running into this today when using only conda-forge

Encountered problems while solving:
  - package xarray-2023.1.0-pyhd8ed1ab_0 requires pandas >=1.3,<2a0, but none of the providers can be installed

When I add defaults the problem goes away

channels:
  - conda-forge
  - defaults

@keewis
Copy link
Collaborator

keewis commented Apr 16, 2023

can you share a bit more about the environment you're trying to create? Is that by chance a py38 environment, or does one of the libraries you're trying to install have a upper bound for xarray? xarray>=2023.04.0 does not have the pin on pandas anymore, so you should be able to install that even if the repodata patches didn't work.

@mrocklin
Copy link
Contributor

This was the environment, solved on M1 Mac

name: coiled
channels:
  - conda-forge
  - defaults
dependencies:
  - python==3.10
  - dask
  - dask-ml
  - coiled
  - pyarrow
  - s3fs
  - matplotlib
  - ipykernel
  - dask-labextension
  - xgboost
  - pandas=2
  - optuna
  - xarray
  - geogif
  - zarr
  - pip
  - pip:
    - git+https://github.com/optuna/optuna

I can try to minify this in a bit, although I'm on airport wifi right now, and it has started to kick me off, I suspect due to these sorts of activities.

@keewis
Copy link
Collaborator

keewis commented Apr 16, 2023

I'm on a train wifi, so not really better. However, I think this is because xarray=2023.04.0 is not on conda-forge, yet (the PR to the feedstock still has to be merged), so you can't install xarray and pandas=2 into the same environment. As a workaround, you can try installing xarray using pip.

@mrocklin
Copy link
Contributor

That makes sense. Just following up, but this fails today:

name: test-1
channels:
  - conda-forge
dependencies:
  - xarray
  - pandas=2

It sounds like this will work itself out though and no further work here needs to be done (unless someone wants to go press some green buttons on conda-forge somewhere)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Release Planning and tracking progress of releases
Projects
None yet
Development

No branches or pull requests

9 participants