`pandas=2.0` support #7724

keewis · 2023-04-05T14:52:30Z

As mentioned in #7716 (comment), this tries to unpin pandas.

keewis · 2023-04-06T11:40:12Z

It seems this fixes the failing tests unrelated to the datetime64 range.

Regarding the failing doctests, I decided to cast the values of arr.dt.<field> to int64 to keep the API unchanged. However, if we decide to follow pandas and return int32 I'm happy to make that change.
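For readers following along, here is a minimal sketch of the kind of cast described above. It uses plain pandas and NumPy rather than xarray's accessor code, and the variable names (`times`, `raw`, `stable`) are illustrative only, not taken from the PR. The assumption is that the datetime field comes back as int32 on pandas >= 2.0 and as int64 on older versions.

```python
import numpy as np
import pandas as pd

# Three consecutive days; `times` is an illustrative name, not from the PR.
times = pd.Series(pd.date_range("2000-01-01", periods=3, freq="D"))

# The dtype of a datetime field depends on the installed pandas version
# (assumed: int32 on pandas >= 2.0, int64 before).
raw = times.dt.day.to_numpy()

# Casting explicitly gives the same dtype regardless of the pandas version.
stable = raw.astype("int64", copy=False)
print(raw.dtype, "->", stable.dtype)
```

With `copy=False`, no copy is made when the values are already int64, so the cast costs nothing on older pandas versions.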

keewis

Here are some comments on the individual changes to make reviewing easier.

keewis · 2023-04-06T14:04:00Z

xarray/tests/test_dataarray.py

+            pytest.param(
+                np.array([0.0, 0.111, 0.222, 0.333], dtype="float16"),
+                slice(1, 3),
+                id="float16",
+                marks=[
+                    pytest.mark.skipif(
+                        has_pandas_version_two, reason="not supported for pandas >= 2.0"
+                    )
+                ],
+            ),


this is the main change here: we skip the float16 check with pandas>=2.0. Another option would be to change the code to explicitly cast to float32 / float64, then remove the skipif

explicit cast sounds good to me.

to which precision should we cast? float64?

Yes, IIUC these arrays are converted to pandas Indexes, which used to upcast to float64 always.

Perhaps we should upcast for backwards compatibility and raise a DeprecationWarning asking the user to cast explicitly

this turns out not to be as simple as I thought it would be: the cast to PandasIndex (and thus pandas.Index) happens in multiple places. The first is the DataArray / Dataset constructor, which through several layers calls safe_cast_to_index, where the actual cast to PandasIndex happens. The second cast happens when selecting using an array of values. In that case, get_indexer_nd calls index.get_indexer (pandas.Index.get_indexer), which appears to create an index from the indexer values.

Should both casts emit the DeprecationWarning? I think the second might be an implementation detail and that we should just cast the indexer values from float16 to float64 (not sure, though).

cc @benbovy
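The "upcast and warn" idea from this thread could look roughly like the sketch below. The helper name `upcast_float16` and the warning text are hypothetical and are not the code that was merged. It only illustrates casting float16 values to float64 while emitting a DeprecationWarning.

```python
import warnings

import numpy as np


def upcast_float16(values):
    """Hypothetical helper: cast float16 input to float64 and warn about it."""
    values = np.asarray(values)
    if values.dtype == np.float16:
        warnings.warn(
            "float16 values are cast to float64 for indexing; "
            "cast explicitly to silence this warning.",
            DeprecationWarning,
            stacklevel=2,
        )
        return values.astype(np.float64)
    # Any other dtype is passed through unchanged.
    return values
```

For the second cast discussed above (the indexer values), the same conversion could be applied without the warning if it is treated as an implementation detail.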

keewis · 2023-04-06T14:04:48Z

xarray/tests/test_utils.py

+        [np.array(["a"]), np.array(["b"]), np.array(["a", "b"])],
+        [np.array([1], dtype="int64"), np.array([2], dtype="int64"), pd.Index([1, 2])],


here, the idea is to avoid the different default integer precision on Windows by explicitly setting the int precision on construction
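A small sketch of why the explicit dtype helps: NumPy's default integer type is platform-dependent (on Windows with NumPy < 2.0 it is 32-bit), so inputs built without a dtype can differ between CI platforms. The names `a`, `b` and `combined` are illustrative.

```python
import numpy as np

# Without dtype=..., the integer precision follows the platform default.
a = np.array([1], dtype="int64")
b = np.array([2], dtype="int64")

# Combining arrays of the same explicit dtype keeps that dtype everywhere.
combined = np.concatenate([a, b])
print(combined.dtype)
```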

xarray/tests/test_accessor_dt.py

xarray/core/accessor_dt.py

xarray/tests/test_accessor_dt.py

(all except 0d)

this allows us to check for expected warnings

keewis · 2023-04-07T09:25:49Z

@Illviljan, @headtr1ck, I just noticed that the CI version of mypy is pinned to <0.990, but the pre-commit hook is at 1.1.1. Is that intentional / known? Running the hook also exposes quite a few typing errors.

Otherwise this should be ready for a final review, and the failing datetime-related tests will be fixed by #7731.

headtr1ck · 2023-04-08T14:53:42Z

> @Illviljan, @headtr1ck, I just noticed that the CI version of mypy is pinned to <0.990, but the pre-commit hook is at 1.1.1. Is that intentional / known? Running the hook also exposes quite a few typing errors.

I think this is because of #7270

The mypy pre-commit hook should be disabled anyway because it takes too long to run (although it should actually be much faster since mypy >=1).

I think we could check if the newest mypy still segfaults or not...

keewis · 2023-04-09T13:57:19Z

> I think we could check if the newest mypy still segfaults or not...

it didn't for me when I ran the hook, but that might just have been luck. To confirm, we'd need a PR that unpins mypy and potentially fixes all the errors that popped up since.

The mypy hook didn't take that long, but I guess it's the accumulated time that counts. Also, I learned today that the black-jupyter hook does all that black does plus format notebooks, so removing id: black should also shave off a bit.

headtr1ck · 2023-04-09T15:09:37Z

Locally it always works, but it segfaults in CI, which makes it impossible to debug.

jsignell · 2023-04-11T15:58:48Z

Is there anything I can do to help out on this? It sounds like the blocker is mypy?

keewis · 2023-04-11T21:00:05Z

no, mypy should be fine, as the CI version does not complain and I assume whatever the hook is reporting is also present on main (I didn't check, though).

As far as I can tell, the only thing left is another round of reviews.

dcherian

LGTM!

dcherian · 2023-04-11T21:07:32Z

I think we can double check that the only failures are cftimeindex, restore the pin, then merge, and then remove the pin in #7731

* main: (34 commits)
  Update whats-new.rst
  Fix binning by unsorted array (pydata#7762)
  Bump codecov/codecov-action from 3.1.1 to 3.1.2 (pydata#7760)
  Fix typing errors using mypy 1.2 (pydata#7752)
  [skip-ci] dev whats-new
  Add whats-new for v2023.04.0 (pydata#7757)
  remove the `black` hook (pydata#7756)
  reword the what's new entry for the `pandas` 2.0 dtype changes (pydata#7755)
  restructure the contributing guide (pydata#7681)
  Continue to use nanosecond-precision Timestamps in precision-sensitive areas (pydata#7731)
  minor doc updates to clarify extensions using accessors (pydata#7751)
  align: Avoid reindexing when join="exact" (pydata#7736)
  `pandas=2.0` support (pydata#7724)
  Clarify vectorized indexing documentation (pydata#7747)
  Avoid recasting a CFTimeIndex (pydata#7735)
  fix typo (pydata#7746)
  [pre-commit.ci] pre-commit autoupdate (pydata#7745)
  Bump pypa/gh-action-pypi-publish from 1.8.4 to 1.8.5 (pydata#7743)
  preserve boolean dtype in encoding (pydata#7720)
  [skip-ci] Add alignment benchmarks (pydata#7738)
  ...

keewis added 2 commits April 5, 2023 16:48

unpin pandas in the package metadata

7fcb376

unpin pandas in the ci environments [skip-rtd]

414ca7d

keewis changed the title ~~try unpinning pandas~~ try to unpin pandas Apr 5, 2023

github-actions bot added the CI (Continuous Integration tools) and dependencies (Pull requests that update a dependency file) labels Apr 5, 2023

keewis added 16 commits April 6, 2023 09:57

create the input arrays in the parametrization

e8b6aaa

split test_sel_float into variants

db315e2

skip the float16 variant if pandas>=2.0 is installed

04a2073

[skip-rtd]

f00e776

Merge branch 'main' into unpin-pandas

d4e666c

add tests for days_in_month and its alias

10922a1

make sure the name and dtype match the expected

f02f069

actually verify that the dtype stays the same

4c5c055

apply the dtype for non-dask

ad956f8

always use int32 to follow pandas=2.0

0a01aec

back to int64

16fdcf8

same for the test

e4db746

final undo of int64 → int32

422f08b

update the comment to make more sense

00f5b90

simplify the conversion of the expected data

7e5f719

change back to the old condition

6a522cb

keewis mentioned this pull request Apr 6, 2023

Continue to use nanosecond-precision Timestamps in precision-sensitive areas #7731

Merged

keewis commented Apr 6, 2023

keewis added 3 commits April 6, 2023 23:31

cast float16 to float64 when creating indexes (but warn anyways)

7803e66

convert float16 to float64 when selecting using arrays

1792307

(all except 0d)

move the float16 variant to a separate test

230cb85

this allows us to check for expected warnings

github-actions bot added the topic-indexing label Apr 6, 2023

keewis changed the title ~~try to unpin pandas~~ support pandas=2.0 Apr 7, 2023

keewis changed the title ~~support pandas=2.0~~ pandas=2.0 support Apr 7, 2023

keewis added 2 commits April 7, 2023 11:16

explicitly type the kwargs as a mapping of str → str

144ad7f

reword the warning message

54832cf

Merge branch 'main' into unpin-pandas

2f586aa

keewis added the plan to merge (Final call for comments) label Apr 11, 2023

dcherian approved these changes Apr 11, 2023

keewis added 5 commits April 12, 2023 09:53

restore the pin

68f6c00

Merge branch 'main' into unpin-pandas

3e6f38a

[skip-ci] [skip-rtd]

f165c6c

rerun to make sure we don't introduce failures with pandas<2

47b3b99

changelog

ec15fb4

dcherian merged commit db2d414 into pydata:main Apr 12, 2023

keewis deleted the unpin-pandas branch April 12, 2023 13:24

keewis mentioned this pull request Apr 14, 2023

reword the what's new entry for the pandas 2.0 dtype changes #7755

Merged

idantene mentioned this pull request Jun 19, 2023

.dt accessor returns int instead of float, resulting in misrepresentation of NaT values #7928

Closed

4 tasks
