Add Ellipsis typehint to reductions #7048

headtr1ck · 2022-09-16T21:15:10Z

This PR adds the ellipsis typehint to reductions (only where they behave differently from None to reduce overhead).
Follow up on #7017 (comment)

Additionally, I changed many "one or more dimensions" typehints to str | Iterable[Hashable] (see #6142).
Some code changes were necessary to support this fully. Previously, several things did not work with hashable dimensions that are not strings.
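
To illustrate the typing described above, here is a minimal sketch, not the code from this PR. The alias `DimsLike` and the helper `normalize_dims` are hypothetical names introduced only for this example. It assumes a reduction where `None` falls back to some default dimensions while `...` means "all dimensions", which is the kind of case where the two behave differently.

```python
from __future__ import annotations

from typing import Hashable, Iterable, Union

# Hypothetical alias for illustration only (not xarray's definition).
# "ellipsis" is a forward reference that only type checkers resolve.
DimsLike = Union[str, Iterable[Hashable], "ellipsis", None]


def normalize_dims(
    dim: DimsLike,
    all_dims: tuple[Hashable, ...],
    default: tuple[Hashable, ...],
) -> tuple[Hashable, ...]:
    """Return the dimensions to reduce over as a tuple (hypothetical helper)."""
    if dim is None:
        # Assumed behavior: None means "use the default dimensions".
        return default
    if dim is ...:
        # Ellipsis means "reduce over every dimension".
        return all_dims
    if isinstance(dim, str):
        # A string is a single dimension name, not an iterable of characters.
        return (dim,)
    # Any other iterable is taken as a collection of hashable dimension names.
    return tuple(dim)


print(normalize_dims("time", ("time", "x"), ("time",)))
print(normalize_dims(..., ("time", "x"), ("time",)))
print(normalize_dims([("lat", 0), 1], ("time", "x"), ("time",)))
```

The string check has to come before the generic iterable branch, otherwise "time" would be split into single characters. Note that under this typing a tuple such as ("lat", 0) passed directly is read as two dimensions, not as one tuple-valued name.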

max-sixty · 2022-09-16T21:46:16Z

Excellent @headtr1ck !

Do we need to run pytest --accept to get the docstrings? It looks like we lost lots...

headtr1ck · 2022-09-17T11:31:01Z

Could any dev who uses Linux rerun generate_reductions and pytest --doctest-modules xarray/core/_reductions.py --accept?
On Windows I still get different results (maybe that should be fixed at some point...)

xarray/util/generate_reductions.py

headtr1ck · 2022-09-17T14:20:16Z

Turns out that the builtin ellipsis now works with mypy.
I did not test it with older Python versions; it may require some special casing (let's see if the tests pass).
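
For context, a small sketch of why older Python versions might need special casing. It relies on two facts: the name `ellipsis` exists only in the type stubs that checkers such as mypy read, not at runtime, and `types.EllipsisType` was added in Python 3.10. The function `is_reduce_all` is a hypothetical name used only for this example, not something from the PR.

```python
from __future__ import annotations

import builtins
import sys
from typing import Hashable, Iterable, Union

if sys.version_info >= (3, 10):
    from types import EllipsisType
else:
    # Runtime fallback for older interpreters: the class of the `...` singleton.
    EllipsisType = type(Ellipsis)

# The name `ellipsis` is known to type checkers but is not a runtime builtin,
# so it can only appear in annotations that are never evaluated.
has_runtime_name = hasattr(builtins, "ellipsis")


def is_reduce_all(dim: Union[str, Iterable[Hashable], ellipsis, None]) -> bool:
    """Hypothetical helper: True when the caller passed `...`."""
    return dim is ...


print(has_runtime_name, is_reduce_all(...), is_reduce_all(None))
```

The annotation above works at runtime only because `from __future__ import annotations` keeps it unevaluated; anything that evaluates annotations would need `EllipsisType` instead.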

max-sixty · 2022-09-17T19:38:08Z

Here's the diff from pytest-accept (it is weird that it's slightly different on Windows...)

commit 83615a94a6b7c0ae0cf0e0240d7705d9ce6c21e5
Author: Maximilian Roos <[email protected]>
Date:   Sat Sep 17 12:36:45 2022 -0700

    pytest-accept

diff --git a/xarray/core/_reductions.py b/xarray/core/_reductions.py
index a7cf7ec2..d0c2a9d7 100644
--- a/xarray/core/_reductions.py
+++ b/xarray/core/_reductions.py
@@ -97,7 +97,7 @@ def count(
         <xarray.Dataset>
         Dimensions:  ()
         Data variables:
-            da       int32 5
+            da       int64 5
         """
         return self.reduce(
             duck_array_ops.count,
@@ -4400,7 +4400,7 @@ def count(
 
         >>> da.groupby("labels").count()
         <xarray.DataArray (labels: 3)>
-        array([1, 2, 2], dtype=int64)
+        array([1, 2, 2])
         Coordinates:
           * labels   (labels) object 'a' 'b' 'c'
         """
@@ -5485,7 +5485,7 @@ def count(
 
         >>> da.resample(time="3M").count()
         <xarray.DataArray (time: 3)>
-        array([1, 3, 1], dtype=int64)
+        array([1, 3, 1])
         Coordinates:
           * time     (time) datetime64[ns] 2001-01-31 2001-04-30 2001-07-31
         """

Illviljan

I wonder if Dims should include ellipsis as well? The few times it's missing might be issues with the functions?

xarray/core/dataarray.py

xarray/core/dataset.py

xarray/core/variable.py

xarray/core/weighted.py

Illviljan · 2022-09-18T07:13:43Z

@@ -4400,7 +4400,7 @@ def count(
 
         >>> da.groupby("labels").count()
         <xarray.DataArray (labels: 3)>
-        array([1, 2, 2], dtype=int64)
+        array([1, 2, 2])
         Coordinates:
           * labels   (labels) object 'a' 'b' 'c'
         """

Is it just me, or does this example crash the second time I run it?

import numpy as np
import pandas as pd
import xarray as xr

da = xr.DataArray(
    np.array([1, 2, 3, 1, 2, np.nan]),
    dims="time",
    coords=dict(
        time=("time", pd.date_range("01-01-2001", freq="M", periods=6)),
        labels=("time", np.array(["a", "b", "c", "c", "b", "a"])),
    ),
)
da.groupby("labels").count()

Traceback (most recent call last):

  File "C:\Users\J.W\anaconda3\envs\xarray-tests\lib\site-packages\spyder_kernels\py3compat.py", line 356, in compat_exec
    exec(code, globals, locals)

  File "g:\program\dropbox\python\xarray_groupby_windows_diff.py", line 34, in <module>
    da.groupby("labels").count()

  File "c:\users\j.w\documents\github\xarray\xarray\core\_reductions.py", line 4384, in count
    return self._flox_reduce(

  File "c:\users\j.w\documents\github\xarray\xarray\core\groupby.py", line 738, in _flox_reduce
    result = xarray_reduce(

  File "C:\Users\J.W\anaconda3\envs\xarray-tests\lib\site-packages\flox\xarray.py", line 240, in xarray_reduce
    ds, *by = xr.broadcast(ds, *by, exclude=exclude_dims)

  File "c:\users\j.w\documents\github\xarray\xarray\core\alignment.py", line 1046, in broadcast
    args = align(*args, join="outer", copy=False, exclude=exclude)

  File "c:\users\j.w\documents\github\xarray\xarray\core\alignment.py", line 765, in align
    aligner.align()

  File "c:\users\j.w\documents\github\xarray\xarray\core\alignment.py", line 549, in align
    self.find_matching_indexes()

  File "c:\users\j.w\documents\github\xarray\xarray\core\alignment.py", line 256, in find_matching_indexes
    obj_indexes, obj_index_vars = self._normalize_indexes(obj.xindexes)

  File "c:\users\j.w\documents\github\xarray\xarray\core\alignment.py", line 205, in _normalize_indexes
    pd_idx = safe_cast_to_index(data)

  File "c:\users\j.w\documents\github\xarray\xarray\core\utils.py", line 140, in safe_cast_to_index
    index = pd.Index(np.asarray(array), **kwargs)

  File "C:\Users\J.W\anaconda3\envs\xarray-tests\lib\site-packages\pandas\core\indexes\base.py", line 483, in __new__
    data = sanitize_array(data, None, dtype=dtype, copy=copy)

  File "C:\Users\J.W\anaconda3\envs\xarray-tests\lib\site-packages\pandas\core\construction.py", line 524, in sanitize_array
    raise ValueError("index must be specified when data is not list-like")

ValueError: index must be specified when data is not list-like

headtr1ck · 2022-09-18T12:17:32Z

Is it just me, or does this example crash the second time I run it?

Could you specify what you mean by "second time I run it"?
Executing the groupby twice?

Illviljan · 2022-09-18T14:20:51Z

Just running that script file several times without restarting the console. It might be a Spyder bug though, since I can't reproduce it in a standalone IPython console.

This for example works (the first time):

import numpy as np
import pandas as pd
import xarray as xr

da = xr.DataArray(
    np.array([1, 2, 3, 1, 2, np.nan]),
    dims="time",
    coords=dict(
        time=("time", pd.date_range("01-01-2001", freq="M", periods=6)),
        labels=("time", np.array(["a", "b", "c", "c", "b", "a"])),
    ),
)
da.groupby("labels").count()

da = xr.DataArray(
    np.array([1, 2, 3, 1, 2, np.nan]),
    dims="time",
    coords=dict(
        time=("time", pd.date_range("01-01-2001", freq="M", periods=6)),
        labels=("time", np.array(["a", "b", "c", "c", "b", "a"])),
    ),
)
da.groupby("labels").count()

xarray/core/dataset.py

headtr1ck · 2022-09-25T15:42:49Z

import numpy as np
import pandas as pd
import xarray as xr

da = xr.DataArray(
    np.array([1, 2, 3, 1, 2, np.nan]),
    dims="time",
    coords=dict(
        time=("time", pd.date_range("01-01-2001", freq="M", periods=6)),
        labels=("time", np.array(["a", "b", "c", "c", "b", "a"])),
    ),
)
da.groupby("labels").count()

da = xr.DataArray(
    np.array([1, 2, 3, 1, 2, np.nan]),
    dims="time",
    coords=dict(
        time=("time", pd.date_range("01-01-2001", freq="M", periods=6)),
        labels=("time", np.array(["a", "b", "c", "c", "b", "a"])),
    ),
)
da.groupby("labels").count()

For me this works in a Python terminal, a Python script, and a Jupyter notebook (I don't use Spyder but VS Code).

headtr1ck added 3 commits September 16, 2022 22:56

update reductions typing to include ellipsis

d4cbad2

update all compatible interfaces to accept ellipsis

3ee697d

change more dim types to str | Iterable[Hashable]

4bc4335

github-actions bot added topic-groupby topic-rolling labels Sep 16, 2022

headtr1ck changed the title ~~Ellipsis~~ Add Ellipsis typehint to reductions Sep 16, 2022

headtr1ck added 2 commits September 17, 2022 13:19

re-add docstring examples

fb31762

fix bug in reduce for 1D arrays

5fa6255

Illviljan reviewed Sep 17, 2022

xarray/util/generate_reductions.py (outdated, resolved)

headtr1ck added 3 commits September 17, 2022 16:06

use builtin ellipsis type

93bc0a9

Merge branch 'main' into ellipsis

c797834

add Dims type

70a5756

github-actions bot added the topic-typing label Sep 17, 2022

add forgotten Dims type

a143e6d

update docstring examples to fit to linux

1aa578e

Illviljan reviewed Sep 18, 2022

replace more instances with Dims type

f563fbd

replace some leftovers with Dims

f0af587

Illviljan reviewed Sep 18, 2022

xarray/core/dataset.py (resolved)

This was referenced Sep 20, 2022

Fix mypy errors in xarray.py, xrutils.py, cache.py xarray-contrib/flox#144

Merged

Add set_xindex and drop_indexes methods #6971

Merged

headtr1ck added the plan to merge Final call for comments label Sep 25, 2022

headtr1ck mentioned this pull request Sep 27, 2022

Align typing of dimension inputs #7094

Open

3 tasks

dcherian merged commit 226c23b into pydata:main Sep 28, 2022

headtr1ck deleted the ellipsis branch September 28, 2022 18:02
