GroupBy(multiple groupers) (#9372)
* GroupBy(multiple groupers)

* Add example to docs

fix docs

[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix docs

* More docs

* fix doc

* fix doc again

* Fix bug.

* Add whats-new note

* edit

* Error on multi-variable groupby with MultiIndex

* Update doc/user-guide/groupby.rst

---------

Co-authored-by: Maximilian Roos <[email protected]>
dcherian and max-sixty authored Aug 26, 2024
1 parent 6a2eddd commit 19a0428
Showing 6 changed files with 423 additions and 106 deletions.
44 changes: 28 additions & 16 deletions doc/user-guide/groupby.rst
@@ -81,8 +81,7 @@ You can index out a particular group:
ds.groupby("letters")["b"]
Just like in pandas, creating a GroupBy object is cheap: it does not actually
split the data until you access particular values.
To group by multiple variables, see :ref:`this section <groupby.multiple>`.

Binning
~~~~~~~
@@ -180,19 +179,6 @@ This last line is roughly equivalent to the following::
results.append(group - alt.sel(letters=label))
xr.concat(results, dim='x')

Iterating and Squeezing
~~~~~~~~~~~~~~~~~~~~~~~

Previously, Xarray defaulted to squeezing out dimensions of size one when iterating over
a GroupBy object. This behaviour is being removed.
You can always squeeze explicitly later with the :py:meth:`Dataset.squeeze` or
:py:meth:`DataArray.squeeze` methods.

.. ipython:: python

    next(iter(arr.groupby("x", squeeze=False)))

.. _groupby.multidim:

Multidimensional Grouping
@@ -236,6 +222,8 @@ applying your function, and then unstacking the result:
stacked = da.stack(gridcell=["ny", "nx"])
stacked.groupby("gridcell").sum(...).unstack("gridcell")
Alternatively, you can group by both ``lat`` and ``lon`` at the :ref:`same time <groupby.multiple>`.

.. _groupby.groupers:

Grouper Objects
@@ -276,7 +264,8 @@ is identical to
ds.groupby(x=UniqueGrouper())
and
Similarly,

.. code-block:: python
@@ -303,3 +292,26 @@ is identical to
    from xarray.groupers import TimeResampler
    ds.resample(time=TimeResampler("ME"))

.. _groupby.multiple:

Grouping by multiple variables
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Use grouper objects to group by multiple variables:

.. ipython:: python

    from xarray.groupers import UniqueGrouper
    da.groupby(lat=UniqueGrouper(), lon=UniqueGrouper()).sum()

Different groupers can be combined to construct sophisticated GroupBy operations.

.. ipython:: python

    from xarray.groupers import BinGrouper
    ds.groupby(x=BinGrouper(bins=[5, 15, 25]), letters=UniqueGrouper()).sum()

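The example above groups by a binned `x` and by `letters` in one call. As a rough mental model only (an assumption for illustration, not a description of xarray's internals), each grouper assigns every element an integer code, the per-grouper codes are combined into one flat code per combination in row-major order (the idea behind `numpy.ravel_multi_index`), and the reduction runs over those flat codes. The stdlib-only sketch below uses hypothetical helpers `unique_codes`, `bin_codes`, and `grouped_sum`, made-up data, right-closed bins (the `pandas.cut` default), and plain-string labels.

```python
from bisect import bisect_left
from collections import defaultdict


def unique_codes(values):
    """Hypothetical helper: code each value by its position among the sorted unique labels."""
    labels = sorted(set(values))
    lookup = {label: i for i, label in enumerate(labels)}
    return [lookup[v] for v in values], labels


def bin_codes(values, bins):
    """Hypothetical helper: code each value by its right-closed bin (lo, hi]; None if outside."""
    labels = [f"({lo}, {hi}]" for lo, hi in zip(bins[:-1], bins[1:])]
    codes = []
    for v in values:
        code = bisect_left(bins, v) - 1
        codes.append(code if 0 <= code < len(labels) else None)
    return codes, labels


def grouped_sum(data, *factorized):
    """Sum `data` over every observed combination of group codes.

    Per-grouper codes are folded into one flat integer in row-major order;
    elements with a missing code (e.g. outside every bin) are dropped.
    """
    sizes = [len(labels) for _, labels in factorized]
    totals = defaultdict(float)
    for i, value in enumerate(data):
        codes = [grouper_codes[i] for grouper_codes, _ in factorized]
        if any(c is None for c in codes):
            continue
        flat = 0
        for code, size in zip(codes, sizes):
            flat = flat * size + code
        totals[flat] += value

    result = {}
    for flat, total in sorted(totals.items()):
        # Unravel the flat code back into one label per grouper, last grouper first.
        key = []
        for (_, labels), size in zip(reversed(factorized), reversed(sizes)):
            flat, code = divmod(flat, size)
            key.append(labels[code])
        result[tuple(reversed(key))] = total
    return result


x = [10, 12, 20, 30]  # made-up coordinate values; 30 falls outside the bins
letters = ["a", "b", "a", "a"]
foo = [1.0, 2.0, 3.0, 4.0]

result = grouped_sum(foo, bin_codes(x, [5, 15, 25]), unique_codes(letters))
print(result)  # {('(5, 15]', 'a'): 1.0, ('(5, 15]', 'b'): 2.0, ('(15, 25]', 'a'): 3.0}
```

The sketch reports only the combinations that actually occur and drops values outside every bin; xarray's own result layout may differ, so check it before relying on this model.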
5 changes: 5 additions & 0 deletions doc/whats-new.rst
@@ -24,6 +24,11 @@ New Features
~~~~~~~~~~~~
- Make chunk manager an option in ``set_options`` (:pull:`9362`).
By `Tom White <https://github.com/tomwhite>`_.
- Support for :ref:`grouping by multiple variables <groupby.multiple>`.
This is quite new, so please check your results and report bugs.
Binary operations after grouping by multiple arrays are not supported yet.
(:issue:`1056`, :issue:`9332`, :issue:`324`, :pull:`9372`).
By `Deepak Cherian <https://github.com/dcherian>`_.
- Allow data variable specific ``constant_values`` in the dataset ``pad`` function (:pull:`9353`).
By `Tiago Sanona <https://github.com/tsanona>`_.

19 changes: 7 additions & 12 deletions xarray/core/dataarray.py
@@ -6801,27 +6801,22 @@ def groupby(
             groupers = either_dict_or_kwargs(group, groupers, "groupby")  # type: ignore
             group = None
 
-        grouper: Grouper
+        rgroupers: tuple[ResolvedGrouper, ...]
         if group is not None:
             if groupers:
                 raise ValueError(
                     "Providing a combination of `group` and **groupers is not supported."
                 )
-            grouper = UniqueGrouper()
+            rgroupers = (ResolvedGrouper(UniqueGrouper(), group, self),)
         else:
-            if len(groupers) > 1:
-                raise ValueError("grouping by multiple variables is not supported yet.")
             if not groupers:
                 raise ValueError("Either `group` or `**groupers` must be provided.")
-            group, grouper = next(iter(groupers.items()))
-
-        rgrouper = ResolvedGrouper(grouper, group, self)
+            rgroupers = tuple(
+                ResolvedGrouper(grouper, group, self)
+                for group, grouper in groupers.items()
+            )
 
-        return DataArrayGroupBy(
-            self,
-            (rgrouper,),
-            restore_coord_dims=restore_coord_dims,
-        )
+        return DataArrayGroupBy(self, rgroupers, restore_coord_dims=restore_coord_dims)
 
     @_deprecate_positional_args("v2024.07.0")
     def groupby_bins(
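The new `groupby` branching reduces both calling conventions, a single `group` or one or more `**groupers`, to a tuple of resolved groupers. A stdlib-only sketch of that normalization, under simplifying assumptions: `normalize_groupers` is a hypothetical function, groupers are plain strings, the string `"unique"` stands in for `UniqueGrouper()`, and the mapping check only approximates `either_dict_or_kwargs` (its exact behaviour and error messages are not reproduced).

```python
from collections.abc import Mapping


def normalize_groupers(group=None, **groupers):
    """Hypothetical sketch: turn `group` or `**groupers` into (name, grouper) pairs."""
    if isinstance(group, Mapping):
        # Approximation of either_dict_or_kwargs: a mapping and keywords are exclusive.
        if groupers:
            raise ValueError("pass groupers as a mapping or as keywords, not both")
        groupers, group = dict(group), None

    if group is not None:
        if groupers:
            raise ValueError(
                "Providing a combination of `group` and **groupers is not supported."
            )
        # A bare name means "group by unique values"; "unique" stands in for UniqueGrouper().
        return ((group, "unique"),)

    if not groupers:
        raise ValueError("Either `group` or `**groupers` must be provided.")
    # One pair per grouper, in insertion order; no len(groupers) > 1 check any more.
    return tuple(groupers.items())


print(normalize_groupers("letters"))  # (('letters', 'unique'),)
print(normalize_groupers(lat="unique", lon="unique"))  # (('lat', 'unique'), ('lon', 'unique'))
```

Returning a tuple in every branch, including the single-`group` case, is why the final `return` in the diff can pass `rgroupers` straight through instead of wrapping a single `rgrouper` in a one-element tuple.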
19 changes: 8 additions & 11 deletions xarray/core/dataset.py
@@ -10397,25 +10397,22 @@ def groupby(
             groupers = either_dict_or_kwargs(group, groupers, "groupby")  # type: ignore
             group = None
 
+        rgroupers: tuple[ResolvedGrouper, ...]
         if group is not None:
             if groupers:
                 raise ValueError(
                     "Providing a combination of `group` and **groupers is not supported."
                 )
-            rgrouper = ResolvedGrouper(UniqueGrouper(), group, self)
+            rgroupers = (ResolvedGrouper(UniqueGrouper(), group, self),)
         else:
-            if len(groupers) > 1:
-                raise ValueError("Grouping by multiple variables is not supported yet.")
-            elif not groupers:
+            if not groupers:
                 raise ValueError("Either `group` or `**groupers` must be provided.")
-            for group, grouper in groupers.items():
-                rgrouper = ResolvedGrouper(grouper, group, self)
+            rgroupers = tuple(
+                ResolvedGrouper(grouper, group, self)
+                for group, grouper in groupers.items()
+            )
 
-        return DatasetGroupBy(
-            self,
-            (rgrouper,),
-            restore_coord_dims=restore_coord_dims,
-        )
+        return DatasetGroupBy(self, rgroupers, restore_coord_dims=restore_coord_dims)
 
     @_deprecate_positional_args("v2024.07.0")
     def groupby_bins(
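In the removed `Dataset.groupby` code, the `for` loop rebinds `rgrouper` on every iteration, so only the last grouper would have survived; that was harmless only because the preceding `len(groupers) > 1` check rejected multiple groupers. The tuple comprehension that replaces it keeps one entry per grouper. A stdlib-only sketch of the difference, where `resolve` is a hypothetical stand-in for `ResolvedGrouper` and the grouper values are plain strings:

```python
def resolve(name, grouper):
    """Hypothetical stand-in for ResolvedGrouper(grouper, name, obj)."""
    return f"{name}:{grouper}"


groupers = {"x": "BinGrouper", "letters": "UniqueGrouper"}

# Old pattern: each iteration overwrites the previous result, so only the last one is kept.
for group, grouper in groupers.items():
    rgrouper = resolve(group, grouper)
old_result = (rgrouper,)

# New pattern: keep every resolved grouper, in insertion order.
new_result = tuple(resolve(group, grouper) for group, grouper in groupers.items())

print(old_result)  # ('letters:UniqueGrouper',)
print(new_result)  # ('x:BinGrouper', 'letters:UniqueGrouper')
```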