Preserve Index and grouped columns in Groupby.nth (#13442)
In pandas 2.0, `groupby.nth` changed behavior and now acts as a filtration: https://pandas.pydata.org/docs/whatsnew/v2.0.0.html#dataframegroupby-nth-and-seriesgroupby-nth-now-behave-as-filtrations

This PR preserves the caller's index in the result and returns the grouping columns as part of the result.
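For reference, here is a minimal pandas example of the filtration behavior that cudf is matching. It assumes pandas >= 2.0 is installed, and the frame and its values are made up for illustration. The result keeps the original index labels and the grouping column `a`. Groups with too few rows are dropped instead of being filled.

```python
import pandas as pd

# Illustrative data with a non-default index, so index preservation is visible.
df = pd.DataFrame(
    {"a": [1, 1, 2, 1, 2], "b": [10, 20, 30, 40, 50]},
    index=[10, 11, 12, 13, 14],
)

# pandas >= 2.0: nth returns the selected original rows (a filtration),
# keeping their index labels and the grouping column "a".
second = df.groupby("a").nth(1)

# Group a == 2 has only two rows, so it contributes nothing for n=2.
third = df.groupby("a").nth(2)

print(second)
print(third)
```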

This PR fixes all 12 pytests in `python/cudf/cudf/tests/test_groupby.py::test_groupby_nth`
galipremsagar authored May 30, 2023
1 parent 2dafcfc commit 16c987e
Showing 1 changed file with 14 additions and 3 deletions.
17 changes: 14 additions & 3 deletions python/cudf/cudf/core/groupby/groupby.py
```diff
@@ -802,10 +802,21 @@ def nth(self, n):
         """
         Return the nth row from each group.
         """
-        result = self.agg(lambda x: x.nth(n)).sort_index()
-        sizes = self.size().sort_index()
 
-        return result[sizes > n]
+        self.obj["__groupbynth_order__"] = range(0, len(self.obj))
+        # We perform another groupby here to have the grouping columns
+        # be a part of dataframe columns.
+        result = self.obj.groupby(self.grouping.keys).agg(lambda x: x.nth(n))
+        sizes = self.size().reindex(result.index)
+
+        result = result[sizes > n]
+
+        result._index = self.obj.index.take(
+            result._data["__groupbynth_order__"]
+        )
+        del result._data["__groupbynth_order__"]
+        del self.obj._data["__groupbynth_order__"]
+        return result
 
     @_cudf_nvtx_annotate
     def ngroup(self, ascending=True):
```
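The approach in the diff can be sketched in plain Python, without cudf:

1. Tag each row with its position. This is the role of the temporary `__groupbynth_order__` column.
2. Group those positions by key.
3. Keep only groups with more than `n` rows. This is the `sizes > n` mask.
4. Map the surviving positions back to the caller's index labels. This is the `index.take(...)` step.

The function name `nth_filter` and the list-of-dicts row representation are hypothetical, chosen only for illustration. The sketch handles non-negative `n` only. It also sorts the selected positions so rows come back in their original order, as pandas 2.0 does.

```python
from typing import Any, Dict, Hashable, List, Sequence, Tuple

Record = Dict[str, Any]


def nth_filter(
    records: Sequence[Record],
    index: Sequence[Hashable],
    by: str,
    n: int,
) -> Tuple[List[Hashable], List[Record]]:
    """Hypothetical helper: the nth row of each group, returned as a filtration."""
    if n < 0:
        raise ValueError("this sketch only handles n >= 0")
    if len(records) != len(index):
        raise ValueError("records and index must have the same length")

    # Collect each row's position under its group key (the role played by
    # the temporary "__groupbynth_order__" column in the diff).
    positions: Dict[Hashable, List[int]] = {}
    for pos, rec in enumerate(records):
        positions.setdefault(rec[by], []).append(pos)

    # Drop groups with n or fewer rows (the "sizes > n" mask), then take
    # the position of each remaining group's nth row, in original order.
    picked = sorted(p[n] for p in positions.values() if len(p) > n)

    # Map positions back to the caller's index labels (the "take" step).
    # Records are returned whole, so the grouping column is preserved.
    return [index[p] for p in picked], [records[p] for p in picked]


records = [
    {"a": 1, "b": 10},
    {"a": 1, "b": 20},
    {"a": 2, "b": 30},
    {"a": 1, "b": 40},
    {"a": 2, "b": 50},
]
labels, rows = nth_filter(records, [10, 11, 12, 13, 14], by="a", n=1)
print(labels, rows)  # [11, 14] [{'a': 1, 'b': 20}, {'a': 2, 'b': 50}]
```

Carrying positions instead of labels through the grouping step keeps the lookup independent of the index type, including duplicate or non-integer labels. That is presumably why the diff uses a positional order column plus `take` rather than grouping on the index itself.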