Change default value of the observed= argument in groupby to True to reflect the actual behaviour (#13296)

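When grouping by a categorical, cuDF already behaves as though `observed=True`: categories that never occur in the data are dropped from the result. I'm changing the default value of the kwarg to reflect that. Compare pandas (`In [43]` to `In [46]`) with cuDF (`In [47]` to `In [49]`):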

```python
In [43]: df = pd.DataFrame({'a': [1, 2, 3, 4]})

In [44]: idx = pd.CategoricalIndex([1, 1, 2, 3], categories=[1, 2, 3, 4, 5])

In [45]: df.groupby(idx).sum()
Out[45]:
   a
1  3
2  3
3  4
4  0
5  0

In [46]: df.groupby(idx, observed=True).sum()
Out[46]:
   a
1  3
2  3
3  4

In [47]: df = cudf.DataFrame({'a': [1, 2, 3, 4]})

In [48]: idx = cudf.CategoricalIndex([1, 1, 2, 3], categories=[1, 2, 3, 4, 5])

In [49]: df.groupby(idx).sum()
Out[49]:
   a
1  3
3  4
2  3
```
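As a minimal sketch of what the `observed` keyword controls, the snippet below uses pandas only, so it runs without a GPU. It passes `observed` explicitly in both calls and then expands the `observed=True` result to every declared category. The `reindex` step is an assumption about how one might recover the `observed=False` shape by hand; it is not part of this commit and is not confirmed here to work in cuDF.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4]})
idx = pd.CategoricalIndex([1, 1, 2, 3], categories=[1, 2, 3, 4, 5])

# Only the categories that actually occur in the data (1, 2, 3).
observed = df.groupby(idx, observed=True).sum()

# Every declared category, including the unobserved 4 and 5.
unobserved = df.groupby(idx, observed=False).sum()

# Hypothetical workaround: expand the observed result to all categories,
# filling the groups that have no rows with 0 (the identity for sum).
expanded = observed.reindex(idx.categories, fill_value=0)

print(observed["a"].tolist())
print(unobserved["a"].tolist())
print(expanded["a"].tolist())
```

The fill value depends on the aggregation: 0 is appropriate for `sum`, but other reductions would need a different placeholder.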

Authors:
  - Ashwin Srinath (https://github.com/shwina)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: #13296
shwina authored May 5, 2023
1 parent a208d21 commit f881c6c
Showing 3 changed files with 4 additions and 4 deletions.
2 changes: 1 addition & 1 deletion python/cudf/cudf/core/dataframe.py
@@ -4072,7 +4072,7 @@ def groupby(
sort=False,
group_keys=False,
squeeze=False,
- observed=False,
+ observed=True,
dropna=True,
):
return super().groupby(
4 changes: 2 additions & 2 deletions python/cudf/cudf/core/indexed_frame.py
@@ -3918,7 +3918,7 @@ def groupby(
sort=False,
group_keys=False,
squeeze=False,
- observed=False,
+ observed=True,
dropna=True,
):
if axis not in (0, "index"):
@@ -3929,7 +3929,7 @@ def groupby(
"squeeze parameter is not yet implemented"
)

- if observed is not False:
+ if not observed:
raise NotImplementedError(
"observed parameter is not yet implemented"
)
2 changes: 1 addition & 1 deletion python/cudf/cudf/core/series.py
@@ -3297,7 +3297,7 @@ def groupby(
sort=False,
group_keys=False,
squeeze=False,
- observed=False,
+ observed=True,
dropna=True,
):
return super().groupby(
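The guard change in `indexed_frame.py` can be illustrated in isolation. The sketch below copies the two conditions from the diff into standalone functions; the names `check_observed_old`, `check_observed_new`, and `raises` are hypothetical helpers for illustration and do not exist in cuDF.

```python
def check_observed_old(observed):
    """Guard before this commit: only the literal False is accepted."""
    if observed is not False:
        raise NotImplementedError("observed parameter is not yet implemented")


def check_observed_new(observed):
    """Guard after this commit: any truthy value is accepted."""
    if not observed:
        raise NotImplementedError("observed parameter is not yet implemented")


def raises(check, value):
    """Return True if calling check(value) raises NotImplementedError."""
    try:
        check(value)
    except NotImplementedError:
        return True
    return False


# Before: the old default (False) passed, and observed=True was rejected.
print(raises(check_observed_old, False), raises(check_observed_old, True))

# After: the new default (True) passes, and observed=False is rejected.
print(raises(check_observed_new, True), raises(check_observed_new, False))
```

One consequence, assuming callers pass the keyword explicitly: code that previously called `groupby(..., observed=False)` worked before this commit and raises `NotImplementedError` after it, since the unobserved-categories behaviour is the one that is not implemented.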
