Enable groupby list aggregation for strings (#6914)
Authors:
  - Ashwin Srinath <[email protected]>

Approvers:
  - Keith Kraus

URL: #6914
shwina authored Dec 4, 2020
1 parent 30bbb39 commit bd321d1
Showing 3 changed files with 22 additions and 0 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -128,6 +128,7 @@
- PR #6837 Avoid gather when copying strings view from start of strings column
- PR #6859 Move align_ptr_for_type() from cuda.cuh to alignment.hpp
- PR #6807 Refactor `std::array` usage in row group index writing in ORC
- PR #6914 Enable groupby `list` aggregation for strings
- PR #6908 Parquet option for strictly decimal reading

## Bug Fixes
1 change: 1 addition & 0 deletions python/cudf/cudf/_lib/groupby.pyx
@@ -51,6 +51,7 @@ _STRING_AGGS = {
    "min",
    "nunique",
    "nth",
    "collect"
}

_LIST_AGGS = {
20 changes: 20 additions & 0 deletions python/cudf/cudf/tests/test_groupby.py
@@ -1268,6 +1268,26 @@ def test_groupby_list_single_element(list_agg):
    )


@pytest.mark.parametrize(
    "agg", [list, [list, "count"], {"b": list, "c": "sum"}]
)
def test_groupby_list_strings(agg):
    pdf = pd.DataFrame(
        {
            "a": [1, 1, 1, 2, 2],
            "b": ["b", "a", None, "e", "d"],
            "c": [1, 2, 3, 4, 5],
        }
    )
    gdf = cudf.from_pandas(pdf)

    assert_eq(
        pdf.groupby("a").agg(agg),
        gdf.groupby("a").agg(agg),
        check_dtype=False,
    )


def test_groupby_list_columns_excluded():
    pdf = pd.DataFrame(
        {
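For readers unfamiliar with the `list` (collect) aggregation exercised by `test_groupby_list_strings`, the following is a minimal pure-Python sketch of its semantics on the test's `a` and `b` columns. It is an illustration only, not how cudf or libcudf implements the aggregation; the helper name `groupby_collect` is hypothetical, and keeping null entries in place is an assumption of this sketch.

```python
from collections import defaultdict


def groupby_collect(keys, values):
    """Collect the values of each group into a list (hypothetical helper)."""
    groups = defaultdict(list)
    for key, value in zip(keys, values):
        # Values are appended in their original row order; None is kept as-is
        # (an assumption of this sketch).
        groups[key].append(value)
    # Return groups ordered by key so the result is deterministic.
    return dict(sorted(groups.items()))


# Columns "a" (group key) and "b" (string values) from the test above.
keys = [1, 1, 1, 2, 2]
values = ["b", "a", None, "e", "d"]

result = groupby_collect(keys, values)
print(result)  # {1: ['b', 'a', None], 2: ['e', 'd']}
```

The test in this commit checks the same idea end to end by comparing `gdf.groupby("a").agg(agg)` against the pandas result for each parametrized `agg`.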
