[REVIEW] Support groupby operations for decimal dtypes #7731

Merged: 39 commits, Apr 1, 2021
Changes from 1 commit
26bafd0
Don't identify decimals as strings.
vyasr Mar 24, 2021
babcdfc
Reject all extension types as string types.
vyasr Mar 25, 2021
2b00611
Create separate lists for extension type methods.
vyasr Mar 25, 2021
76ab556
Merge branch 'branch-0.19' into fix/issue7687_part2
vyasr Mar 25, 2021
1ebde51
Enable collect for decimals.
vyasr Mar 25, 2021
4c5d876
Enable argmin and argmax.
vyasr Mar 25, 2021
4134e43
Fix variance key name.
vyasr Mar 25, 2021
43cf580
Move groupby aggregation list to groupby.py and clean up the assignme…
vyasr Mar 25, 2021
474a179
Disable aggs that are overrides of actual methods.
vyasr Mar 25, 2021
25e74ef
Move more logic out of the GroupBy class.
vyasr Mar 25, 2021
8a44827
Simplify getattr usage.
vyasr Mar 25, 2021
6b5c67f
Clearly documented unknown failures.
vyasr Mar 25, 2021
8e45ad0
Match other class groupbys to strings.
vyasr Mar 25, 2021
81ffe0a
Fix style and remove unsupported operations.
vyasr Mar 25, 2021
6d3fad3
Apply black reformattings.
vyasr Mar 25, 2021
714742d
Remove variance from obviously unsupported types.
vyasr Mar 25, 2021
ea4ed2e
Defer getattr to getitem if possible.
vyasr Mar 25, 2021
026bb4e
Make getattr safe for copying.
vyasr Mar 25, 2021
1259032
Remove support for aggregating structs.
vyasr Mar 25, 2021
6c61806
Update documented list of groupby operations.
vyasr Mar 25, 2021
a14d30f
Move function out of loop.
vyasr Mar 25, 2021
12caa06
Merge branch 'branch-0.19' into fix/issue7687_part2
vyasr Mar 26, 2021
5c71bfe
Remove redundant test, add test of decimal.
vyasr Mar 27, 2021
25811b0
Fix formatting.
vyasr Mar 27, 2021
1450f2d
Merge branch 'branch-0.19' into fix/issue7687_part2
vyasr Mar 28, 2021
39c45ac
Merge branch 'branch-0.19' into fix/issue7687_part2
vyasr Mar 29, 2021
c036d0c
Add more rigorous test (currently includes debugging statements).
vyasr Mar 29, 2021
d2385ec
Add support for pandas Series composed of decimal.Decimal objects.
vyasr Mar 29, 2021
5395563
Clean up the testing code and use Decimal to make pandas and cudf com…
vyasr Mar 29, 2021
cec2c13
Rewrite test logic to avoid duplicates, but remove those tests for id…
vyasr Mar 29, 2021
eadc028
Minor cleanup.
vyasr Mar 30, 2021
395856a
Apply black.
vyasr Mar 30, 2021
c8a83b3
Don't overwrite dtype variable.
vyasr Mar 30, 2021
040401c
Skip decimal tests on CUDA 10.x.
vyasr Mar 31, 2021
286b686
Rename pyarrow_dtype to pyarrow_type.
vyasr Mar 31, 2021
91642d3
Use rmm to get the CUDA version.
vyasr Mar 31, 2021
2e7dfb8
Make decimal fail loudly on older architectures.
vyasr Mar 31, 2021
fbfcdf8
Fix import order.
vyasr Mar 31, 2021
72bc1b7
Change exception message to indicate that the underlying cause is a c…
vyasr Mar 31, 2021
6 changes: 6 additions & 0 deletions python/cudf/cudf/_lib/groupby.pyx
@@ -19,6 +19,8 @@ cimport cudf._lib.cpp.types as libcudf_types
 cimport cudf._lib.cpp.groupby as libcudf_groupby
 cimport cudf._lib.cpp.aggregation as libcudf_aggregation

+import rmm
+

 # The sets below define the possible aggregations that can be performed on
 # different dtypes. The uppercased versions of these strings correspond to
@@ -240,6 +242,10 @@ def _drop_unsupported_aggs(Table values, aggs):
         elif (
             is_decimal_dtype(values._data[col_name].dtype)
         ):
+            if rmm._cuda.gpu.runtimeGetVersion() < 11000:
+                raise RuntimeError(
+                    "Decimal aggregations are not supported on CUDA 10.x."
+                )
             for i, agg_name in enumerate(aggs[col_name]):
                 if Aggregation(agg_name).kind not in _DECIMAL_AGGS:
                     del result[col_name][i]
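The hunk above gates decimal aggregations on the CUDA runtime version and then drops aggregation kinds that are not in the allowed set. A minimal standalone sketch of that idea follows. It does not import `rmm` or `cudf`; the function name `drop_unsupported_decimal_aggs`, the constant names, and the contents of `DECIMAL_AGGS` are assumptions made for illustration and are not taken from the PR. The only fact relied on is that the CUDA runtime reports its version as `1000 * major + 10 * minor`, so `11000` means CUDA 11.0.

```python
# Sketch only: names and the allowed-aggregation set are hypothetical.
CUDA_11_0 = 11000  # CUDA runtime versions are encoded as 1000 * major + 10 * minor

# Hypothetical stand-in for the real _DECIMAL_AGGS set in groupby.pyx.
DECIMAL_AGGS = {"count", "sum", "min", "max", "argmin", "argmax", "collect"}


def drop_unsupported_decimal_aggs(aggs, runtime_version):
    """Return the aggregations that are allowed for a decimal column.

    Raises RuntimeError when the runtime is older than CUDA 11.0,
    mirroring the "fail loudly" behaviour added in the diff.
    """
    if runtime_version < CUDA_11_0:
        raise RuntimeError(
            "Decimal aggregations are not supported on CUDA 10.x."
        )
    # Build a new list instead of deleting by index while iterating.
    return [name for name in aggs if name.lower() in DECIMAL_AGGS]


if __name__ == "__main__":
    print(drop_unsupported_decimal_aggs(["sum", "mean", "max"], 11020))
    try:
        drop_unsupported_decimal_aggs(["sum"], 10020)
    except RuntimeError as err:
        print("rejected:", err)
```

The sketch builds a new list rather than deleting entries by index, which sidesteps index shifting when several entries are dropped. That is a design choice of the sketch, not a statement about how the merged code behaves.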
15 changes: 7 additions & 8 deletions python/cudf/cudf/tests/test_groupby.py
@@ -336,10 +336,6 @@ def test_groupby_2keys_agg(nelem, func):
     assert_eq(got_df, expect_df, check_dtype=check_dtype)


-@pytest.mark.skipif(
-    rmm._cuda.gpu.runtimeGetVersion() < 11000,
-    reason="These aggregations are not supported on CUDA 10.x.",
-)
 @pytest.mark.parametrize("num_groups", [2, 3, 10, 50, 100])
 @pytest.mark.parametrize("nelem_per_group", [1, 10, 100])
 @pytest.mark.parametrize(
@@ -393,10 +389,13 @@ def test_groupby_agg_decimal(num_groups, nelem_per_group, func):
     )

     expect_df = pdf.groupby("idx", sort=True).agg(func)
-    got_df = gdf.groupby("idx", sort=True).agg(func)
-
-    assert_eq(expect_df["x"], got_df["x"], check_dtype=False)
-    assert_eq(expect_df["y"], got_df["y"], check_dtype=False)
+    if rmm._cuda.gpu.runtimeGetVersion() < 11000:
+        with pytest.raises(RuntimeError):
+            got_df = gdf.groupby("idx", sort=True).agg(func)
+    else:
+        got_df = gdf.groupby("idx", sort=True).agg(func)
+        assert_eq(expect_df["x"], got_df["x"], check_dtype=False)
+        assert_eq(expect_df["y"], got_df["y"], check_dtype=False)


 @pytest.mark.parametrize(
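The test change above replaces a skip on CUDA 10.x with an explicit check that the aggregation raises, so the unsupported path is exercised rather than silently ignored. The sketch below shows the same branching pattern in a form that runs without a GPU. It uses the standard library's `unittest` instead of `pytest` purely to stay dependency-free, and `fake_decimal_agg` is a hypothetical stand-in for `gdf.groupby("idx", sort=True).agg(func)`; neither is part of the PR.

```python
import unittest

CUDA_11_0 = 11000  # 1000 * major + 10 * minor


def fake_decimal_agg(values, runtime_version):
    """Hypothetical stand-in for a decimal groupby aggregation."""
    if runtime_version < CUDA_11_0:
        raise RuntimeError(
            "Decimal aggregations are not supported on CUDA 10.x."
        )
    return sum(values)


class DecimalAggTest(unittest.TestCase):
    def check(self, runtime_version):
        expect = 6
        if runtime_version < CUDA_11_0:
            # Old runtime: the call must fail loudly.
            with self.assertRaises(RuntimeError):
                fake_decimal_agg([1, 2, 3], runtime_version)
        else:
            # Supported runtime: compare against the expected result.
            got = fake_decimal_agg([1, 2, 3], runtime_version)
            self.assertEqual(got, expect)

    def test_cuda_10_2(self):
        self.check(10020)

    def test_cuda_11_0(self):
        self.check(11000)


if __name__ == "__main__":
    unittest.main()
```

Compared with a skip marker, this shape keeps the test in the run on older runtimes and makes a change in the error type show up as a failure.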