Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Explicitly disable groupby on unsupported key types. #9227

Merged
merged 7 commits into from
Sep 20, 2021

Conversation

mythrocks
Copy link
Contributor

@mythrocks mythrocks commented Sep 14, 2021

Fixes #8905.

Attempting groupby aggregations with LIST keys leads to silent
failures and bad results.
For instance, attempting hash-based groupby aggregations with LIST
keys only fails on DEBUG builds, thus:

/home/myth/dev/cudf/2/cpp/include/cudf/table/row_operators.cuh:447: unsigned int cudf:
:element_hasher_with_seed<hash_function, has_nulls>::operator()(cudf::column_device_view, signed in
t) const [with T = cudf::list_view; void *<anonymous> = (void *)nullptr; hash_function = default_ha
sh; __nv_bool has_nulls = false]: block: [0,0,0], thread: [0,0,0] Assertion `false && "Unsupported
type in hash."` failed.

In RELEASE builds, a copy of the input LIST column is returned, causing
each output row to be interpreted as its own group.

This commit adds an explicit failure for unsupported groupby key types,
i.e. those that don't support equality comparisons (like LIST).

Fixes rapidsai#8905.

Attempting groupby aggregations with LIST keys leads to silent
failures and bad results.
For instance, attempting hash-based groupby aggregations with LIST
keys only fails on DEBUG builds, thus:
```
/home/myth/dev/cudf/2/cpp/include/cudf/table/row_operators.cuh:447: unsigned int cudf:
:element_hasher_with_seed<hash_function, has_nulls>::operator()(cudf::column_device_view, signed in
t) const [with T = cudf::list_view; void *<anonymous> = (void *)nullptr; hash_function = default_ha
sh; __nv_bool has_nulls = false]: block: [0,0,0], thread: [0,0,0] Assertion `false && "Unsupported
type in hash."` failed.
```
In RELEASE builds, a copy of the input LIST column is returned, causing
each output row to be interpreted as its own group.

This commit adds an explicit failure for unsupported LIST groupby keys.
@mythrocks mythrocks requested a review from a team as a code owner September 14, 2021 04:34
@mythrocks mythrocks self-assigned this Sep 14, 2021
@github-actions github-actions bot added CMake CMake build issue libcudf Affects libcudf (C++/CUDA) code. labels Sep 14, 2021
@mythrocks mythrocks added 3 - Ready for Review Ready for review by team bug Something isn't working non-breaking Non-breaking change and removed libcudf Affects libcudf (C++/CUDA) code. CMake CMake build issue labels Sep 14, 2021
@codecov
Copy link

codecov bot commented Sep 14, 2021

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.10@eab2486). Click here to learn what that means.
The diff coverage is n/a.

❗ Current head 4afcb9d differs from pull request most recent head 0dd72f7. Consider uploading reports for the commit 0dd72f7 to get more accurate results
Impacted file tree graph

@@               Coverage Diff               @@
##             branch-21.10    #9227   +/-   ##
===============================================
  Coverage                ?   10.84%           
===============================================
  Files                   ?      116           
  Lines                   ?    19171           
  Branches                ?        0           
===============================================
  Hits                    ?     2080           
  Misses                  ?    17091           
  Partials                ?        0           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update eab2486...0dd72f7. Read the comment docs.

@github-actions github-actions bot added CMake CMake build issue libcudf Affects libcudf (C++/CUDA) code. labels Sep 15, 2021
@mythrocks mythrocks changed the title Explicitly disable groupby on LIST keys. Explicitly disable groupby on unsupported key types. Sep 15, 2021
@mythrocks mythrocks requested a review from jrhemstad September 15, 2021 18:08
Also, added exceptions for STRING and STRUCT types as keys.
@mythrocks mythrocks requested review from jrhemstad and removed request for cwharris September 16, 2021 17:25
@mythrocks
Copy link
Contributor Author

Taking @cwharris off the hook. Thanks for taking up the review in his stead, @ttnghia.

@mythrocks
Copy link
Contributor Author

The failures seem transient:

02:25:26   fatal: unable to access 'https://github.com/dask/distributed.git/': The requested URL returned error: 429
02:25:26 WARNING: Discarding git+https://github.com/dask/distributed.git@main. Command errored out with exit status 128: git clone -q https://github.com/dask/distributed.git /tmp/pip-req-build-m_6j6ejd Check the logs for full command output.

@mythrocks
Copy link
Contributor Author

Rerun tests

1. No special handling for STRUCT, STRING.
2. Utility function for type checks.
@mythrocks mythrocks requested a review from a team as a code owner September 17, 2021 03:53
@mythrocks mythrocks requested a review from jrhemstad September 17, 2021 03:54
Copy link
Contributor

@robertmaynard robertmaynard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CMake changes LGTM

@mythrocks
Copy link
Contributor Author

Thanks for the reviews, all. I'll merge this in now.

@mythrocks
Copy link
Contributor Author

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 625810a into rapidsai:branch-21.10 Sep 20, 2021
@mythrocks mythrocks deleted the disable-groupby-lists branch September 20, 2021 19:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team bug Something isn't working CMake CMake build issue libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] groupby on STRUCT / LIST input passes silently, returns incorrect grouping.
4 participants