Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix cudf::partition* APIs that do not return offsets for empty output table #11709

Merged
merged 28 commits into from
Sep 27, 2022

Conversation

ttnghia
Copy link
Contributor

@ttnghia ttnghia commented Sep 14, 2022

By definition, the cudf::partition* API will return a vector of offsets with size is at least the number of partitions. As such, an output empty table should associate with an output offset array like [0, 0, ..., 0] (all zeros). However, currently the output offsets in such situations is an empty array.

This PR corrects the implementation for such corner cases.

Closes #11700.

@ttnghia ttnghia added bug Something isn't working 3 - Ready for Review Ready for review by team libcudf Affects libcudf (C++/CUDA) code. Spark Functionality that helps Spark RAPIDS breaking Breaking change labels Sep 14, 2022
@ttnghia ttnghia requested a review from a team as a code owner September 14, 2022 21:16
@ttnghia ttnghia self-assigned this Sep 14, 2022
@ttnghia ttnghia requested review from vyasr and vuule September 14, 2022 21:16
@ttnghia
Copy link
Contributor Author

ttnghia commented Sep 14, 2022

I've added Breaking label since downstream usages of these API may have been wrongly expecting empty offsets.

@ttnghia ttnghia marked this pull request as draft September 14, 2022 21:20
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
@ttnghia ttnghia marked this pull request as ready for review September 14, 2022 21:30
@ttnghia ttnghia marked this pull request as draft September 14, 2022 21:37
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
@ttnghia ttnghia marked this pull request as ready for review September 14, 2022 21:41
@ttnghia ttnghia removed the 5 - DO NOT MERGE Hold off on merging; see PR for details label Sep 23, 2022
@github-actions github-actions bot added the Python Affects Python cuDF API. label Sep 24, 2022
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
@ttnghia
Copy link
Contributor Author

ttnghia commented Sep 24, 2022

Rerun tests.

1 similar comment
@ttnghia
Copy link
Contributor Author

ttnghia commented Sep 25, 2022

Rerun tests.

@ttnghia
Copy link
Contributor Author

ttnghia commented Sep 26, 2022

Rerun tests.

@ttnghia ttnghia marked this pull request as ready for review September 26, 2022 17:50
@ttnghia ttnghia requested a review from a team as a code owner September 26, 2022 17:50
@ttnghia ttnghia added breaking Breaking change and removed non-breaking Non-breaking change labels Sep 26, 2022
@ttnghia
Copy link
Contributor Author

ttnghia commented Sep 27, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 0a430fa into rapidsai:branch-22.10 Sep 27, 2022
@ttnghia ttnghia deleted the fix_partitioning branch September 27, 2022 20:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team breaking Breaking change bug Something isn't working libcudf Affects libcudf (C++/CUDA) code. Python Affects Python cuDF API. Spark Functionality that helps Spark RAPIDS
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Calling partition() on an empty Table leads to a crash
6 participants