[FEA] Implement lower/upper_bound for struct-typed columns #7690

Closed
gerashegalov opened this issue Mar 23, 2021 · 0 comments · Fixed by #7865
Assignees
Labels
feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. Spark Functionality that helps Spark RAPIDS

Comments

@gerashegalov
Contributor

Is your feature request related to a problem? Please describe.
Distributed sort in Spark-RAPIDS requires Range Partitioning

Describe the solution you'd like
Range Partitioning is implemented using lower/upper_bound calls. We would like lower/upper_bound to work for struct columns.
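
To illustrate how range partitioning can lean on a lower_bound search, here is a minimal CPU-side sketch in standard C++. It is not the libcudf or Spark-RAPIDS implementation: `StructRow` and `assign_partitions` are hypothetical names, and a `std::tuple` stands in for one row of a two-field struct column. It assumes the split points are already sorted and that struct rows compare field by field.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical stand-in for one row of a struct column with two fields.
using StructRow = std::tuple<int, std::string>;

// For each row, return the index of the range partition it falls into.
// `splits` holds the sorted boundaries between partitions, so N split
// points define N + 1 partitions.
std::vector<std::size_t> assign_partitions(std::vector<StructRow> const& rows,
                                           std::vector<StructRow> const& splits)
{
  // Binary search is only meaningful on sorted boundaries.
  assert(std::is_sorted(splits.begin(), splits.end()));

  std::vector<std::size_t> ids;
  ids.reserve(rows.size());
  for (auto const& row : rows) {
    // First split point that is not less than the row; std::tuple
    // compares the first field, then the second on ties.
    auto it = std::lower_bound(splits.begin(), splits.end(), row);
    ids.push_back(static_cast<std::size_t>(it - splits.begin()));
  }
  return ids;
}
```

With this choice a row equal to a split point lands in the partition to the left of that boundary; using `std::upper_bound` instead would send it to the right.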

Describe alternatives you've considered
For datasets that fit on one GPU, the number of shuffle partitions can be set to 1 in Spark, which bypasses the calls to lower/upper_bound. This workaround is not generally applicable.

Additional context
See cuDF PR #7422 and NVIDIA/spark-rapids#1883

@gerashegalov gerashegalov added feature request New feature or request Needs Triage Need team to review and classify labels Mar 23, 2021
@revans2 revans2 added the Spark Functionality that helps Spark RAPIDS label Mar 23, 2021
@ttnghia ttnghia self-assigned this Mar 25, 2021
@kkraus14 kkraus14 added libcudf Affects libcudf (C++/CUDA) code. and removed Needs Triage Need team to review and classify labels Mar 26, 2021
@ttnghia ttnghia linked a pull request Apr 5, 2021 that will close this issue
rapids-bot bot pushed a commit that referenced this issue Apr 19, 2021
This PR adds support for `lower_bound` and `upper_bound` binary searches on structs columns. This closes #7690.

In addition to adding binary search for structs, I also refactored `tests/search/search_test.cpp`, extracting the dictionary search tests from it. As a result, the basic search tests, the dictionary search tests, and the new struct search tests now live in separate source files, which makes them easier to find and to maintain.

Authors:
  - Nghia Truong (https://github.com/ttnghia)

Approvers:
  - Mike Wilson (https://github.com/hyperbolic2346)
  - David Wendt (https://github.com/davidwendt)
  - Keith Kraus (https://github.com/kkraus14)

URL: #7865
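
As a rough picture of what `lower_bound` and `upper_bound` mean for struct values, the sketch below uses the standard-library algorithms on a sorted vector of tuples. It is an analogy only, not the code from the PR: `PairRow` and `bounds` are hypothetical names, and null handling and per-column sort order are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical stand-in for one row of a struct column with two int fields.
using PairRow = std::tuple<int, int>;

// Return {lower, upper}: the first and last positions at which `needle`
// could be inserted into the sorted `haystack` without breaking the order.
std::pair<std::size_t, std::size_t> bounds(std::vector<PairRow> const& haystack,
                                           PairRow const& needle)
{
  assert(std::is_sorted(haystack.begin(), haystack.end()));

  // Both searches compare rows field by field (lexicographically).
  auto lo = std::lower_bound(haystack.begin(), haystack.end(), needle);
  auto hi = std::upper_bound(haystack.begin(), haystack.end(), needle);
  return {static_cast<std::size_t>(lo - haystack.begin()),
          static_cast<std::size_t>(hi - haystack.begin())};
}
```

The two indices differ only when the haystack contains rows equal to the needle; the gap between them is the number of such rows.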
4 participants