Support nested structs in rank and dense rank #8962

Merged
merged 6 commits into from
Sep 14, 2021

@rwlee rwlee commented Aug 5, 2021

Follow-on to #8652 for nested struct support, partially removing the need for #8683.

This change simplifies the rank algorithm by assuming superimpose_parent_nulls has been run on the struct column. This removes the need for separate logic that ensures we are not comparing elements covered by a parent column's null mask.
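
To illustrate the assumption above, here is a minimal host-side sketch of what superimposing parent nulls means: each child's validity mask is ANDed with its parent's, so any row that is null at a parent level is also null in every descendant. This is not libcudf's implementation; `column_sketch`, `push_down_nulls`, and `superimpose_nulls_sketch` are hypothetical names, and plain `std::vector<bool>` stands in for a device bitmask.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for a (possibly nested) struct column.
struct column_sketch {
  std::vector<bool> valid;              // true = row is non-null at this level
  std::vector<column_sketch> children;  // empty for leaf columns
};

// AND the parent's validity into `col`, then recurse into its children
// using the updated mask so nulls propagate through every nesting level.
void push_down_nulls(column_sketch& col, std::vector<bool> const& parent_valid)
{
  assert(col.valid.size() == parent_valid.size());
  for (std::size_t i = 0; i < col.valid.size(); ++i) {
    col.valid[i] = col.valid[i] && parent_valid[i];
  }
  for (auto& child : col.children) { push_down_nulls(child, col.valid); }
}

// Entry point: the root keeps its own mask; only descendants are modified.
void superimpose_nulls_sketch(column_sketch& root)
{
  for (auto& child : root.children) { push_down_nulls(child, root.valid); }
}
```

Once every child mask already reflects its ancestors' nulls, a row comparison only needs to look at each column's own mask, which is the simplification the description refers to.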

@rwlee rwlee added 3 - Ready for Review Ready for review by team libcudf Affects libcudf (C++/CUDA) code. Spark Functionality that helps Spark RAPIDS improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Aug 5, 2021
@rwlee rwlee requested a review from a team as a code owner August 5, 2021 01:49
@rwlee rwlee requested review from vyasr and nvdbaranec August 5, 2021 01:49
@revans2 revans2 linked an issue Aug 5, 2021 that may be closed by this pull request
@rwlee rwlee linked an issue Aug 5, 2021 that may be closed by this pull request
codecov bot commented Aug 5, 2021

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.10@c6ddd46).
The diff coverage is n/a.

@@               Coverage Diff               @@
##             branch-21.10    #8962   +/-   ##
===============================================
  Coverage                ?   10.82%           
===============================================
  Files                   ?      115           
  Lines                   ?    19166           
  Branches                ?        0           
===============================================
  Hits                    ?     2074           
  Misses                  ?    17092           
  Partials                ?        0           


rwlee commented Aug 6, 2021

Moving back to draft because of parent null inheritance assumption concerns. Will rework around changes to the flattening approach and null checking process.

@rwlee rwlee marked this pull request as draft August 6, 2021 22:11
@karthikeyann karthikeyann added 2 - In Progress Currently a work in progress and removed 3 - Ready for Review Ready for review by team labels Aug 30, 2021
@rwlee rwlee force-pushed the rwlee/flatten_imposed branch from 52ca8ef to b3508ce Compare September 2, 2021 22:59
@rwlee rwlee marked this pull request as ready for review September 2, 2021 23:01
@rwlee rwlee added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels Sep 2, 2021
@hyperbolic2346 hyperbolic2346 left a comment

I like how this cleaned up and boiled down to basically the same pattern for all of these.

@vyasr vyasr left a comment

Nice, it is much cleaner now that you've pushed the null handling down to the comparator. Would it be possible to change generate_ranks and generate_dense_ranks into a single internal function that accepts a device lambda for the tabulate and a predicate for the inclusive_scan_by_key? I believe that the rest of the code is entirely identical now.
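
The refactoring suggested above can be sketched on the host as a single template that takes the per-row value functor (the "tabulate" step) and the scan operator (the "inclusive scan by key" step). This is only an illustration using the C++ standard library, not the Thrust-based code in the PR; `rank_scan_sketch`, `rank_sketch`, and `dense_rank_sketch` are hypothetical names, and the input is assumed to be already sorted within each contiguous group.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Shared skeleton: compute a per-row value, then inclusive-scan it,
// restarting the scan at every group boundary (scan "by key").
template <typename ValueFn, typename ScanOp>
std::vector<int> rank_scan_sketch(std::vector<int> const& sorted_values,
                                  std::vector<int> const& group_labels,
                                  ValueFn value_fn,
                                  ScanOp scan_op)
{
  assert(sorted_values.size() == group_labels.size());
  std::vector<int> ranks(sorted_values.size());
  std::size_t group_begin = 0;
  for (std::size_t i = 0; i < ranks.size(); ++i) {
    bool const new_group = (i == 0) || (group_labels[i] != group_labels[i - 1]);
    if (new_group) { group_begin = i; }
    // A row "differs" if it starts a group or is unequal to the previous row.
    bool const differs = new_group || (sorted_values[i] != sorted_values[i - 1]);
    int const value    = value_fn(differs, static_cast<int>(i - group_begin));
    ranks[i]           = new_group ? value : scan_op(ranks[i - 1], value);
  }
  return ranks;
}

// Rank: a differing row contributes its 1-based position; scan with max.
inline std::vector<int> rank_sketch(std::vector<int> const& values,
                                    std::vector<int> const& labels)
{
  return rank_scan_sketch(
    values, labels,
    [](bool differs, int pos) { return differs ? pos + 1 : 0; },
    [](int a, int b) { return std::max(a, b); });
}

// Dense rank: a differing row contributes 1; scan with plus.
inline std::vector<int> dense_rank_sketch(std::vector<int> const& values,
                                          std::vector<int> const& labels)
{
  return rank_scan_sketch(
    values, labels,
    [](bool differs, int) { return differs ? 1 : 0; },
    std::plus<int>{});
}
```

For values `{10, 10, 20, 5, 7, 7}` with group labels `{0, 0, 0, 1, 1, 1}`, `rank_sketch` yields `{1, 1, 3, 1, 2, 2}` and `dense_rank_sketch` yields `{1, 1, 2, 1, 2, 2}`; the two differ only in the functor and operator passed to the shared skeleton.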

cpp/src/structs/utilities.cpp (outdated, resolved)
@github-actions github-actions bot added the CMake CMake build issue label Sep 9, 2021
cpp/src/groupby/sort/group_rank_scan.cu (resolved)
cpp/src/reductions/scan/scan_inclusive.cu (resolved)
cpp/src/structs/utilities.cpp (outdated, resolved)
@rwlee rwlee requested a review from a team as a code owner September 9, 2021 21:46
@vyasr vyasr left a comment

Changes look great. Nice to see that the templating made this a negative LOC PR.

@harrism harrism left a comment

cmake approval

harrism commented Sep 14, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit eae76cf into rapidsai:branch-21.10 Sep 14, 2021
Labels
3 - Ready for Review Ready for review by team CMake CMake build issue improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change Spark Functionality that helps Spark RAPIDS
6 participants