Optionally nullify out-of-bounds indices in segmented_gather(). #9318
The behaviour of `cudf::lists::segmented_gather()` is currently undefined for any index value `i` that falls outside the range `[-n, n)`, where `n` is the number of elements in the list row.

This commit adds support to explicitly specify an `out_of_bounds_policy`, like in `cudf::gather()`. The erstwhile behaviour is retained when the bounds policy is set to `DONT_CHECK`. If the bounds policy is specified as `NULLIFY`, then for any index falling outside the range `[-n, n)`, the list element is set to `null`.

E.g.
```c++
auto source_column = [{"a", "b", "c", "d"}, {"1", "2", "3", "4"}, {"x", "y", "z"}];
auto gather_map = [{0, -1, 4, -5}, {1, 3, 5}, {}];
auto result = segmented_gather(source_column, gather_map, NULLIFY);
result == [{"a", "d", null, null}, {"2", "4", null}, {}];
```
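To make the `[-n, n)` rule concrete, here is a minimal host-side sketch of the semantics described above. It is not the libcudf implementation: `segmented_gather_sketch`, `list_row`, and the local `bounds_policy` enum are hypothetical stand-ins built on `std::vector` and `std::optional`. Since the real `DONT_CHECK` behaviour is undefined for out-of-range indices, the sketch throws in that case purely to stay memory-safe; that choice is an assumption of this example, not something the PR specifies.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not libcudf types.
enum class bounds_policy { NULLIFY, DONT_CHECK };
using list_row = std::vector<std::optional<std::string>>;

// For each row, gather elements of `source` at the indices in the matching
// row of `gather_map`. Negative indices count from the end of the row.
std::vector<list_row> segmented_gather_sketch(const std::vector<list_row>& source,
                                              const std::vector<std::vector<int>>& gather_map,
                                              bounds_policy policy)
{
  // Precondition of this sketch: one gather-map row per source row.
  assert(source.size() == gather_map.size());

  std::vector<list_row> result;
  result.reserve(source.size());
  for (std::size_t row = 0; row < source.size(); ++row) {
    const int n = static_cast<int>(source[row].size());
    list_row gathered;
    gathered.reserve(gather_map[row].size());
    for (const int i : gather_map[row]) {
      const bool in_bounds = (i >= -n) && (i < n);
      if (!in_bounds) {
        if (policy == bounds_policy::DONT_CHECK) {
          // Undefined in the real API; this sketch throws instead (assumption).
          throw std::out_of_range("index outside [-n, n) with DONT_CHECK");
        }
        gathered.push_back(std::nullopt);  // NULLIFY: emit a null element
        continue;
      }
      // Map a negative index in [-n, 0) onto [0, n).
      const int position = (i < 0) ? i + n : i;
      gathered.push_back(source[row][static_cast<std::size_t>(position)]);
    }
    result.push_back(std::move(gathered));
  }
  return result;
}
```

Calling the sketch with the source column and gather map from the example above, under `bounds_policy::NULLIFY`, yields the same rows as the stated result: indices `4` and `-5` fall outside `[-4, 4)` for the first row, and `5` falls outside it for the second.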
Codecov Report
@@ Coverage Diff @@
## branch-21.12 #9318 +/- ##
===============================================
Coverage ? 10.77%
===============================================
Files ? 116
Lines ? 19360
Branches ? 0
===============================================
Hits ? 2087
Misses ? 17273
Partials ? 0
I'll investigate the Java failure shortly. I'm so glad the JNI builds are now integrated into CI.
This is baffling. Here are the failing JNI tests, as per CI logs:
I cannot reproduce the failures locally:
I'll kick the tests off again, after rectifying the modulo calculation.
lgtm
Also, corrected test cases to accommodate.
Rerun tests
This is interesting.
@codereport, that is indeed interesting. I didn't run into compilation issues with
A nit on the tests: please don't leave too many blank lines, as that makes the code look disjointed.
The whitespace in the old code hasn't been changed. The new code follows that preexisting format.
@ttnghia: I refrained from reformatting the existing test code on the first pass. Per your suggestion, I have reformatted most of the tests. Let me know if this works better.
Absolutely, it's better. Thanks, Mithun.
@gpucibot merge
Thank you all for the reviews and advice. This change has now been merged.