-
Notifications
You must be signed in to change notification settings - Fork 919
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Return empty result for segmented_reduce if input and offsets are both empty #17437
Merged
Merged
Changes from all commits
Commits
Show all changes
3 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the early exit is not taken here, we probably need at least two elements in the offsets for the segments to be valid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like there is a specific test for this case so I'm going to leave this line unchanged.
https://github.com/rapidsai/cudf/pull/17437/files#diff-274951d8b19a7c6bf16db78ac17d129ba021bc5135f3d3acfbb1bc814b37ee14R1046
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we did this? It seems like it would be safer and that existing test would still pass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current behavior requires a non-empty result when the input is empty and the offsets are not empty. This appears to be from a request from Spark (reference: #10556 (comment)).
It seems like a non-empty input with more than one offset is reasonable though may not be necessary to check explicitly. I can look into this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, of course. It is possible to have empty values and non-empty offsets. Sorry, that was an oversight on my part.
Thanks for considering the non-empty input requiring >=2 offsets. It would be nice to make this stricter if we can make a good case for its correctness, but I don’t mean to block on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess. It seems like it should be an invalid input but I don’t want us to go in circles here. A test exists so let’s maintain the status quo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the correct behaviour would be to return one element, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there is only one offset, it cannot represent a range. Normally we have N+1 offsets to represent N segments because the
i
-th offset is the (inclusive) start of rangei
and the (exclusive) end of rangei-1
.If we have 1 offset we have at most zero segments, so the test returns an empty result. My claim is that this may be invalid input, because 1 offset cannot define a range, but the user somehow got a single offset. The legal inputs under my previous proposal (which I no longer want to push for) would be zero/empty offsets for zero segments, or >=2 offsets for representing >=1 segment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what you are referring to. I believe the testcases cover the expected outcomes. This PR only changes the specific behavior where both the input and offsets are empty and returns an empty result instead of throwing an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This certainly would be a breaking change we could consider in a separate PR.