Support max on single-level struct in aggregation context #4434
Signed-off-by: Chong Gao <[email protected]>
build |
Overall it looks good to me, with one nit.
@@ -1695,3 +1695,22 @@ def test_groupby_std_variance_partial_replace_fallback(data_gen, | |||
exist_classes=','.join(exist_clz), | |||
non_exist_classes=','.join(non_exist_clz), | |||
conf=local_conf) | |||
|
|||
@ignore_order | |||
@pytest.mark.parametrize('data_type', all_gen + [NullGen()], ids=idfn) |
@pytest.mark.parametrize('data_type', all_gen + [NullGen()], ids=idfn) | |
@pytest.mark.parametrize('data_gen', all_gen + [NullGen()], ids=idfn) |
Signed-off-by: Chong Gao <[email protected]>
build |
docs/supported_ops.md
Outdated
@@ -15227,7 +15227,7 @@ are limited. | |||
<td><b>NS</b></td> | |||
<td><b>NS</b></td> | |||
<td> </td> | |||
<td><b>NS</b></td> | |||
<td><em>PS<br/>UTC is only supported TZ for child TIMESTAMP;<br/>unsupported child types BINARY, CALENDAR, ARRAY, STRUCT, UDT</em></td> |
This says that it works for window operations too, but I see no tests for window operations.
Done, removed the window support.
('a', StructGen([ | ||
('aa', data_gen), | ||
('ab', data_gen)])), | ||
('b', IntegerGen())] |
Can we make it likely that b will have repeated data in it? That way the groupings are likely to have more than one thing in them to compare?
('b', RepeatSeqGen(IntegerGen(), length=20))]
Done, updated.
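The effect of the suggestion above can be sketched in plain Python, outside the test framework. This is only an illustration: it assumes `RepeatSeqGen(IntegerGen(), length=20)` behaves like 20 generated values cycled over the rows, and `group_sizes` is a hypothetical helper, not part of the integration tests.

```python
import random
from collections import defaultdict
from itertools import cycle, islice


def group_sizes(keys):
    """Count how many rows land in each group-by key."""
    sizes = defaultdict(int)
    for k in keys:
        sizes[k] += 1
    return dict(sizes)


rng = random.Random(0)
n_rows = 200

# Keys drawn from the full 32-bit range: collisions are unlikely, so most
# groups hold a single row and max() has nothing to compare.
unique_like = [rng.randint(-2**31, 2**31 - 1) for _ in range(n_rows)]

# Assumed stand-in for RepeatSeqGen(IntegerGen(), length=20):
# 20 generated values repeated in sequence across all rows.
seed_values = [rng.randint(-2**31, 2**31 - 1) for _ in range(20)]
repeated = list(islice(cycle(seed_values), n_rows))

print("largest group, unrestricted keys:", max(group_sizes(unique_like).values()))
print("smallest group, repeated keys:", min(group_sizes(repeated).values()))
```

With at most 20 distinct keys over 200 rows, every group contains several structs, so the aggregation has to compare values instead of returning the only row in the group.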
Supported both Min and Max.
Removed the window support.
Updated the test case.
CUDF seems to have a bug: rapidsai/cudf#8974 (comment). Once it is confirmed, I'll file an issue.
build |
Suggest waiting; @ttnghia is working on the fix for rapidsai/cudf#8974 (comment)
Signed-off-by: Chong Gao <[email protected]>
…tion/groupby (#10026)
This is another fix for NVIDIA/spark-rapids#4434, where the null order was handled incorrectly when the input structs column has no nulls at the top level but has nulls only at the child levels.
Authors:
- Nghia Truong (https://github.com/ttnghia)
Approvers:
- Mike Wilson (https://github.com/hyperbolic2346)
- MithunR (https://github.com/mythrocks)
URL: #10026
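The case described in that fix, structs that are non-null at the top level but contain null children, can be sketched in plain Python. This is a reference model only, not the cuDF or plugin implementation. It assumes that a null child sorts before any non-null value and that null top-level rows are skipped by `max`; `field_key`, `struct_key` and `struct_max` are hypothetical names.

```python
def field_key(value):
    """Sort key for one struct field; None (null) sorts before any value."""
    return (0,) if value is None else (1, value)


def struct_key(struct):
    """Field-by-field (lexicographic) sort key for a single-level struct."""
    return tuple(field_key(f) for f in struct)


def struct_max(rows):
    """Max over struct rows, skipping rows that are null at the top level."""
    non_null = [r for r in rows if r is not None]
    return max(non_null, key=struct_key) if non_null else None


# No top-level nulls, but nulls in the children:
print(struct_max([(1, None), (1, 2)]))      # second field decides the result
print(struct_max([(None, 5), (0, None)]))   # first field decides the result
```

Under these assumptions a null child never wins against a non-null value in the same position, which is the ordering a CPU and GPU comparison test would need to agree on.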
Signed-off-by: Chong Gao <[email protected]>
build |
@firestarman @revans2 please help review; this is no longer blocked.
[FEA] Support max on single-level struct in aggregation context #3541
This fixes #3541
Signed-off-by: Chong Gao [email protected]