-
Notifications
You must be signed in to change notification settings - Fork 240
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support executing collect_list on GPU with windowing. #1548
Conversation
2d07f99
to
9791cfe
Compare
Mark as draft since it depends on the support from cuDF native. |
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/AggregateFunctions.scala
Outdated
Show resolved
Hide resolved
aee4fc6
to
86b80e3
Compare
rebased |
build |
1 similar comment
build |
Signed-off-by: Firestarman <[email protected]>
Let collect_list with Windowing supports type of array of struct. Signed-off-by: Firestarman <[email protected]>
Signed-off-by: Firestarman <[email protected]>
Signed-off-by: Firestarman <[email protected]>
Signed-off-by: Firestarman <[email protected]>
Add integration tests for collect_list with windowing. Signed-off-by: Firestarman <[email protected]>
071910e
to
bfef795
Compare
rebased |
Signed-off-by: Firestarman <[email protected]>
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuOverrides.scala
Outdated
Show resolved
Hide resolved
Signed-off-by: Firestarman <[email protected]>
Signed-off-by: Firestarman <[email protected]>
build |
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuWindowExpression.scala
Show resolved
Hide resolved
sql-plugin/src/main/scala/com/nvidia/spark/rapids/TypeChecks.scala
Outdated
Show resolved
Hide resolved
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/AggregateFunctions.scala
Outdated
Show resolved
Hide resolved
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/AggregateFunctions.scala
Outdated
Show resolved
Hide resolved
Signed-off-by: Firestarman <[email protected]>
cudf nightly with PR rapidsai/cudf#7189 always fail, so the pre-merge check will fail too. |
Signed-off-by: Firestarman <[email protected]>
The fix is in review rapidsai/cudf#7266 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good, I ma really just down to nits at this point.
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuOverrides.scala
Outdated
Show resolved
Hide resolved
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuOverrides.scala
Outdated
Show resolved
Hide resolved
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuOverrides.scala
Outdated
Show resolved
Hide resolved
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuOverrides.scala
Outdated
Show resolved
Hide resolved
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuOverrides.scala
Outdated
Show resolved
Hide resolved
TypeSig.ARRAY.nested(TypeSig.commonCudfTypes + TypeSig.DECIMAL + TypeSig.STRUCT), | ||
TypeSig.ARRAY.nested(TypeSig.all), | ||
Seq(ParamCheck("input", | ||
(TypeSig.commonCudfTypes + TypeSig.DECIMAL).nested() + TypeSig.STRUCT, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again I think longer is clearer, but in this case because there is only one nested type I think it is okay. So we don't support doing a collect list on a struct that also contains a struct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be supported. If it's not working, it might be a bug in libcudf
. I'd be happy to investigate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated it, and the struct of struct is not covered in integration tests, so do not add it here. Shall we need it ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if yes, i will add it in a following PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuOverrides.scala
Outdated
Show resolved
Hide resolved
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/AggregateFunctions.scala
Show resolved
Hide resolved
cudf side is ready now |
Signed-off-by: Firestarman <[email protected]>
build |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mythrocks are you OK with a follow on PR to add in struct or struct support?
Sounds good. |
Signed-off-by: Firestarman <[email protected]>
Signed-off-by: Firestarman <[email protected]>
…IDIA#1548) Signed-off-by: spark-rapids automation <[email protected]>
This PR is to support running
collect_list
on GPU with windowing.This PR depends on the PR rapidsai/cudf#7189 .
Signed-off-by: Firestarman [email protected]