-
Notifications
You must be signed in to change notification settings - Fork 238
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
support GpuConcat on ArrayType #2379
Conversation
Signed-off-by: sperlingxx <[email protected]>
build |
Signed-off-by: sperlingxx <[email protected]>
I found another small problem of cuDF implementation during running tests on Array of String locally. And I've put up a PR to fix it. |
build |
build |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
private def stringConcat(batch: ColumnarBatch): GpuColumnVector = { | ||
val rows = batch.numRows() | ||
|
||
withResource(ArrayBuffer[ColumnVector]()) { buffer => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: here and later: ArrayBuffer.empty[ColumnVector]
for readability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Signed-off-by: sperlingxx <[email protected]>
build |
sql-plugin/src/main/scala/com/nvidia/spark/rapids/collectionOperations.scala
Outdated
Show resolved
Hide resolved
sql-plugin/src/main/scala/com/nvidia/spark/rapids/collectionOperations.scala
Outdated
Show resolved
Hide resolved
sql-plugin/src/main/scala/com/nvidia/spark/rapids/collectionOperations.scala
Outdated
Show resolved
Hide resolved
…erations.scala Co-authored-by: Jason Lowe <[email protected]>
…erations.scala Co-authored-by: Jason Lowe <[email protected]>
build |
…erations.scala Co-authored-by: Jason Lowe <[email protected]>
build |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly nits but there are some corner cases that string concat missed and that this also is missing.
I tested the string concat code and it works if we have asserts disabled. If we enable them it fails. I don't know about list concat. There is probably also an issue with no columns being passed in. I am fine with letting this slide for now because you have to go out of your way to disable expression folding for it to be an issue, but it might be good to just fall back to the CPU if there are no children. That way we don't have to ever worry about it.
sql-plugin/src/main/scala/com/nvidia/spark/rapids/collectionOperations.scala
Outdated
Show resolved
Hide resolved
sql-plugin/src/main/scala/com/nvidia/spark/rapids/collectionOperations.scala
Outdated
Show resolved
Hide resolved
sql-plugin/src/main/scala/com/nvidia/spark/rapids/collectionOperations.scala
Outdated
Show resolved
Hide resolved
sql-plugin/src/main/scala/com/nvidia/spark/rapids/collectionOperations.scala
Outdated
Show resolved
Hide resolved
sql-plugin/src/main/scala/com/nvidia/spark/rapids/collectionOperations.scala
Outdated
Show resolved
Hide resolved
sql-plugin/src/main/scala/com/nvidia/spark/rapids/collectionOperations.scala
Outdated
Show resolved
Hide resolved
sql-plugin/src/main/scala/com/nvidia/spark/rapids/collectionOperations.scala
Outdated
Show resolved
Hide resolved
Signed-off-by: sperlingxx <[email protected]>
build |
I added missing support of empty concat. And I also found that cuDF concat can be simply bypassed in single column concat. |
build |
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/collectionOperations.scala
Outdated
Show resolved
Hide resolved
Signed-off-by: sperlingxx <[email protected]>
build |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good
f.concat(s1, f.col('b')), | ||
f.concat(f.col('a'), s2), | ||
f.concat(f.lit(None).cast('string'), f.col('b')), | ||
f.concat(f.col('a'), f.lit(None).cast('string')), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we have an all nulls test case? I guess this line might hit it if col a can generate nulls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similar for arrays, do we want an all nulls test case for arrays
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be frank, I am not sure whether there exists rows consisting of all null fields among test data for concat_array , with default random seed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I think the all null case may be not that special, since the output row will be null only if one input field is null.
Closes NVIDIA#2013 Support GpuConcat on ArrayType. And introduce some refinement for GpuConcat on StringType. Signed-off-by: sperlingxx <[email protected]>
Closes NVIDIA#2013 Support GpuConcat on ArrayType. And introduce some refinement for GpuConcat on StringType. Signed-off-by: sperlingxx <[email protected]>
Closes NVIDIA#2013 Support GpuConcat on ArrayType. And introduce some refinement for GpuConcat on StringType. Signed-off-by: sperlingxx <[email protected]>
Current PR is to support
GpuConcat
on ArrayType, which is required in issue #2013.In addition, this PR also introduces some refinement on the implementation of string concatenation.