Rework on substring index #11149
Signed-off-by: fejiang <[email protected]>
Depends on NVIDIA/spark-rapids-jni#2205
It would be better to add integration tests for the newly covered cases so we can test them on all Spark versions.
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/stringFunctions.scala
tests/src/test/spark330/scala/org/apache/spark/sql/rapids/utils/RapidsTestSettings.scala
...rc/test/spark330/scala/org/apache/spark/sql/rapids/suites/RapidsStringExpressionsSuite.scala
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/stringFunctions.scala
build
…into reworkonsubstringindex
...rc/test/spark330/scala/org/apache/spark/sql/rapids/suites/RapidsStringExpressionsSuite.scala
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/stringFunctions.scala
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/stringFunctions.scala
LGTM
Performance is much much better and is super fast compared to the CPU.
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/stringFunctions.scala
As I mentioned in my comment, the docs need to be regenerated; premerge checks are failing because of this. You can run `mvn verify -DskipTests` to regenerate the docs and then commit the changes.
[2024-07-24T13:11:36.608Z] found modified files during mvn verify:
[2024-07-24T13:11:36.609Z] M docs/supported_ops.md
Depends on NVIDIA/spark-rapids-jni#2205
This PR closes issue #8750 by replacing the underlying regex implementation with one built on `cudf::strings::slice_strings()` together with `cudf::strings::find()`/`rfind()`. The new implementation achieved a 12x speedup over the old version; see NVIDIA/spark-rapids-jni#2205 (comment).
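For readers unfamiliar with the approach, here is a minimal CPU-side sketch in plain Python of how `substring_index` can be expressed with find/rfind plus slicing instead of a regex. It is not the code from this PR or from spark-rapids-jni: Python's `str.find`/`str.rfind` and slicing only stand in for `cudf::strings::find()`/`rfind()` and `cudf::strings::slice_strings()`, and the edge-case handling (empty delimiter, zero count, too few delimiters) follows Spark's `substring_index` behavior as I understand it.

```python
def substring_index(s: str, delim: str, count: int) -> str:
    """Illustrative sketch of substring_index using find/rfind and slicing."""
    # Assumed Spark behavior: an empty delimiter or a zero count yields "".
    if not delim or count == 0:
        return ""

    if count > 0:
        # Walk forward to the count-th delimiter, then keep everything before it.
        idx = -1
        for _ in range(count):
            idx = s.find(delim, idx + 1)
            if idx < 0:
                return s  # fewer than `count` delimiters: return the whole string
        return s[:idx]

    # Negative count: walk backward to the |count|-th delimiter from the right,
    # then keep everything after it.
    end = len(s)
    idx = -1
    for _ in range(-count):
        idx = s.rfind(delim, 0, end)
        if idx < 0:
            return s  # fewer than |count| delimiters: return the whole string
        # The next match must start strictly before this one.
        end = idx - 1 + len(delim)
    return s[idx + len(delim):]
```

For example, `substring_index("www.apache.org", ".", 2)` keeps the text before the second dot, and a count of `-2` keeps the text after the second dot from the right. Each row needs only a few position lookups and one slice, which is the kind of work that avoids the overhead of a general regex engine.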