Add strings 'like' function #11558
This was benchmarked against … The speedup ranged from about 2x to ~12x.
Codecov Report
@@ Coverage Diff @@
## branch-22.10 #11558 +/- ##
===============================================
Coverage ? 86.41%
===============================================
Files ? 145
Lines ? 22992
Branches ? 0
===============================================
Hits ? 19869
Misses ? 3123
Partials ? 0
Looks great. I have a couple small issues with naming, otherwise approved.
Thanks @davidwendt!
@gpucibot merge
[#11558](#11558) added the strings `like` function to cudf, a wildcard-based string matching function based on SQL's LIKE statement. We add a `like` JNI binding and native method that call the `like` function from #11558, along with corresponding Java unit tests. This is part of the solution for issue [NVIDIA/spark-rapids#6430](NVIDIA/spark-rapids#6430).

Authors:
- Yuan Jiang (https://github.com/cindyyuanjiang)

Approvers:
- Nghia Truong (https://github.com/ttnghia)
- Gera Shegalov (https://github.com/gerashegalov)
- Jason Lowe (https://github.com/jlowe)

URL: #12032
Description
Adds a new strings `like` function to cudf. This is a wildcard-based string matching function based on SQL's LIKE statement: https://www.sqltutorial.org/sql-like/

Though some SQL implementations provide regex-like capabilities in the `like` statement pattern, the implementation here is strictly limited to the `%` (multi-character placeholder) and `_` (single-character placeholder) behavior. It also accepts an optional escape character that can be used to match strings that contain `%` or `_`.

This is an easier (and faster) alternative to using the regex-based `contains` function.

Example usage:
This PR includes gtests, pytests, and an nvbench benchmark.
Reference #10797
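The `%`, `_`, and escape-character semantics described above can be illustrated with a small pure-Python sketch. This is not cudf's GPU implementation or its API: the names `like` and `like_column`, the rule that the escape character makes any following character literal, and the rule that null rows stay null are all assumptions made for illustration.

```python
"""Hypothetical reference sketch of SQL LIKE matching with % and _ wildcards."""

from typing import List, Optional

# Sentinel tokens; unique objects keep the wildcards distinct from escaped
# literal '%' or '_' characters, which are stored as plain one-char strings.
ANY_SEQ = object()   # '%': matches zero or more characters
ANY_CHAR = object()  # '_': matches exactly one character


def _tokenize(pattern: str, escape: str) -> list:
    """Split a LIKE pattern into wildcard sentinels and literal characters."""
    tokens = []
    i = 0
    while i < len(pattern):
        c = pattern[i]
        # Assumption: the escape character makes whatever follows it literal.
        # A trailing escape with nothing after it falls through to the normal
        # handling below.
        if escape and c == escape and i + 1 < len(pattern):
            tokens.append(pattern[i + 1])
            i += 2
            continue
        if c == "%":
            tokens.append(ANY_SEQ)
        elif c == "_":
            tokens.append(ANY_CHAR)
        else:
            tokens.append(c)
        i += 1
    return tokens


def like(s: str, pattern: str, escape: str = "") -> bool:
    """Return True if the whole string `s` matches the LIKE `pattern`."""
    tokens = _tokenize(pattern, escape)
    n, m = len(s), len(tokens)
    si = ti = 0
    star_ti = -1  # token index of the most recent '%' seen
    star_si = 0   # string position where that '%' currently stops consuming
    while si < n:
        if ti < m and tokens[ti] is ANY_SEQ:
            # Tentatively let '%' match nothing; remember it for backtracking.
            star_ti, star_si = ti, si
            ti += 1
        elif ti < m and (tokens[ti] is ANY_CHAR or tokens[ti] == s[si]):
            si += 1
            ti += 1
        elif star_ti != -1:
            # Mismatch: let the last '%' absorb one more character and retry.
            star_si += 1
            si = star_si
            ti = star_ti + 1
        else:
            return False
    # Any remaining pattern tokens must all be '%' (which can match nothing).
    while ti < m and tokens[ti] is ANY_SEQ:
        ti += 1
    return ti == m


def like_column(strings: List[Optional[str]], pattern: str,
                escape: str = "") -> List[Optional[bool]]:
    """Apply `like` row by row; assumption: null (None) rows stay null."""
    return [None if s is None else like(s, pattern, escape) for s in strings]


if __name__ == "__main__":
    print(like_column(["abc", None, "xbc", "abcd"], "_bc"))  # [True, None, True, False]
    print(like("50%", "50!%", escape="!"))  # True
```

The matcher backtracks only to the most recent `%`, so a match takes at most O(len(s) · len(pattern)) steps and needs no pattern-compilation stage. That is one way to see why a dedicated wildcard matcher can be simpler than a general regex engine, though the actual cudf kernel may be organized quite differently.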