-
Notifications
You must be signed in to change notification settings - Fork 849
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ASCII fast path for ILIKE scalar (90% faster) #3306
Conversation
5d73cc4
to
160ee74
Compare
160ee74
to
faf6eaa
Compare
FYI @askoa |
arrow-string/src/like.rs
Outdated
} else if right.starts_with('%') && !right[1..].contains(is_like_pattern) { | ||
// fast path, can use ends_with | ||
let ends_str = &right[1..].to_uppercase(); | ||
// If not ASCII faster to use case insensitive regex than allocating using to_uppercase |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// If not ASCII faster to use case insensitive regex than allocating using to_uppercase | |
// If ASCII faster to use case insensitive regex than allocating using to_uppercase |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do the existing tests cover all the corner cases?
I can add some more, give me a minute |
I think this might not work for non-ASCII characters that capitalize to ASCII characters 😢 |
Perhaps you can also check that left is entirely ASCII (which is a fairly common case) |
Interestingly the regex crate doesn't appear to handle this correctly... 🤔 |
4597713
to
7b2c61b
Compare
So it turns out the fact we were handling this was actually incorrect, so this is not only faster, but more correct 😆 - #3311 |
Aargh, there are unicode characters that have lowercase ASCII... Guess I will need to check if the string are ASCII, and use the regex if not |
This is currently blocked on #3313, which will allow the kernel to assume a GenericStringArray, which in turn will allow ASCII verification in one pass. |
So even with the extra check to detect is_ascii, this still is a pretty significant speedup |
@@ -1272,6 +1270,101 @@ mod tests { | |||
vec![true, false, false, false] | |||
); | |||
|
|||
// We only implement loose matching |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I ran these tests without the changes in this PR I got a bunch of failures. Is that expected?
failures:
---- like::tests::test_utf8_array_ilike_unicode stdout ----
thread 'like::tests::test_utf8_array_ilike_unicode' panicked at 'assertion failed: `(left == right)`
left: `true`,
right: `false`: unexpected result when comparing FFkoß at position 0 to FFkoSS ', arrow-string/src/like.rs:1279:5
stack backtrace:
0: rust_begin_unwind
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/panicking.rs:584:5
1: core::panicking::panic_fmt
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/panicking.rs:142:14
2: core::panicking::assert_failed_inner
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/panicking.rs:218:23
3: core::panicking::assert_failed
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/panicking.rs:181:5
4: arrow_string::like::tests::test_utf8_array_ilike_unicode
at ./src/like.rs:1279:5
5: arrow_string::like::tests::test_utf8_array_ilike_unicode::{{closure}}
at ./src/like.rs:917:13
6: core::ops::function::FnOnce::call_once
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/ops/function.rs:248:5
7: core::ops::function::FnOnce::call_once
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/ops/function.rs:248:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
---- like::tests::test_utf8_array_ilike_unicode_contains stdout ----
thread 'like::tests::test_utf8_array_ilike_unicode_contains' panicked at 'assertion failed: `(left == right)`
left: `true`,
right: `false`: unexpected result when comparing sdlkdfFkoßsdfs at position 0 to %FFkoSS% ', arrow-string/src/like.rs:1331:5
stack backtrace:
0: rust_begin_unwind
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/panicking.rs:584:5
1: core::panicking::panic_fmt
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/panicking.rs:142:14
2: core::panicking::assert_failed_inner
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/panicking.rs:218:23
3: core::panicking::assert_failed
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/panicking.rs:181:5
4: arrow_string::like::tests::test_utf8_array_ilike_unicode_contains
at ./src/like.rs:1331:5
5: arrow_string::like::tests::test_utf8_array_ilike_unicode_contains::{{closure}}
at ./src/like.rs:917:13
6: core::ops::function::FnOnce::call_once
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/ops/function.rs:248:5
7: core::ops::function::FnOnce::call_once
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/ops/function.rs:248:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
---- like::tests::test_utf8_array_ilike_unicode_contains_dyn stdout ----
thread 'like::tests::test_utf8_array_ilike_unicode_contains_dyn' panicked at 'assertion failed: `(left == right)`
left: `true`,
right: `false`: unexpected result when comparing sdlkdfFkoßsdfs at position 0 to %FFkoSS% ', arrow-string/src/like.rs:1331:5
stack backtrace:
0: rust_begin_unwind
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/panicking.rs:584:5
1: core::panicking::panic_fmt
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/panicking.rs:142:14
2: core::panicking::assert_failed_inner
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/panicking.rs:218:23
3: core::panicking::assert_failed
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/panicking.rs:181:5
4: arrow_string::like::tests::test_utf8_array_ilike_unicode_contains_dyn
at ./src/like.rs:1331:5
5: arrow_string::like::tests::test_utf8_array_ilike_unicode_contains_dyn::{{closure}}
at ./src/like.rs:917:13
6: core::ops::function::FnOnce::call_once
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/ops/function.rs:248:5
7: core::ops::function::FnOnce::call_once
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/ops/function.rs:248:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
---- like::tests::test_utf8_array_ilike_unicode_dyn stdout ----
thread 'like::tests::test_utf8_array_ilike_unicode_dyn' panicked at 'assertion failed: `(left == right)`
left: `true`,
right: `false`: unexpected result when comparing FFkoß at position 0 to FFkoSS ', arrow-string/src/like.rs:1279:5
stack backtrace:
0: rust_begin_unwind
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/panicking.rs:584:5
1: core::panicking::panic_fmt
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/panicking.rs:142:14
2: core::panicking::assert_failed_inner
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/panicking.rs:218:23
3: core::panicking::assert_failed
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/panicking.rs:181:5
4: arrow_string::like::tests::test_utf8_array_ilike_unicode_dyn
at ./src/like.rs:1279:5
5: arrow_string::like::tests::test_utf8_array_ilike_unicode_dyn::{{closure}}
at ./src/like.rs:917:13
6: core::ops::function::FnOnce::call_once
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/ops/function.rs:248:5
7: core::ops::function::FnOnce::call_once
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/ops/function.rs:248:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
---- like::tests::test_utf8_array_ilike_unicode_ends stdout ----
thread 'like::tests::test_utf8_array_ilike_unicode_ends' panicked at 'assertion failed: `(left == right)`
left: `true`,
right: `false`: unexpected result when comparing sdlkdfFFkoß at position 0 to %FFkoSS ', arrow-string/src/like.rs:1311:5
stack backtrace:
0: rust_begin_unwind
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/panicking.rs:584:5
1: core::panicking::panic_fmt
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/panicking.rs:142:14
2: core::panicking::assert_failed_inner
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/panicking.rs:218:23
3: core::panicking::assert_failed
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/panicking.rs:181:5
4: arrow_string::like::tests::test_utf8_array_ilike_unicode_ends
at ./src/like.rs:924:21
5: arrow_string::like::tests::test_utf8_array_ilike_unicode_ends::{{closure}}
at ./src/like.rs:917:13
6: core::ops::function::FnOnce::call_once
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/ops/function.rs:248:5
7: core::ops::function::FnOnce::call_once
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/ops/function.rs:248:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
---- like::tests::test_utf8_array_ilike_unicode_ends_dyn stdout ----
thread 'like::tests::test_utf8_array_ilike_unicode_ends_dyn' panicked at 'assertion failed: `(left == right)`
left: `true`,
right: `false`: unexpected result when comparing sdlkdfFFkoß at position 0 to %FFkoSS ', arrow-string/src/like.rs:1311:5
stack backtrace:
0: rust_begin_unwind
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/panicking.rs:584:5
1: core::panicking::panic_fmt
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/panicking.rs:142:14
2: core::panicking::assert_failed_inner
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/panicking.rs:218:23
3: core::panicking::assert_failed
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/panicking.rs:181:5
4: arrow_string::like::tests::test_utf8_array_ilike_unicode_ends_dyn
at ./src/like.rs:924:21
5: arrow_string::like::tests::test_utf8_array_ilike_unicode_ends_dyn::{{closure}}
at ./src/like.rs:917:13
6: core::ops::function::FnOnce::call_once
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/ops/function.rs:248:5
7: core::ops::function::FnOnce::call_once
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/ops/function.rs:248:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
---- like::tests::test_utf8_array_ilike_unicode_starts stdout ----
thread 'like::tests::test_utf8_array_ilike_unicode_starts' panicked at 'assertion failed: `(left == right)`
left: `true`,
right: `false`: unexpected result when comparing FFkoßsdlkdf at position 0 to FFkoSS% ', arrow-string/src/like.rs:1291:5
stack backtrace:
0: rust_begin_unwind
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/panicking.rs:584:5
1: core::panicking::panic_fmt
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/panicking.rs:142:14
2: core::panicking::assert_failed_inner
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/panicking.rs:218:23
3: core::panicking::assert_failed
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/panicking.rs:181:5
4: arrow_string::like::tests::test_utf8_array_ilike_unicode_starts
at ./src/like.rs:1291:5
5: arrow_string::like::tests::test_utf8_array_ilike_unicode_starts::{{closure}}
at ./src/like.rs:917:13
6: core::ops::function::FnOnce::call_once
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/ops/function.rs:248:5
7: core::ops::function::FnOnce::call_once
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/ops/function.rs:248:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
---- like::tests::test_utf8_array_ilike_unicode_start_dyn stdout ----
thread 'like::tests::test_utf8_array_ilike_unicode_start_dyn' panicked at 'assertion failed: `(left == right)`
left: `true`,
right: `false`: unexpected result when comparing FFkoßsdlkdf at position 0 to FFkoSS% ', arrow-string/src/like.rs:1291:5
stack backtrace:
0: rust_begin_unwind
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/panicking.rs:584:5
1: core::panicking::panic_fmt
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/panicking.rs:142:14
2: core::panicking::assert_failed_inner
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/panicking.rs:218:23
3: core::panicking::assert_failed
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/panicking.rs:181:5
4: arrow_string::like::tests::test_utf8_array_ilike_unicode_start_dyn
at ./src/like.rs:1291:5
5: arrow_string::like::tests::test_utf8_array_ilike_unicode_start_dyn::{{closure}}
at ./src/like.rs:917:13
6: core::ops::function::FnOnce::call_once
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/ops/function.rs:248:5
7: core::ops::function::FnOnce::call_once
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/ops/function.rs:248:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
failures:
like::tests::test_utf8_array_ilike_unicode
like::tests::test_utf8_array_ilike_unicode_contains
like::tests::test_utf8_array_ilike_unicode_contains_dyn
like::tests::test_utf8_array_ilike_unicode_dyn
like::tests::test_utf8_array_ilike_unicode_ends
like::tests::test_utf8_array_ilike_unicode_ends_dyn
like::tests::test_utf8_array_ilike_unicode_start_dyn
like::tests::test_utf8_array_ilike_unicode_starts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes see #3311 the previous special case logic was incorrect as it changed the semantics from the regex
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM -- cc @Dandandan
Which issue does this PR close?
Closes #3311
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?