add binary support in arrow-string #6926

Open · wants to merge 9 commits into base: main

@rluvaton (Contributor) commented Dec 31, 2024

(ignore branch name)

Which issue does this PR close?

Closes #6923

What changes are included in this PR?

  1. Added a PredicateImpl trait so the predicate logic works regardless of whether the input is string or binary
  2. Moved the implementation to use the Predicate and made it more generic
  3. Implemented PredicateImpl for the old Predicate and the new BinaryPredicate using a macro (I don't really like this as it seems less maintainable, but I'm not sure what's better: duplicating, a macro, or another approach)
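
For illustration only, here is a minimal sketch of how one predicate could be shared between string and binary inputs by working on bytes. This is not the code in this PR: `AsByteSlice` and `BytePredicate` are hypothetical names, and the sketch uses plain slices rather than Arrow arrays.

```rust
/// Hypothetical helper trait (not part of arrow-rs): anything that can be
/// viewed as a byte slice, so one predicate can serve `str` and `[u8]`.
pub trait AsByteSlice {
    fn as_byte_slice(&self) -> &[u8];
}

impl AsByteSlice for str {
    fn as_byte_slice(&self) -> &[u8] {
        self.as_bytes()
    }
}

impl AsByteSlice for [u8] {
    fn as_byte_slice(&self) -> &[u8] {
        self
    }
}

/// Hypothetical predicate that only looks at raw bytes.
pub enum BytePredicate<'a> {
    StartsWith(&'a [u8]),
    EndsWith(&'a [u8]),
    Contains(&'a [u8]),
}

impl<'a> BytePredicate<'a> {
    /// Evaluates the predicate against a single string or binary value.
    pub fn evaluate<T: AsByteSlice + ?Sized>(&self, haystack: &T) -> bool {
        let h = haystack.as_byte_slice();
        match self {
            BytePredicate::StartsWith(p) => h.starts_with(p),
            BytePredicate::EndsWith(p) => h.ends_with(p),
            // `windows(0)` panics, so an empty needle is handled first.
            BytePredicate::Contains(p) => {
                p.is_empty() || h.windows(p.len()).any(|w| w == *p)
            }
        }
    }
}
```

A byte-level predicate like this covers prefix, suffix, and substring checks; it does not cover LIKE patterns or case-insensitive matching, which depend on character semantics.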

Are there any user-facing changes?

Yes: users can now pass binary arrays to like, starts_with, contains, and more

@github-actions bot added the arrow (Changes to the arrow crate) label Dec 31, 2024
@rluvaton marked this pull request as ready for review December 31, 2024 19:39
@alamb (Contributor) left a comment

I am sorry for the delay in reviewing this PR -- it is hard to find time to review such a large PR

I wonder what the use case is for using LIKE on binary data? I ask because it seems to me that LIKE is mostly useful for character strings.

I can see the use case for starts_with / ends_with and contains for binary data.

Perhaps instead of trying to inject binary arrays into the code for handling strings, we could simply have simpler prefix/suffix matching for binary -- it might have some more repetition but would be simpler to understand and avoid any potential performance issues related to this code 🤔
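
As a rough sketch of what such standalone prefix/suffix matching could look like, the following uses plain slices of optional byte values in place of Arrow binary arrays. The function names `binary_starts_with` and `binary_ends_with` are hypothetical and are not existing arrow-rs kernels; `None` stands in for a null slot.

```rust
/// Hypothetical standalone kernel: for each value, reports whether it begins
/// with `prefix`. Nulls (`None`) are propagated to the output.
pub fn binary_starts_with(values: &[Option<&[u8]>], prefix: &[u8]) -> Vec<Option<bool>> {
    values
        .iter()
        .copied()
        .map(|v| v.map(|bytes| bytes.starts_with(prefix)))
        .collect()
}

/// Hypothetical standalone kernel: for each value, reports whether it ends
/// with `suffix`. Nulls (`None`) are propagated to the output.
pub fn binary_ends_with(values: &[Option<&[u8]>], suffix: &[u8]) -> Vec<Option<bool>> {
    values
        .iter()
        .copied()
        .map(|v| v.map(|bytes| bytes.ends_with(suffix)))
        .collect()
}
```

Because these compare raw bytes, they need no UTF-8 or ASCII checks, at the cost of duplicating a little of the string code path.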

@@ -59,6 +59,16 @@ pub struct FixedSizeBinaryArray {
}

impl FixedSizeBinaryArray {
/// Returns true if all data within this array is ASCII
pub fn is_ascii(&self) -> bool {

I don't understand the need to check a binary array for ASCII -- there shouldn't be any optimizations that rely on the data being ASCII

Labels: arrow (Changes to the arrow crate)
Successfully merging this pull request may close these issues: arrow-string function should support binary input as well
2 participants