Support the length kernel on Binary Array #1465
rewrite unary_offset using macro Signed-off-by: remzi <[email protected]>
Codecov Report
@@ Coverage Diff @@
## master #1465 +/- ##
==========================================
+ Coverage 82.70% 82.71% +0.01%
==========================================
Files 187 187
Lines 54169 54255 +86
==========================================
+ Hits 44801 44878 +77
- Misses 9368 9377 +9
@alamb Could you please help to review?
This is very nicely done @HaoYang670 🏅
Thank you and sorry for the review delay
#[test]
fn length_test_large_binary() -> Result<()> {
    let value: Vec<&[u8]> = vec![b"zero", &[0xff, 0xf8], b"two"];
💯 for non UTF8
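To illustrate why the non-UTF-8 value in this test matters, here is a minimal standalone sketch in plain Rust. It uses no arrow types, and `SAMPLE` and `byte_lengths` are hypothetical names that are not part of the PR. It shows that the second value fails UTF-8 validation yet still has a well-defined byte length, which is all a length kernel on binary data needs.

```rust
/// The same three values as in the test above. The middle one is not valid
/// UTF-8 (0xff can never appear in a UTF-8 sequence).
const SAMPLE: [&[u8]; 3] = [b"zero", &[0xff, 0xf8], b"two"];

/// Byte length of each binary value. No UTF-8 validation is involved:
/// a binary value is just a run of bytes.
fn byte_lengths(values: &[&[u8]]) -> Vec<usize> {
    values.iter().map(|v| v.len()).collect()
}

/// Whether a value could also be stored in a string array.
fn is_valid_utf8(value: &[u8]) -> bool {
    std::str::from_utf8(value).is_ok()
}
```

With these definitions, `byte_lengths(&SAMPLE)` yields `[4, 2, 3]`, while `is_valid_utf8(SAMPLE[1])` is `false`, so the value could not have been exercised through a string array.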
arrow/src/compute/kernels/length.rs
Outdated
@@ -94,18 +111,20 @@ where
         .downcast_ref::<GenericStringArray<O>>()
         .unwrap();
     let bits_in_bytes = O::from_usize(8).unwrap();
-    unary_offsets_string::<O, _>(array, T::DATA_TYPE, |x| x * bits_in_bytes)
+    unary_offsets!(array, T::DATA_TYPE, |x| x * bits_in_bytes)
 }

 /// Returns an array of Int32/Int64 denoting the number of bytes in each string in the array.
- /// Returns an array of Int32/Int64 denoting the number of bytes in each string in the array.
+ /// Returns an array of Int32/Int64 denoting the number of bytes in each string/binary in the array.
arrow/src/compute/kernels/length.rs
Outdated
@@ -115,13 +134,15 @@ pub fn length(array: &dyn Array) -> Result<ArrayRef> {

 /// Returns an array of Int32/Int64 denoting the number of bits in each string in the array.
ditto
thank you. lgtm
simplify the way to get offsets. No performance penalty Signed-off-by: remzi <[email protected]>
- // this is a 30% improvement over iterating over u8s and building OffsetSize, which
- // justifies the usage of `unsafe`.
- let slice: &[O] = &unsafe { offsets.typed_data::<O>() }[$array.offset()..];
+ let slice = $array.value_offsets();
❤️ that is much nicer
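For readers unfamiliar with the offsets layout, the following is a rough sketch of what a `value_offsets()`-style accessor provides, assuming it returns the window of offsets covering the array's logical range, slicing included. `ToyBinaryArray` is a hypothetical stand-in written for this illustration, not an arrow type, and it is not the code from this PR.

```rust
/// Hypothetical stand-in for a variable-size binary array (not an arrow type).
/// Value `i` occupies bytes `offsets[i]..offsets[i + 1]` of a shared buffer,
/// so lengths can be read from the offsets alone.
struct ToyBinaryArray {
    offsets: Vec<i32>, // full offsets buffer, one more entry than values
    offset: usize,     // index of the first logical value (non-zero after slicing)
    len: usize,        // number of logical values
}

impl ToyBinaryArray {
    fn from_values(values: &[&[u8]]) -> Self {
        let mut offsets = vec![0i32];
        let mut end = 0i32;
        for v in values {
            end += v.len() as i32;
            offsets.push(end);
        }
        Self { offsets, offset: 0, len: values.len() }
    }

    /// Logical slice; the offsets buffer is cloned here only for simplicity.
    fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len);
        Self { offsets: self.offsets.clone(), offset: self.offset + offset, len }
    }

    /// The `len + 1` offsets that cover this array's logical range.
    fn value_offsets(&self) -> &[i32] {
        &self.offsets[self.offset..=self.offset + self.len]
    }

    /// Byte length of each value, as differences of adjacent offsets.
    fn lengths(&self) -> Vec<i32> {
        self.value_offsets().windows(2).map(|w| w[1] - w[0]).collect()
    }
}
```

Because the accessor already accounts for the slice offset, the caller never has to index into the raw buffer itself, which is the part that previously needed `unsafe`.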
Thanks @HaoYang670 and @viirya
Which issue does this PR close?
Closes #1464.

Rationale for this change
The `length` kernel can work with `BinaryArray` now!

What changes are included in this PR?
- Support the `length` and `bit_length` functions for Binary Array
- Rewrite `unary_offsets` using a macro

Micro benchmark
No obvious performance penalty.

Are there any user-facing changes?
Some private functions' APIs are changed.
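
As a rough illustration of the macro approach described above, the sketch below applies one macro to two unrelated toy array types that both expose a `value_offsets()` method. Everything here is hypothetical (`MiniStringArray`, `MiniBinaryArray`, `unary_offsets_sketch!`) and deliberately simpler than the real kernel: it returns a plain `Vec<i32>` and ignores nulls and the output data type.

```rust
/// Builds an offsets buffer (starting at 0) from a sequence of value lengths.
fn offsets_from_lengths(lengths: impl Iterator<Item = usize>) -> Vec<i32> {
    let mut offsets = vec![0i32];
    let mut end = 0i32;
    for len in lengths {
        end += len as i32;
        offsets.push(end);
    }
    offsets
}

/// Hypothetical toy arrays. Only the offsets are kept, since lengths do not
/// depend on the value bytes themselves.
struct MiniStringArray { offsets: Vec<i32> }
struct MiniBinaryArray { offsets: Vec<i32> }

impl MiniStringArray {
    fn new(values: &[&str]) -> Self {
        Self { offsets: offsets_from_lengths(values.iter().map(|v| v.len())) }
    }
    fn value_offsets(&self) -> &[i32] { &self.offsets }
}

impl MiniBinaryArray {
    fn new(values: &[&[u8]]) -> Self {
        Self { offsets: offsets_from_lengths(values.iter().map(|v| v.len())) }
    }
    fn value_offsets(&self) -> &[i32] { &self.offsets }
}

/// Applies `$op` to each value's byte length. The macro works for any
/// expression whose type has a `value_offsets()` method returning `&[i32]`,
/// without the two array types needing a common trait.
macro_rules! unary_offsets_sketch {
    ($array:expr, $op:expr) => {{
        let op = $op;
        $array
            .value_offsets()
            .windows(2)
            .map(|w| op(w[1] - w[0]))
            .collect::<Vec<i32>>()
    }};
}
```

In this sketch, a byte-length kernel is `unary_offsets_sketch!(array, |x| x)` and a bit-length kernel is `unary_offsets_sketch!(array, |x| x * 8)`, for either array type.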