-
Notifications
You must be signed in to change notification settings - Fork 866
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: binary_mut should work if only one input array has null buffer #6396
Merged
Merged
Changes from 2 commits
Commits
Show all changes
4 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -313,8 +313,6 @@ where | |
)))); | ||
} | ||
|
||
let nulls = NullBuffer::union(a.logical_nulls().as_ref(), b.logical_nulls().as_ref()); | ||
|
||
let mut builder = a.into_builder()?; | ||
|
||
builder | ||
|
@@ -323,7 +321,12 @@ where | |
.zip(b.values()) | ||
.for_each(|(l, r)| *l = op(*l, *r)); | ||
|
||
let array_builder = builder.finish().into_data().into_builder().nulls(nulls); | ||
let array = builder.finish(); | ||
|
||
// The builder has the null buffer from `a`, it is not changed. | ||
let nulls = NullBuffer::union(array.logical_nulls().as_ref(), b.logical_nulls().as_ref()); | ||
|
||
let array_builder = array.into_data().into_builder().nulls(nulls); | ||
|
||
let array_data = unsafe { array_builder.build_unchecked() }; | ||
Ok(Ok(PrimitiveArray::<T>::from(array_data))) | ||
|
@@ -413,7 +416,8 @@ where | |
try_binary_no_nulls_mut(len, a, b, op) | ||
} else { | ||
let nulls = | ||
NullBuffer::union(a.logical_nulls().as_ref(), b.logical_nulls().as_ref()).unwrap(); | ||
create_union_null_buffer(a.logical_nulls().as_ref(), b.logical_nulls().as_ref()) | ||
.unwrap(); | ||
|
||
let mut builder = a.into_builder()?; | ||
|
||
|
@@ -435,6 +439,22 @@ where | |
} | ||
} | ||
|
||
/// Computes the union of the nulls in two optional [`NullBuffer`] which | ||
/// is not shared with the input buffers. | ||
/// | ||
/// The union of the nulls is the same as `NullBuffer::union(lhs, rhs)` but | ||
/// it does not increase the reference count of the null buffer. | ||
fn create_union_null_buffer( | ||
lhs: Option<&NullBuffer>, | ||
rhs: Option<&NullBuffer>, | ||
) -> Option<NullBuffer> { | ||
match (lhs, rhs) { | ||
(Some(lhs), Some(rhs)) => Some(NullBuffer::new(lhs.inner() & rhs.inner())), | ||
(Some(n), None) | (None, Some(n)) => Some(NullBuffer::new(n.inner() & n.inner())), | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If I understand this correctly, it forces a copy of the null buffer's contents? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes. Previously it did a clone that shares the underlying buffer. |
||
(None, None) => None, | ||
} | ||
} | ||
|
||
/// This intentional inline(never) attribute helps LLVM optimize the loop. | ||
#[inline(never)] | ||
fn try_binary_no_nulls<A: ArrayAccessor, B: ArrayAccessor, F, O>( | ||
|
@@ -557,6 +577,24 @@ mod tests { | |
assert_eq!(c, expected); | ||
} | ||
|
||
#[test] | ||
fn test_binary_mut_null_buffer() { | ||
let a = Int32Array::from(vec![Some(3), Some(4), Some(5), Some(6), None]); | ||
|
||
let b = Int32Array::from(vec![Some(10), Some(11), Some(12), Some(13), Some(14)]); | ||
|
||
let r1 = binary_mut(a, &b, |a, b| a + b).unwrap(); | ||
|
||
let a = Int32Array::from(vec![Some(3), Some(4), Some(5), Some(6), None]); | ||
let b = Int32Array::new( | ||
vec![10, 11, 12, 13, 14].into(), | ||
Some(vec![true, true, true, true, true].into()), | ||
); | ||
|
||
let r2 = binary_mut(a, &b, |a, b| a + b).unwrap(); | ||
viirya marked this conversation as resolved.
Show resolved
Hide resolved
|
||
assert_eq!(r1.unwrap(), r2.unwrap()); | ||
} | ||
|
||
#[test] | ||
fn test_try_binary_mut() { | ||
let a = Int32Array::from(vec![15, 14, 9, 8, 1]); | ||
|
@@ -587,6 +625,24 @@ mod tests { | |
.expect_err("should got error"); | ||
} | ||
|
||
#[test] | ||
fn test_try_binary_mut_null_buffer() { | ||
let a = Int32Array::from(vec![Some(3), Some(4), Some(5), Some(6), None]); | ||
|
||
let b = Int32Array::from(vec![Some(10), Some(11), Some(12), Some(13), Some(14)]); | ||
|
||
let r1 = try_binary_mut(a, &b, |a, b| Ok(a + b)).unwrap(); | ||
|
||
let a = Int32Array::from(vec![Some(3), Some(4), Some(5), Some(6), None]); | ||
let b = Int32Array::new( | ||
vec![10, 11, 12, 13, 14].into(), | ||
Some(vec![true, true, true, true, true].into()), | ||
); | ||
|
||
let r2 = try_binary_mut(a, &b, |a, b| Ok(a + b)).unwrap(); | ||
viirya marked this conversation as resolved.
Show resolved
Hide resolved
|
||
assert_eq!(r1.unwrap(), r2.unwrap()); | ||
} | ||
|
||
#[test] | ||
fn test_unary_dict_mut() { | ||
let values = Int32Array::from(vec![Some(10), Some(20), None]); | ||
|
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If only
a
has null buffer,NullBuffer.union
will clone it which shares the underlying buffer that failsinto_mutable
call on the null buffer later.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we have
a
by value, perhaps we could add aPrimitiveArray::into_parts
that would deconstruct thePrimitiveArray
into its underlyingNullBuffer
and values 🤔Following the model of https://docs.rs/arrow/latest/arrow/array/struct.GenericByteArray.html#method.into_parts
That way we could directly use the
NullBuffer
and avoid the copy increate_union_null_buffer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, let me see if it works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, for
binary_mut
, it is okay to defer the computation of union of two null buffers. It is because that we don't need to own the null buffers before invokingop
on the two arrays.Since the null buffer is not changed on the builder, we can compute the union after finishing the builder. So for
binary_mut
, we can avoid copying null buffer.But for
try_binary_mut
, because we need to get the union of two null buffers before invokingop
on the two arrays, we still need to copy it as this PR does.PrimitiveArray::into_parts
(it already exists) also cannot help on it because the builder needs to own the null buffer.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I only avoid copying null buffer in
binary_mut
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It still feels to me that we could remove the copy on try_binary_mut, but this seems like an improvement to me 🚀