Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minor: Clean up the code of MutableArrayData #1763

Merged
merged 1 commit into from
May 31, 2022
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
84 changes: 28 additions & 56 deletions arrow/src/array/transform/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -78,18 +78,13 @@ impl<'a> _MutableArrayData<'a> {
}
};

let mut array_data_builder = ArrayDataBuilder::new(self.data_type)
ArrayDataBuilder::new(self.data_type)
.offset(0)
.len(self.len)
.null_count(self.null_count)
.buffers(buffers)
.child_data(child_data);
if self.null_count > 0 {
array_data_builder =
array_data_builder.null_bit_buffer(Some(self.null_buffer.into()));
}

array_data_builder
.child_data(child_data)
.null_bit_buffer((self.null_count > 0).then(|| self.null_buffer.into()))
}
}

Expand Down Expand Up @@ -184,48 +179,23 @@ fn build_extend_dictionary(
max: usize,
) -> Option<Extend> {
use crate::datatypes::*;
macro_rules! validate_and_build {
($dt: ty) => {{
let _: $dt = max.try_into().ok()?;
let offset: $dt = offset.try_into().ok()?;
Some(primitive::build_extend_with_offset(array, offset))
}};
}
match array.data_type() {
DataType::Dictionary(child_data_type, _) => match child_data_type.as_ref() {
DataType::UInt8 => {
let _: u8 = max.try_into().ok()?;
let offset: u8 = offset.try_into().ok()?;
Some(primitive::build_extend_with_offset(array, offset))
}
DataType::UInt16 => {
let _: u16 = max.try_into().ok()?;
let offset: u16 = offset.try_into().ok()?;
Some(primitive::build_extend_with_offset(array, offset))
}
DataType::UInt32 => {
let _: u32 = max.try_into().ok()?;
let offset: u32 = offset.try_into().ok()?;
Some(primitive::build_extend_with_offset(array, offset))
}
DataType::UInt64 => {
let _: u64 = max.try_into().ok()?;
let offset: u64 = offset.try_into().ok()?;
Some(primitive::build_extend_with_offset(array, offset))
}
DataType::Int8 => {
let _: i8 = max.try_into().ok()?;
let offset: i8 = offset.try_into().ok()?;
Some(primitive::build_extend_with_offset(array, offset))
}
DataType::Int16 => {
let _: i16 = max.try_into().ok()?;
let offset: i16 = offset.try_into().ok()?;
Some(primitive::build_extend_with_offset(array, offset))
}
DataType::Int32 => {
let _: i32 = max.try_into().ok()?;
let offset: i32 = offset.try_into().ok()?;
Some(primitive::build_extend_with_offset(array, offset))
}
DataType::Int64 => {
let _: i64 = max.try_into().ok()?;
let offset: i64 = offset.try_into().ok()?;
Some(primitive::build_extend_with_offset(array, offset))
}
DataType::UInt8 => validate_and_build!(u8),
DataType::UInt16 => validate_and_build!(u16),
DataType::UInt32 => validate_and_build!(u32),
DataType::UInt64 => validate_and_build!(u64),
DataType::Int8 => validate_and_build!(i8),
DataType::Int16 => validate_and_build!(i16),
DataType::Int32 => validate_and_build!(i32),
DataType::Int64 => validate_and_build!(i64),
_ => unreachable!(),
},
_ => None,
Expand Down Expand Up @@ -394,28 +364,30 @@ impl<'a> MutableArrayData<'a> {
/// a [Capacities] variant is not yet supported.
pub fn with_capacities(
arrays: Vec<&'a ArrayData>,
mut use_nulls: bool,
use_nulls: bool,
capacities: Capacities,
) -> Self {
let data_type = arrays[0].data_type();
use crate::datatypes::*;

// if any of the arrays has nulls, insertions from any array requires setting bits
// as there is at least one array with nulls.
if arrays.iter().any(|array| array.null_count() > 0) {
use_nulls = true;
};
let use_nulls = use_nulls | arrays.iter().any(|array| array.null_count() > 0);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Short circuit here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is incorrect, as described above use_nulls is a hint, and should be false if the only source of nulls are the arras themselves.

To explain why, you might have a number of source arrays that don't contain any nulls, but then call extend_nulls.

If it helps I tried to clean this up a bit in #1225 but abandoned it for lack of time - if you wanted to pick it up again...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is incorrect, as described above use_nulls is a hint, and should be false if the only source of nulls are the arras themselves.

Thank you @tustvold for your review. I am still a little confused about it.
This is the truth table in my mind:

the input value of use_nulls arrays.any(null_count > 0) final value of use_nulls
true true true
true false true
false true true
false false false

Although I not very clear about the whole logic of the function with_capacities, for use_nulls I think this is the logic.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aah yes, was a long day and brain was fried 😅

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aah yes, was a long day and brain was fried 😅

Haha, summer is coming!


let mut array_capacity;

let [buffer1, buffer2] = match (data_type, &capacities) {
(DataType::LargeUtf8, Capacities::Binary(capacity, Some(value_cap)))
| (DataType::LargeBinary, Capacities::Binary(capacity, Some(value_cap))) => {
(
DataType::LargeUtf8 | DataType::LargeBinary,
Capacities::Binary(capacity, Some(value_cap)),
) => {
array_capacity = *capacity;
preallocate_offset_and_binary_buffer::<i64>(*capacity, *value_cap)
}
(DataType::Utf8, Capacities::Binary(capacity, Some(value_cap)))
| (DataType::Binary, Capacities::Binary(capacity, Some(value_cap))) => {
(
DataType::Utf8 | DataType::Binary,
Capacities::Binary(capacity, Some(value_cap)),
) => {
array_capacity = *capacity;
preallocate_offset_and_binary_buffer::<i32>(*capacity, *value_cap)
}
Expand Down