Implement GroupColumn support for StringView / ByteView (faster grouping performance) #12809
@@ -15,22 +15,33 @@

```rust
// specific language governing permissions and limitations
// under the License.

use arrow::array::make_view;
use arrow::array::BufferBuilder;
use arrow::array::ByteView;
use arrow::array::GenericBinaryArray;
use arrow::array::GenericStringArray;
use arrow::array::OffsetSizeTrait;
use arrow::array::PrimitiveArray;
use arrow::array::PrimitiveBuilder;
use arrow::array::StringBuilder;
use arrow::array::StringViewBuilder;
use arrow::array::{Array, ArrayRef, ArrowPrimitiveType, AsArray};
use arrow::buffer::OffsetBuffer;
use arrow::buffer::ScalarBuffer;
use arrow::datatypes::ByteArrayType;
use arrow::datatypes::ByteViewType;
use arrow::datatypes::DataType;
use arrow::datatypes::GenericBinaryType;
use arrow_array::BinaryViewArray;
use arrow_array::GenericByteViewArray;
use arrow_array::StringViewArray;
use arrow_buffer::Buffer;
use datafusion_common::utils::proxy::VecAllocExt;

use crate::aggregates::group_values::null_builder::MaybeNullBufferBuilder;
use arrow_array::types::GenericStringType;
use datafusion_physical_expr_common::binary_map::{OutputType, INITIAL_BUFFER_CAPACITY};
use std::mem;
use std::sync::Arc;
use std::vec;
```
@@ -376,6 +387,176 @@

```rust
    }
}

/// An implementation of [`GroupColumn`] for binary view and utf8 view types.
///
/// Stores a collection of binary view or utf8 view group values in a buffer
/// whose structure is similar to `GenericByteViewArray`, which provides:
///
/// 1. Efficient comparison of incoming rows to existing rows
/// 2. Efficient construction of the final output array
/// 3. Efficient `take_n` compared to using `GenericByteViewBuilder`
pub struct ByteGroupValueViewBuilder {
    output_type: OutputType,

    /// The views of string values
    ///
    /// If the string's length <= 12, the view's format is:
    ///   string(12B) | len(4B)
    ///
    /// If the string's length > 12, its format is:
    ///   offset(4B) | buffer_index(4B) | prefix(4B) | len(4B)
    views: Vec<u128>,

    /// The in-progress block
    ///
    /// New values are inserted into it until its capacity is
    /// exhausted (see `max_block_size` for details).
    in_progress: Vec<u8>,

    /// The completed blocks
    completed: Vec<Buffer>,

    /// The maximum size of `in_progress`
    ///
    /// `in_progress` is flushed into `completed`, and a new `in_progress`
    /// is created, when its remaining capacity
    /// (`max_block_size` - `len(in_progress)`) is not enough to store the
    /// appended value.
    max_block_size: usize,

    /// Nulls
    nulls: MaybeNullBufferBuilder,
}
```
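The view layout described in the struct's doc comment can be sketched as a standalone `u128` encoding. The comment lists fields from most- to least-significant byte; in little-endian byte order the length comes first, which is why the builder can read it with `view as u32`. The helper names below are hypothetical, not part of the PR:

```rust
// Sketch of the two u128 view layouts used by Arrow's byte-view arrays
// (field order in bytes, little-endian):
//   inline      (len <= 12): len(4B) | data(12B, zero-padded)
//   out-of-line (len >  12): len(4B) | prefix(4B) | buffer_index(4B) | offset(4B)

fn make_inline_view(value: &[u8]) -> u128 {
    assert!(value.len() <= 12);
    let mut bytes = [0u8; 16];
    bytes[0..4].copy_from_slice(&(value.len() as u32).to_le_bytes());
    bytes[4..4 + value.len()].copy_from_slice(value);
    u128::from_le_bytes(bytes)
}

fn make_ref_view(value: &[u8], buffer_index: u32, offset: u32) -> u128 {
    assert!(value.len() > 12);
    let mut bytes = [0u8; 16];
    bytes[0..4].copy_from_slice(&(value.len() as u32).to_le_bytes());
    bytes[4..8].copy_from_slice(&value[0..4]); // 4-byte prefix
    bytes[8..12].copy_from_slice(&buffer_index.to_le_bytes());
    bytes[12..16].copy_from_slice(&offset.to_le_bytes());
    u128::from_le_bytes(bytes)
}

fn main() {
    let short = make_inline_view(b"hello");
    // The length always sits in the low 32 bits of the view
    assert_eq!(short as u32, 5);

    let long = make_ref_view(b"a string longer than 12", 3, 40);
    assert_eq!(long as u32, 23);
}
```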
```rust
impl ByteGroupValueViewBuilder {
    fn append_val_inner<B>(&mut self, array: &ArrayRef, row: usize)
    where
        B: ByteViewType,
    {
        let arr = array.as_byte_view::<B>();

        // If a null row, set and return
        if arr.is_null(row) {
            self.nulls.append(true);
            self.views.push(0);
            return;
        }

        // Not null case
        self.nulls.append(false);
        let value: &[u8] = arr.value(row).as_ref();

        let value_len = value.len();
        let view = if value_len <= 12 {
            make_view(value, 0, 0)
        } else {
            // First ensure the block is big enough to hold the value
            self.ensure_in_progress_big_enough(value_len);

            // Append value
            let buffer_index = self.completed.len();
            let offset = self.in_progress.len();
            self.in_progress.extend_from_slice(value);

            make_view(value, buffer_index as u32, offset as u32)
        };

        // Append view
        self.views.push(view);
    }

    fn ensure_in_progress_big_enough(&mut self, value_len: usize) {
        debug_assert!(value_len > 12);
        let require_cap = self.in_progress.len() + value_len;

        // If the current block isn't big enough, flush it and create a new
        // in-progress block
        if require_cap > self.max_block_size {
            let flushed_block = mem::replace(
                &mut self.in_progress,
                Vec::with_capacity(self.max_block_size),
            );
            let buffer = Buffer::from_vec(flushed_block);
            self.completed.push(buffer);
        }
    }
```
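The block-flushing strategy in `ensure_in_progress_big_enough` can be illustrated with a minimal standalone sketch. The `Blocks` type and `append` helper here are hypothetical simplifications (a plain `Vec<u8>` stands in for Arrow's `Buffer`), not the PR's code:

```rust
// Values are appended to a bounded in-progress buffer; when the next value
// would not fit, the buffer is frozen into `completed` and a fresh one is
// started, so completed blocks are never reallocated or copied again.
struct Blocks {
    in_progress: Vec<u8>,
    completed: Vec<Vec<u8>>,
    max_block_size: usize,
}

impl Blocks {
    /// Append a long value; returns (buffer_index, offset) for its view.
    fn append(&mut self, value: &[u8]) -> (usize, usize) {
        // Flush when the value would not fit in the current block
        if self.in_progress.len() + value.len() > self.max_block_size {
            let flushed = std::mem::replace(
                &mut self.in_progress,
                Vec::with_capacity(self.max_block_size),
            );
            self.completed.push(flushed);
        }
        let buffer_index = self.completed.len();
        let offset = self.in_progress.len();
        self.in_progress.extend_from_slice(value);
        (buffer_index, offset)
    }
}

fn main() {
    let mut blocks = Blocks {
        in_progress: Vec::new(),
        completed: Vec::new(),
        max_block_size: 16,
    };
    assert_eq!(blocks.append(b"exactly 13 by"), (0, 0));
    // A second 13-byte value does not fit (13 + 13 > 16): the first block is
    // frozen and the new value starts a fresh block at offset 0.
    assert_eq!(blocks.append(b"another 13 by"), (1, 0));
    assert_eq!(blocks.completed.len(), 1);
}
```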
```rust
    fn equal_to_inner<B>(&self, lhs_row: usize, array: &ArrayRef, rhs_row: usize) -> bool
    where
        B: ByteViewType,
    {
        let array = array.as_byte_view::<B>();

        // First check whether the nulls are equal
        let exist_null = self.nulls.is_null(lhs_row);
        let input_null = array.is_null(rhs_row);
        if let Some(result) = nulls_equal_to(exist_null, input_null) {
            return result;
        }

        // Otherwise, we need to check their values
        let exist_view = self.views[lhs_row];
        let exist_view_len = exist_view as u32;

        let input_view = array.views()[rhs_row];
        let input_view_len = input_view as u32;

        // The check logic:
        // - Check len equality
        // - If inlined, check the inlined value
        // - If non-inlined, check the prefix, and then check the value in the
        //   buffer when needed
        if exist_view_len != input_view_len {
            return false;
        }

        if exist_view_len <= 12 {
            let exist_inline = unsafe {
                GenericByteViewArray::<B>::inline_value(
                    &exist_view,
                    exist_view_len as usize,
                )
            };
            let input_inline = unsafe {
                GenericByteViewArray::<B>::inline_value(
                    &input_view,
                    input_view_len as usize,
                )
            };
            exist_inline == input_inline
        } else {
            let exist_prefix =
                unsafe { GenericByteViewArray::<B>::inline_value(&exist_view, 4) };
            let input_prefix =
                unsafe { GenericByteViewArray::<B>::inline_value(&input_view, 4) };

            if exist_prefix != input_prefix {
                return false;
            }

            let exist_full = {
                let byte_view = ByteView::from(exist_view);
                self.value(
                    byte_view.buffer_index as usize,
                    byte_view.offset as usize,
                    byte_view.length as usize,
                )
            };
            let input_full: &[u8] = unsafe { array.value_unchecked(rhs_row).as_ref() };
            exist_full == input_full
        }
    }
```
```rust
    fn value(&self, buffer_index: usize, offset: usize, length: usize) -> &[u8] {
        debug_assert!(buffer_index <= self.completed.len());

        if buffer_index < self.completed.len() {
            let block = &self.completed[buffer_index];
            &block[offset..offset + length]
        } else {
            &self.in_progress[offset..offset + length]
        }
    }
}

/// Determines if the nullability of the existing and new input array can be used
/// to short-circuit the comparison of the two values.
///
```
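The body of `nulls_equal_to` is not shown in this hunk. A plausible standalone version of the short-circuit it describes (an assumption, not the exact code in the PR) is:

```rust
// Returns Some(answer) when nullability alone decides equality, and None
// when both values are non-null and their bytes must still be compared.
fn nulls_equal_to(lhs_null: bool, rhs_null: bool) -> Option<bool> {
    match (lhs_null, rhs_null) {
        (true, true) => Some(true),                   // both null: equal
        (true, false) | (false, true) => Some(false), // only one null: not equal
        (false, false) => None,                       // compare the actual values
    }
}

fn main() {
    assert_eq!(nulls_equal_to(true, true), Some(true));
    assert_eq!(nulls_equal_to(true, false), Some(false));
    assert_eq!(nulls_equal_to(false, false), None);
}
```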
Review discussion (on the null checks in `equal_to_inner`):

* It would be nice in the future to avoid those null checks in `GroupColumn` (even if the input is a nullable field) for batches containing no nulls.
* That seems to make a lot of sense. I also found that even for batches containing some nulls, we have actually already checked which rows are really null in `create_hashes`. Maybe we could reuse that check result?
* It might make sense to pull the null / not-null check into the caller of this function 🤔
* 🤔 I filed an issue about this, and I am trying the straightforward approach of using `null_count`: #12944. Let's see the performance improvement.
* Yeah, we need to do the check on the batch (if the batch contains no nulls, follow the fast path that omits the check), so I think the calling side does indeed need to perform this.
* Possibly interesting: one of the reasons special-casing nulls/no-nulls can be helpful is that it permits better auto-vectorization, as we are documenting in apache/arrow-rs#6554.
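The batch-level fast path discussed in this thread can be sketched with a hypothetical column type (the names and the `sum` kernel below are illustrative only; the actual exploration for DataFusion went through the `null_count` issue mentioned above):

```rust
// When the batch contains no nulls at all, skip the per-row null checks
// entirely; the branch-free loop is also friendlier to auto-vectorization.
struct Column {
    values: Vec<u64>,
    // None mirrors a missing validity mask: "this batch has no nulls"
    nulls: Option<Vec<bool>>, // true = null at that row
}

impl Column {
    fn null_count(&self) -> usize {
        self.nulls
            .as_ref()
            .map_or(0, |n| n.iter().filter(|&&is_null| is_null).count())
    }

    fn sum(&self) -> u64 {
        if self.null_count() == 0 {
            // Fast path: no per-row branching
            self.values.iter().sum()
        } else {
            // Slow path: consult the validity mask for every row
            self.values
                .iter()
                .zip(self.nulls.as_ref().unwrap())
                .filter(|(_, &is_null)| !is_null)
                .map(|(v, _)| *v)
                .sum()
        }
    }
}

fn main() {
    let no_nulls = Column { values: vec![1, 2, 3, 4], nulls: None };
    assert_eq!(no_nulls.sum(), 10);

    let with_nulls = Column {
        values: vec![1, 2, 3, 4],
        nulls: Some(vec![false, true, false, false]),
    };
    assert_eq!(with_nulls.sum(), 8); // the null row (value 2) is skipped
}
```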