Skip to content

Commit

Permalink
fix: add null_buffer length check to StringArrayBuilder/`LargeStr…
Browse files Browse the repository at this point in the history
…ingArrayBuilder` (apache#13758)

* fix: add `null_buffer` check for `LargeStringArray`

Add a safety check to ensure that the alignment of buffers cannot be
overflowed. This introduces a panic if they are not aligned through a
runtime assertion.

* fix: remove value_buffer assertion

These buffers can be misaligned and it is not problematic, it is the
`null_buffer` which we care about being of the same length.

* feat: add `null_buffer` check to `StringArray`

This is in a similar vein to `LargeStringArray`, as the code is the
same, except for `i32`'s instead of `i64`.

* feat: use `row_count` var to avoid drift
  • Loading branch information
jdockerty authored Dec 15, 2024
1 parent e70319f commit bd2c975
Showing 1 changed file with 32 additions and 2 deletions.
34 changes: 32 additions & 2 deletions datafusion/functions/src/strings.rs
Original file line number Diff line number Diff line change
Expand Up @@ -185,9 +185,24 @@ impl StringArrayBuilder {
unsafe { self.offsets_buffer.push_unchecked(next_offset) };
}

/// Finalise the builder into a concrete [`StringArray`].
///
/// # Panics
///
/// This method can panic when:
///
/// - the provided `null_buffer` is not the same length as the `offsets_buffer`.
pub fn finish(self, null_buffer: Option<NullBuffer>) -> StringArray {
let row_count = self.offsets_buffer.len() / size_of::<i32>() - 1;
if let Some(ref null_buffer) = null_buffer {
assert_eq!(
null_buffer.len(),
row_count,
"Null buffer and offsets buffer must be the same length"
);
}
let array_builder = ArrayDataBuilder::new(DataType::Utf8)
.len(self.offsets_buffer.len() / size_of::<i32>() - 1)
.len(row_count)
.add_buffer(self.offsets_buffer.into())
.add_buffer(self.value_buffer.into())
.nulls(null_buffer);
Expand Down Expand Up @@ -335,9 +350,24 @@ impl LargeStringArrayBuilder {
unsafe { self.offsets_buffer.push_unchecked(next_offset) };
}

/// Finalise the builder into a concrete [`LargeStringArray`].
///
/// # Panics
///
/// This method can panic when:
///
/// - the provided `null_buffer` is not the same length as the `offsets_buffer`.
pub fn finish(self, null_buffer: Option<NullBuffer>) -> LargeStringArray {
let row_count = self.offsets_buffer.len() / size_of::<i64>() - 1;
if let Some(ref null_buffer) = null_buffer {
assert_eq!(
null_buffer.len(),
row_count,
"Null buffer and offsets buffer must be the same length"
);
}
let array_builder = ArrayDataBuilder::new(DataType::LargeUtf8)
.len(self.offsets_buffer.len() / size_of::<i64>() - 1)
.len(row_count)
.add_buffer(self.offsets_buffer.into())
.add_buffer(self.value_buffer.into())
.nulls(null_buffer);
Expand Down

0 comments on commit bd2c975

Please sign in to comment.