Refine documentation for unary_mut and binary_mut #5798
Merged
5 commits
efac735 Refine documentation for unary_mut and binary_mut (alamb)
a15113e Update arrow-array/src/array/primitive_array.rs (alamb)
f73ea95 Merge remote-tracking branch 'apache/master' into alamb/op_docs (alamb)
6d91e8b Merge remote-tracking branch 'apache/master' into alamb/op_docs (alamb)
c511d50 Update binary_mut example to show different array types (alamb)
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -419,7 +419,7 @@ pub type Decimal256Array = PrimitiveArray<Decimal256Type>; | |
|
||
pub use crate::types::ArrowPrimitiveType; | ||
|
||
/// An array of [primitive values](https://arrow.apache.org/docs/format/Columnar.html#fixed-size-primitive-layout) | ||
/// An array of primitive values, of type [`ArrowPrimitiveType`] | ||
/// | ||
/// # Example: From a Vec | ||
/// | ||
|
@@ -480,6 +480,19 @@ pub use crate::types::ArrowPrimitiveType; | |
/// assert_eq!(array.values(), &[1, 0, 2]); | ||
/// assert!(array.is_null(1)); | ||
/// ``` | ||
/// | ||
/// # Example: Get a `PrimitiveArray` from an [`ArrayRef`] | ||
/// ``` | ||
/// # use std::sync::Arc; | ||
/// # use arrow_array::{Array, cast::AsArray, ArrayRef, Float32Array, PrimitiveArray}; | ||
/// # use arrow_array::types::{Float32Type}; | ||
/// # use arrow_schema::DataType; | ||
/// # let array: ArrayRef = Arc::new(Float32Array::from(vec![1.2, 2.3])); | ||
/// // will panic if the array is not a Float32Array | ||
/// assert_eq!(&DataType::Float32, array.data_type()); | ||
/// let f32_array: Float32Array = array.as_primitive().clone(); | ||
/// assert_eq!(f32_array, Float32Array::from(vec![1.2, 2.3])); | ||
/// ``` | ||
pub struct PrimitiveArray<T: ArrowPrimitiveType> { | ||
data_type: DataType, | ||
/// Values data | ||
|
@@ -732,22 +745,34 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> { | |
PrimitiveArray::from(unsafe { d.build_unchecked() }) | ||
} | ||
|
||
/// Applies an unary and infallible function to a primitive array. | ||
/// This is the fastest way to perform an operation on a primitive array when | ||
/// the benefits of a vectorized operation outweigh the cost of branching nulls and non-nulls. | ||
/// Applies a unary infallible function to a primitive array, producing a | ||
/// new array of potentially different type. | ||
/// | ||
/// This is the fastest way to perform an operation on a primitive array | ||
/// when the benefits of a vectorized operation outweigh the cost of | ||
/// branching nulls and non-nulls. | ||
/// | ||
/// # Implementation | ||
/// See also | ||
/// * [`Self::unary_mut`] for in place modification. | ||
/// * [`Self::try_unary`] for fallible operations. | ||
/// * [`arrow::compute::binary`] for binary operations | ||
/// | ||
/// [`arrow::compute::binary`]: https://docs.rs/arrow/latest/arrow/compute/fn.binary.html | ||
/// # Null Handling | ||
/// | ||
/// Applies the function for all values, including those on null slots. This | ||
/// will often allow the compiler to generate faster vectorized code, but | ||
/// requires that the operation must be infallible (not error/panic) for any | ||
/// value of the corresponding type or this function may panic. | ||
/// | ||
/// This will apply the function for all values, including those on null slots. | ||
/// This implies that the operation must be infallible for any value of the corresponding type | ||
/// or this function may panic. | ||
/// # Example | ||
/// ```rust | ||
/// # use arrow_array::{Int32Array, types::Int32Type}; | ||
/// # use arrow_array::{Int32Array, Float32Array, types::Int32Type}; | ||
/// # fn main() { | ||
/// let array = Int32Array::from(vec![Some(5), Some(7), None]); | ||
/// let c = array.unary(|x| x * 2 + 1); | ||
/// assert_eq!(c, Int32Array::from(vec![Some(11), Some(15), None])); | ||
/// // Create a new array with the value of applying sqrt | ||
Review comment: changed this to show you can make a different type
||
/// let c = array.unary(|x| f32::sqrt(x as f32)); | ||
/// assert_eq!(c, Float32Array::from(vec![Some(2.236068), Some(2.6457512), None])); | ||
/// # } | ||
/// ``` | ||
pub fn unary<F, O>(&self, op: F) -> PrimitiveArray<O> | ||
|
@@ -766,24 +791,50 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> { | |
PrimitiveArray::new(buffer.into(), nulls) | ||
} | ||
|
||
/// Applies an unary and infallible function to a mutable primitive array. | ||
/// Mutable primitive array means that the buffer is not shared with other arrays. | ||
/// As a result, this mutates the buffer directly without allocating new buffer. | ||
/// Applies a unary and infallible function to the array in place if possible. | ||
/// | ||
/// # Buffer Reuse | ||
/// | ||
/// If the underlying buffers are not shared with other arrays, mutates the | ||
/// underlying buffer in place, without allocating. | ||
/// | ||
/// If the underlying buffer is shared, returns `Err(self)`. | ||
/// | ||
/// # Implementation | ||
/// # Null Handling | ||
/// | ||
/// See [`Self::unary`] for more information on null handling. | ||
/// | ||
/// This will apply the function for all values, including those on null slots. | ||
/// This implies that the operation must be infallible for any value of the corresponding type | ||
/// or this function may panic. | ||
/// # Example | ||
/// | ||
/// ```rust | ||
/// # use arrow_array::{Int32Array, types::Int32Type}; | ||
/// # fn main() { | ||
/// let array = Int32Array::from(vec![Some(5), Some(7), None]); | ||
/// // Apply x*2+1 to the data in place, no allocations | ||
/// let c = array.unary_mut(|x| x * 2 + 1).unwrap(); | ||
/// assert_eq!(c, Int32Array::from(vec![Some(11), Some(15), None])); | ||
/// # } | ||
/// ``` | ||
/// | ||
/// # Example: modify [`ArrayRef`] in place, if not shared | ||
Review comment: This is an example I imagine someone might be looking for (like I would be)
||
/// | ||
/// It is also possible to modify an [`ArrayRef`] if there are no other | ||
/// references to the underlying buffer. | ||
/// | ||
/// ```rust | ||
/// # use std::sync::Arc; | ||
/// # use arrow_array::{Array, cast::AsArray, ArrayRef, Int32Array, PrimitiveArray, types::Int32Type}; | ||
/// # let array: ArrayRef = Arc::new(Int32Array::from(vec![Some(5), Some(7), None])); | ||
/// // Convert to Int32Array (panics if array.data_type is not Int32) | ||
/// let a = array.as_primitive::<Int32Type>().clone(); | ||
/// // Try to apply x*2+1 to the data in place, fails because array is still shared | ||
/// a.unary_mut(|x| x * 2 + 1).unwrap_err(); | ||
/// // Try again, this time dropping the last remaining reference | ||
/// let a = array.as_primitive::<Int32Type>().clone(); | ||
/// drop(array); | ||
/// // Now we can apply the operation in place | ||
/// let c = a.unary_mut(|x| x * 2 + 1).unwrap(); | ||
/// assert_eq!(c, Int32Array::from(vec![Some(11), Some(15), None])); | ||
/// ``` | ||
|
||
pub fn unary_mut<F>(self, op: F) -> Result<PrimitiveArray<T>, PrimitiveArray<T>> | ||
where | ||
F: Fn(T::Native) -> T::Native, | ||
|
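The buffer reuse rule described for `unary_mut` (mutate in place when nothing else references the buffer, otherwise return the input as `Err`) can be modelled with the standard library alone. The sketch below is an illustration of that ownership rule, not arrow's implementation: `map_in_place` is a hypothetical helper, and a plain `Arc<Vec<i32>>` stands in for the array's value buffer.

```rust
use std::sync::Arc;

/// Hypothetical helper (not part of arrow): applies `op` to every element
/// if `values` is the only reference to the buffer, otherwise returns the
/// untouched `Arc` as `Err`, mirroring the `Err(self)` contract above.
fn map_in_place<F>(values: Arc<Vec<i32>>, op: F) -> Result<Arc<Vec<i32>>, Arc<Vec<i32>>>
where
    F: Fn(i32) -> i32,
{
    match Arc::try_unwrap(values) {
        // Sole owner: take the Vec out and overwrite its elements in place.
        // The Vec's heap allocation is reused, only the Arc is re-created.
        Ok(mut owned) => {
            for v in owned.iter_mut() {
                *v = op(*v);
            }
            Ok(Arc::new(owned))
        }
        // Another reference exists: hand the input back unchanged.
        Err(shared) => Err(shared),
    }
}
```

As in the `ArrayRef` example in the diff, the call fails while a second reference is alive and succeeds once that reference has been dropped.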
@@ -796,11 +847,12 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> { | |
Ok(builder.finish()) | ||
} | ||
|
||
/// Applies a unary and fallible function to all valid values in a primitive array | ||
/// Applies a unary fallible function to all valid values in a primitive | ||
/// array, producing a new array of potentially different type. | ||
/// | ||
/// This is unlike [`Self::unary`] which will apply an infallible function to all rows | ||
/// regardless of validity, in many cases this will be significantly faster and should | ||
/// be preferred if `op` is infallible. | ||
/// Applies `op` to only rows that are valid, which is often significantly | ||
/// slower than [`Self::unary`], which should be preferred if `op` is | ||
/// infallible. | ||
/// | ||
/// Note: LLVM is currently unable to effectively vectorize fallible operations | ||
pub fn try_unary<F, O, E>(&self, op: F) -> Result<PrimitiveArray<O>, E> | ||
|
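The difference between the two null handling strategies may be easier to see in a small model. The sketch below uses a hypothetical `MiniColumn` type (a value vector plus a validity vector) rather than arrow's `PrimitiveArray`. `map_all` runs `op` on every slot the way the `unary` docs describe, while `try_map_valid` skips null slots and stops at the first error the way the `try_unary` docs describe.

```rust
/// Hypothetical miniature of a nullable primitive column (illustration only).
#[derive(Debug, Clone, PartialEq)]
struct MiniColumn {
    values: Vec<i32>,
    valid: Vec<bool>,
}

impl MiniColumn {
    /// Runs `op` on every slot, including slots that are null.
    /// `op` must therefore be safe to call on whatever a null slot holds.
    fn map_all<F: Fn(i32) -> i32>(&self, op: F) -> MiniColumn {
        MiniColumn {
            values: self.values.iter().map(|&v| op(v)).collect(),
            valid: self.valid.clone(),
        }
    }

    /// Runs `op` only on valid slots and returns the first error, if any.
    /// Null slots are left as 0 in the output buffer.
    fn try_map_valid<F, E>(&self, op: F) -> Result<MiniColumn, E>
    where
        F: Fn(i32) -> Result<i32, E>,
    {
        let mut values = vec![0; self.values.len()];
        for (i, (&v, &is_valid)) in self.values.iter().zip(&self.valid).enumerate() {
            if is_valid {
                values[i] = op(v)?;
            }
        }
        Ok(MiniColumn { values, valid: self.valid.clone() })
    }
}
```

With a null slot that happens to hold 0, an operation such as `100 / x` would panic under `map_all` but is fine under `try_map_valid`, because the null slot is never visited. The per-slot branch is the price paid for that safety.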
@@ -829,13 +881,16 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> { | |
Ok(PrimitiveArray::new(values, nulls)) | ||
} | ||
|
||
/// Applies an unary and fallible function to all valid values in a mutable primitive array. | ||
/// Mutable primitive array means that the buffer is not shared with other arrays. | ||
/// As a result, this mutates the buffer directly without allocating new buffer. | ||
/// Applies a unary fallible function to all valid values in a mutable | ||
/// primitive array. | ||
/// | ||
/// # Null Handling | ||
/// | ||
/// See [`Self::try_unary`] for more information on null handling. | ||
/// | ||
/// # Buffer Reuse | ||
/// | ||
/// This is unlike [`Self::unary_mut`] which will apply an infallible function to all rows | ||
/// regardless of validity, in many cases this will be significantly faster and should | ||
/// be preferred if `op` is infallible. | ||
/// See [`Self::unary_mut`] for more information on buffer reuse. | ||
/// | ||
/// This returns an `Err` when the input array shares its buffer with another | ||
/// array. In that case, the returned `Err` wraps the input array. If the function | ||
|
@@ -870,9 +925,9 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> { | |
|
||
/// Applies a unary and nullable function to all valid values in a primitive array | ||
/// | ||
/// This is unlike [`Self::unary`] which will apply an infallible function to all rows | ||
/// regardless of validity, in many cases this will be significantly faster and should | ||
/// be preferred if `op` is infallible. | ||
/// Applies `op` to only rows that are valid, which is often significantly | ||
/// slower than [`Self::unary`], which should be preferred if `op` is | ||
/// infallible. | ||
/// | ||
/// Note: LLVM is currently unable to effectively vectorize fallible operations | ||
pub fn unary_opt<F, O>(&self, op: F) -> PrimitiveArray<O> | ||
|
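A "nullable function" here means one that returns `Option`: a `None` result turns the output slot into a null. The sketch below shows that idea with a hypothetical `map_opt` helper over `Option<i32>` slots. This is a simplification; arrow keeps validity in a separate null buffer rather than in per-slot `Option`s.

```rust
/// Hypothetical helper (not arrow's `unary_opt`): applies `op` to each
/// valid slot. Null inputs stay null, and a `None` result from `op`
/// makes that output slot null as well.
fn map_opt<F>(values: &[Option<i32>], op: F) -> Vec<Option<i32>>
where
    F: Fn(i32) -> Option<i32>,
{
    values
        .iter()
        .map(|&slot| slot.and_then(|v| op(v)))
        .collect()
}
```

For example, mapping with `|x| x.checked_mul(2)` doubles ordinary values and yields a null for any value whose doubling would overflow.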
@@ -915,8 +970,16 @@ impl<T: ArrowPrimitiveType> PrimitiveArray<T> { | |
PrimitiveArray::new(values, Some(nulls)) | ||
} | ||
|
||
/// Returns `PrimitiveBuilder` of this primitive array for mutating its values if the underlying | ||
/// data buffer is not shared by others. | ||
/// Returns a `PrimitiveBuilder` for this array, suitable for mutating values | ||
/// in place. | ||
/// | ||
/// # Buffer Reuse | ||
/// | ||
/// If the underlying data buffer has no other outstanding references, the | ||
/// buffer is used without copying. | ||
/// | ||
/// If the underlying data buffer does have outstanding references, returns | ||
/// `Err(self)` | ||
pub fn into_builder(self) -> Result<PrimitiveBuilder<T>, Self> { | ||
let len = self.len(); | ||
let data = self.into_data(); | ||
|
I don't think it is obvious how to go from `Arc<dyn Array>` back to `PrimitiveArray`, so I documented that too
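For readers wondering what such a conversion involves, here is a std-only sketch of the usual mechanism: a type-erased `Arc<dyn Trait>` exposes `&dyn Any`, and the caller downcasts to the concrete type. `MiniArray`, `MiniInt32Array`, `MiniFloat32Array` and `as_int32` are hypothetical stand-ins, and the sketch assumes the conversion goes through `Any`. In arrow itself the call shown in the diff is `array.as_primitive::<Int32Type>()`.

```rust
use std::any::Any;
use std::sync::Arc;

/// Hypothetical stand-in for a type-erased array trait (illustration only).
trait MiniArray {
    fn as_any(&self) -> &dyn Any;
}

/// Hypothetical concrete array types.
#[derive(Debug, Clone, PartialEq)]
struct MiniInt32Array(Vec<i32>);

#[derive(Debug, Clone, PartialEq)]
struct MiniFloat32Array(Vec<f32>);

impl MiniArray for MiniInt32Array {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

impl MiniArray for MiniFloat32Array {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// Returns `Some` only when the erased array really is a `MiniInt32Array`.
/// A panicking variant would call `.expect(...)` on this result.
fn as_int32(array: &Arc<dyn MiniArray>) -> Option<&MiniInt32Array> {
    array.as_any().downcast_ref::<MiniInt32Array>()
}
```

The downcast yields a reference into the shared allocation, which is why the documented example clones the result and then drops the original `ArrayRef` before calling `unary_mut`.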