-
Notifications
You must be signed in to change notification settings - Fork 850
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve in-place primitive sorts by 13-67% #4473
Conversation
I havent created an issue for this. Let me know if its required. |
Is there some way we might reduce the amount of unsafe code here, given this is a rare special case (where you don't need the indices to sort other columns) I'm keen to keep the maintenance overheads down. |
I have removed all unsafe code in the latest commit. |
mutable_slice.sort_unstable_by(|a, b| a.compare(*b)); | ||
if sort_options.descending { | ||
mutable_slice.reverse(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like it should be faster for the descending case?
mutable_slice.sort_unstable_by(|a, b| a.compare(*b)); | |
if sort_options.descending { | |
mutable_slice.reverse(); | |
} | |
if sort_options.descending { | |
mutable_slice.sort_unstable_by(|a, b| b.compare(*a)); | |
} else { | |
mutable_slice.sort_unstable_by(|a, b| a.compare(*b)); | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested it just now on my laptop. The difference is only b/n 2-3% .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for checking. I would argue it's also a bit more simple :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think given the marginal speed difference it makes sense to save on codegen by using reverse 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good just some minor nits
arrow-ord/src/sort.rs
Outdated
) -> Result<ArrayRef, ArrowError> | ||
where | ||
T: ArrowPrimitiveType, | ||
<T as arrow_array::ArrowPrimitiveType>::Native: ArrowNativeTypeOp, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shouldn't be necessary given the constraints on ArrowPrimitiveType::Native
arrow-ord/src/sort.rs
Outdated
let array_data = values.to_data(); | ||
let input_values = array_data.buffer(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let array_data = values.to_data(); | |
let input_values = array_data.buffer(0); | |
let array = array.as_primitive::<T>(); | |
let input_values = array.values().as_ref(); |
This not only avoids marshaling to ArrayData, but also the code is technically exploiting an implementation detail that PrimitiveArray returns ArrayData with a zero offset
arrow-ord/src/sort.rs
Outdated
if values.null_count() > 0 { | ||
let nulls = array_data.nulls().unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if values.null_count() > 0 { | |
let nulls = array_data.nulls().unwrap(); | |
if let Some(nulls) = array.nulls().filter(|n| n.null_count() > 0) { |
arrow-ord/src/sort.rs
Outdated
let array_data = values.to_data(); | ||
let input_values = array_data.buffer(0); | ||
|
||
let mut null_bit_buffer = None; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be nicer to use an expression style here, rather than using mut
e.g.
let nulls = match array.nulls().filter(|n| n.null_count() > 0) {
Some(nulls) => ...,
None => ...
}
arrow-ord/src/sort.rs
Outdated
|
||
let result_capacity = values.len() | ||
* std::mem::size_of::<<T as arrow_array::ArrowPrimitiveType>::Native>(); | ||
let mut mutable_buffer = MutableBuffer::new(result_capacity); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you considered just using Vec here?
arrow-ord/src/sort.rs
Outdated
let nulls = array_data.nulls().unwrap(); | ||
|
||
let mut validity_buffer = BooleanBufferBuilder::new(values.len()); | ||
let values_slice; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally prefer the expression style, e.g.
let values_slice = match sort_options.nulls_first {
true => ...,
false => ...
}
It makes it easier to see what is going on and where the value is being set
mutable_slice.sort_unstable_by(|a, b| a.compare(*b)); | ||
if sort_options.descending { | ||
mutable_slice.reverse(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think given the marginal speed difference it makes sense to save on codegen by using reverse 👍
null_bit_buffer = Some(validity_buffer.finish().into()); | ||
} else { | ||
mutable_slice.copy_from_slice(&input_values[..values.len()]); | ||
mutable_slice.sort_unstable_by(|a, b| a.compare(*b)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mutable_slice.sort_unstable_by(|a, b| a.compare(*b)); | |
mutable_slice.sort_unstable(); |
Should be the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not for floats, we use total ordering not the default partial ordering
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah forgot about that :)
I will make these changes today. |
arrow-ord/src/sort.rs
Outdated
@@ -57,11 +58,137 @@ pub fn sort( | |||
values: &dyn Array, | |||
options: Option<SortOptions>, | |||
) -> Result<ArrayRef, ArrowError> { | |||
if let DataType::RunEndEncoded(_, _) = values.data_type() { | |||
return sort_run(values, options, None); | |||
match values.data_type() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you changed sort_native_type to take PrimitiveArray instead of dyn Array you could use the downcast_primitive_array macro here
Thanks for the above comments and the latest commit addresses them. It simplified the code a lot. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM thank you
arrow-ord/src/sort.rs
Outdated
values => return sort_native_type(values, options), | ||
DataType::RunEndEncoded(_, _) => return sort_run(values, options, None), | ||
_ => { | ||
let indices = sort_to_indices(values, options, None)?; | ||
return take(values, &indices, None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
values => return sort_native_type(values, options), | |
DataType::RunEndEncoded(_, _) => return sort_run(values, options, None), | |
_ => { | |
let indices = sort_to_indices(values, options, None)?; | |
return take(values, &indices, None) | |
values => sort_native_type(values, options), | |
DataType::RunEndEncoded(_, _) => sort_run(values, options, None), | |
_ => { | |
let indices = sort_to_indices(values, options, None)?; | |
take(values, &indices, None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Which issue does this PR close?
Closes #.
Rationale for this change
The current sort implementation for primitive types first sorts by indices and then performs a take operation. The kernel can be improved by directly sorting.
The results for i64 on my laptop are as follows
What changes are included in this PR?
I reworked the sort kernel so that primitive types are sorted directly without using sort_by_indices . I have also included a new primitive benchmark
sort_kernel_primitives.rs
.Are there any user-facing changes?
No