Improve in-place primitive sorts by 13-67% #4473

Merged: 6 commits into apache:master from the sort-improvements branch on Jul 4, 2023

Contributor

@psvri psvri commented Jul 1, 2023

Which issue does this PR close?

Closes #.

Rationale for this change

The current sort implementation for primitive types first computes the sorted indices and then performs a take operation. The kernel can be sped up by sorting the values directly.
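To make the difference concrete, here is a minimal std-only sketch of the two strategies. It does not use the arrow crate; `sort_via_indices` and `sort_direct` are hypothetical names chosen for illustration, and plain `i64` slices stand in for a primitive array without nulls.

```rust
/// Indirect approach: sort a vector of indices by the values they point at,
/// then gather ("take") the values in that order.
/// (Hypothetical helper, not the arrow kernel.)
fn sort_via_indices(values: &[i64]) -> Vec<i64> {
    let mut indices: Vec<usize> = (0..values.len()).collect();
    indices.sort_unstable_by_key(|&i| values[i]);
    // The gather step is a second pass with scattered reads.
    indices.into_iter().map(|i| values[i]).collect()
}

/// Direct approach: copy the values once and sort the copy in place.
/// (Hypothetical helper, not the arrow kernel.)
fn sort_direct(values: &[i64]) -> Vec<i64> {
    let mut out = values.to_vec();
    out.sort_unstable();
    out
}
```

Both functions return the same sorted values; the direct version simply skips the index vector and the gather pass, which is presumably where the savings reported below come from.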

The results for i64 on my laptop are as follows:

sort 2^10               time:   [9.1287 µs 9.1711 µs 9.2279 µs]
                        change: [-27.180% -25.980% -24.697%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 17 outliers among 100 measurements (17.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  3 (3.00%) high mild
  11 (11.00%) high severe

sort 2^12               time:   [70.191 µs 70.437 µs 70.741 µs]
                        change: [-15.190% -13.941% -12.614%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  6 (6.00%) high mild
  6 (6.00%) high severe

sort nulls 2^10         time:   [4.9898 µs 5.0080 µs 5.0319 µs]
                        change: [-68.212% -67.754% -67.288%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 17 outliers among 100 measurements (17.00%)
  2 (2.00%) low mild
  5 (5.00%) high mild
  10 (10.00%) high severe

sort nulls 2^12         time:   [34.333 µs 34.641 µs 34.983 µs]
                        change: [-58.256% -57.089% -56.063%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 16 outliers among 100 measurements (16.00%)
  3 (3.00%) high mild
  13 (13.00%) high severe

What changes are included in this PR?

I reworked the sort kernel so that primitive types are sorted directly without going through sort_to_indices. I have also included a new primitive benchmark, sort_kernel_primitives.rs.

Are there any user-facing changes?

No

@github-actions github-actions bot added the arrow Changes to the arrow crate label Jul 1, 2023
Contributor Author

psvri commented Jul 1, 2023

I haven't created an issue for this. Let me know if it's required.

@tustvold tustvold changed the title from "Improve primitive type sorts by 13-67%" to "Improve in-place primitive sorts by 13-67%" Jul 3, 2023
Contributor

tustvold commented Jul 3, 2023

Is there some way we might reduce the amount of unsafe code here? Given this is a rare special case (where you don't need the indices to sort other columns), I'm keen to keep the maintenance overhead down.

Contributor Author

psvri commented Jul 3, 2023

> Is there some way we might reduce the amount of unsafe code here? Given this is a rare special case (where you don't need the indices to sort other columns), I'm keen to keep the maintenance overhead down.

I have removed all unsafe code in the latest commit.

Comment on lines +182 to +185
mutable_slice.sort_unstable_by(|a, b| a.compare(*b));
if sort_options.descending {
    mutable_slice.reverse();
}
Contributor

This looks like it should be faster for the descending case?

Suggested change
- mutable_slice.sort_unstable_by(|a, b| a.compare(*b));
- if sort_options.descending {
-     mutable_slice.reverse();
- }
+ if sort_options.descending {
+     mutable_slice.sort_unstable_by(|a, b| b.compare(*a));
+ } else {
+     mutable_slice.sort_unstable_by(|a, b| a.compare(*b));
+ }

Contributor Author

Tested it just now on my laptop. The difference is only between 2% and 3%.

Contributor

Thanks for checking. I would argue it's also a bit simpler :)

Contributor

I think given the marginal speed difference it makes sense to save on codegen by using reverse 👍
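The two variants discussed in this thread can be sketched with the standard library alone. This is an illustration under assumptions, not the PR's code: `i64::cmp` stands in for the `compare` method used in the kernel, and both function names are hypothetical.

```rust
/// Variant kept in the PR: sort ascending, then reverse the slice.
/// Only one comparator closure is instantiated, regardless of direction.
fn sort_desc_with_reverse(values: &mut [i64]) {
    values.sort_unstable_by(|a, b| a.cmp(b));
    values.reverse();
}

/// Suggested variant: sort with a flipped comparator.
/// A kernel supporting both directions would instantiate the sort
/// twice per type, once for each comparator.
fn sort_desc_with_comparator(values: &mut [i64]) {
    values.sort_unstable_by(|a, b| b.cmp(a));
}
```

For primitive values, equal elements are indistinguishable, so both variants yield the same descending output; the choice comes down to the small runtime difference measured above versus the amount of generated code.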

Contributor

@tustvold tustvold left a comment

Looks good, just some minor nits

) -> Result<ArrayRef, ArrowError>
where
T: ArrowPrimitiveType,
<T as arrow_array::ArrowPrimitiveType>::Native: ArrowNativeTypeOp,
Contributor

This shouldn't be necessary given the constraints on ArrowPrimitiveType::Native

Comment on lines 146 to 147
let array_data = values.to_data();
let input_values = array_data.buffer(0);
Contributor

Suggested change
- let array_data = values.to_data();
- let input_values = array_data.buffer(0);
+ let array = array.as_primitive::<T>();
+ let input_values = array.values().as_ref();

This avoids marshaling to ArrayData. The current code is also technically exploiting an implementation detail, namely that PrimitiveArray returns ArrayData with a zero offset.

Comment on lines 154 to 155
if values.null_count() > 0 {
let nulls = array_data.nulls().unwrap();
Contributor

Suggested change
- if values.null_count() > 0 {
-     let nulls = array_data.nulls().unwrap();
+ if let Some(nulls) = array.nulls().filter(|n| n.null_count() > 0) {

let array_data = values.to_data();
let input_values = array_data.buffer(0);

let mut null_bit_buffer = None;
Contributor

It might be nicer to use an expression style here, rather than using mut

e.g.

let nulls = match array.nulls().filter(|n| n.null_count() > 0) {
Some(nulls) => ...,
None => ...
}


let result_capacity = values.len()
* std::mem::size_of::<<T as arrow_array::ArrowPrimitiveType>::Native>();
let mut mutable_buffer = MutableBuffer::new(result_capacity);
Contributor

Have you considered just using Vec here?

let nulls = array_data.nulls().unwrap();

let mut validity_buffer = BooleanBufferBuilder::new(values.len());
let values_slice;
Contributor

I personally prefer the expression style, e.g.

let values_slice = match sort_options.nulls_first {
    true => ...,
    false => ...
}

It makes it easier to see what is going on and where the value is being set
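As a rough illustration of the expression style applied to null placement, here is a std-only sketch. Everything in it is an assumption made for the example: the `SortOptions` struct is a hypothetical stand-in, a `&[bool]` slice models the validity bitmap, and `0` is used as an arbitrary placeholder value in null slots. It is not the code from this PR.

```rust
/// Hypothetical stand-in for the sort options used in the PR.
struct SortOptions {
    descending: bool,
    nulls_first: bool,
}

/// Sorts the valid values and groups the null slots at one end.
/// Returns the output values together with their validity flags.
fn sort_with_nulls(
    values: &[i64],
    validity: &[bool],
    options: &SortOptions,
) -> (Vec<i64>, Vec<bool>) {
    assert_eq!(values.len(), validity.len());

    // Keep only the valid values and sort them directly.
    let mut valid: Vec<i64> = values
        .iter()
        .zip(validity)
        .filter(|(_, is_valid)| **is_valid)
        .map(|(&v, _)| v)
        .collect();
    valid.sort_unstable();
    if options.descending {
        valid.reverse();
    }

    let null_count = values.len() - valid.len();
    let null_values = vec![0_i64; null_count]; // placeholder content for null slots
    let null_bits = vec![false; null_count];
    let valid_bits = vec![true; valid.len()];

    // Expression style: the result is produced by the match itself,
    // so no `mut` binding has to be declared first and assigned later.
    match options.nulls_first {
        true => ([null_values, valid].concat(), [null_bits, valid_bits].concat()),
        false => ([valid, null_values].concat(), [valid_bits, null_bits].concat()),
    }
}
```

Because each arm yields the complete result, it is visible at a glance that both cases are handled and where the value comes from.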

null_bit_buffer = Some(validity_buffer.finish().into());
} else {
mutable_slice.copy_from_slice(&input_values[..values.len()]);
mutable_slice.sort_unstable_by(|a, b| a.compare(*b));
Contributor

Suggested change
- mutable_slice.sort_unstable_by(|a, b| a.compare(*b));
+ mutable_slice.sort_unstable();

Should be the same?

Contributor

Not for floats, we use total ordering not the default partial ordering

Contributor

Ah forgot about that :)
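A short std-only sketch of why the two calls differ for floats. It uses `f64::total_cmp` from the standard library as a stand-in for the kernel's `compare` method (an assumption made for this example), and `sort_floats_total` is a hypothetical name.

```rust
/// Sorts floats using the IEEE 754 total order.
/// `values.sort_unstable()` would not compile here: `f64` implements
/// `PartialOrd` but not `Ord`, because NaN is unordered under the
/// default comparison.
fn sort_floats_total(values: &mut [f64]) {
    values.sort_unstable_by(|a, b| a.total_cmp(b));
}
```

Under the total order, `-0.0` sorts before `+0.0`, and a NaN with a positive sign bit sorts after positive infinity, so every input has a well-defined position.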

Contributor Author

psvri commented Jul 4, 2023

I will make these changes today.

@@ -57,11 +58,137 @@ pub fn sort(
      values: &dyn Array,
      options: Option<SortOptions>,
  ) -> Result<ArrayRef, ArrowError> {
-     if let DataType::RunEndEncoded(_, _) = values.data_type() {
-         return sort_run(values, options, None);
+     match values.data_type() {
Contributor

If you changed sort_native_type to take PrimitiveArray instead of dyn Array you could use the downcast_primitive_array macro here

Contributor Author

psvri commented Jul 4, 2023

Thanks for the above comments; the latest commit addresses them. It simplified the code a lot.

Contributor

@tustvold tustvold left a comment

LGTM, thank you

Comment on lines 62 to 66
values => return sort_native_type(values, options),
DataType::RunEndEncoded(_, _) => return sort_run(values, options, None),
_ => {
let indices = sort_to_indices(values, options, None)?;
return take(values, &indices, None)
Contributor

Suggested change
- values => return sort_native_type(values, options),
- DataType::RunEndEncoded(_, _) => return sort_run(values, options, None),
- _ => {
-     let indices = sort_to_indices(values, options, None)?;
-     return take(values, &indices, None)
+ values => sort_native_type(values, options),
+ DataType::RunEndEncoded(_, _) => sort_run(values, options, None),
+ _ => {
+     let indices = sort_to_indices(values, options, None)?;
+     take(values, &indices, None)

Contributor Author

Done

@tustvold tustvold merged commit aac3aa9 into apache:master Jul 4, 2023
@psvri psvri deleted the sort-improvements branch July 4, 2023 17:01