Undo run end filter performance regression #6691

delamarch3 · 2024-11-05T19:03:30Z

Which issue does this PR close?

Related to #6675

Rationale for this change

After running some benchmarks, it turned out that the changes introduced in my previous PR were quite slow. This PR should bring performance back closer to what it was before.

run_ends_len = 64       time:   [3.9878 µs 4.0035 µs 4.0204 µs]
                        change: [-1.3684% -0.4485% +0.5210%] (p = 0.34 > 0.05)
                        No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) high mild
  4 (4.00%) high severe

run_ends_len = 512      time:   [56.654 µs 57.076 µs 57.458 µs]
                        change: [-1.7583% -1.0124% -0.2300%] (p = 0.01 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

run_ends_len = 1024     time:   [151.37 µs 151.83 µs 152.34 µs]
                        change: [-4.5739% -3.1197% -1.6545%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  6 (6.00%) high severe

run_ends_len = 64       time:   [2.7373 µs 2.8041 µs 2.8841 µs]
                        change: [-32.545% -31.650% -30.574%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  6 (6.00%) high mild
  8 (8.00%) high severe

run_ends_len = 512      time:   [4.5291 µs 4.5413 µs 4.5548 µs]
                        change: [-91.973% -91.902% -91.821%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) high mild
  5 (5.00%) high severe

run_ends_len = 1024     time:   [6.3687 µs 6.4206 µs 6.4869 µs]
                        change: [-95.802% -95.750% -95.693%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  4 (4.00%) high mild
  6 (6.00%) high severe

What changes are included in this PR?

I subtract the difference (the amount by which end exceeds the length of the filter values) from end, so that end always stays in bounds.
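A minimal std-only sketch of the bounds-clamping idea, assuming plain slices in place of arrow's run ends and boolean filter buffer. The function name filter_run_ends and its return shape (new run ends for the kept runs plus a per-run keep flag) are hypothetical; only the saturating_sub/subtract step and the unchecked read mirror the code discussed in this PR.

```rust
/// Hypothetical, std-only model of the run-end filter loop: `run_ends` stands in
/// for the run array's run ends and `filter` for the predicate's boolean values.
/// Returns the new run ends of the kept runs and, per input run, whether it was kept.
fn filter_run_ends(run_ends: &[u64], filter: &[bool]) -> (Vec<u64>, Vec<bool>) {
    let mut new_run_ends = Vec::with_capacity(run_ends.len());
    let mut kept = Vec::with_capacity(run_ends.len());
    let mut start = 0u64;
    let mut count = 0u64; // cumulative number of selected rows so far

    for &run_end in run_ends {
        let mut end = run_end;
        // If this run extends past the filter, subtract the overshoot so that
        // `end == min(run_end, filter.len())`.
        let difference = end.saturating_sub(filter.len() as u64);
        end -= difference;

        let mut keep = false;
        for i in start..end {
            // SAFETY: `i < end <= filter.len()` because of the subtraction above.
            let pred = unsafe { *filter.get_unchecked(i as usize) };
            count += pred as u64;
            keep |= pred;
        }
        if keep {
            new_run_ends.push(count);
        }
        kept.push(keep);
        start = end;
    }
    (new_run_ends, kept)
}
```

With run ends [3, 10] and a filter of length 4, the second run is clamped to end at 4, so the unchecked read never goes past the filter.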

Are there any user-facing changes?

No


alamb · 2024-11-08T18:41:09Z

I am running the benchmarks on this PR to verify. Thank you @delamarch3

alamb · 2024-11-08T19:12:02Z

I apologize for being dense @delamarch3, but I spent a while trying to find the benchmarks you are running and couldn't figure out which ones they were. Is it filter_kernels?

delamarch3 · 2024-11-08T19:20:32Z

@alamb Sorry, I wrote a separate benchmark for this but didn't commit it. Its results aren't consistent from run to run (the odd +/-5%), so I thought it needed more work, but the difference was large enough to compare. I could add it to this PR, though?

delamarch3 · 2024-11-08T19:39:38Z

I'll try to add one into filter_kernels

alamb · 2024-11-08T19:41:55Z

> I'll try to add one into filter_kernels

Thank you -- if you could do so as a separate PR that would be most helpful (so it is easy to compare with these changes) 🙏

Dandandan · 2024-11-08T21:39:37Z

arrow-select/src/filter.rs

+        end -= difference;
+
+        // Safety: we subtract the difference off `end` so we are always within bounds
+        for pred in (start..end).map(|i| unsafe { filter_values.value_unchecked(i as usize) }) {


Perhaps we could iterate over filter_values instead? E.g. let mut preds = filter_values.iter() and call preds.next() in the loop. That way you can avoid using unsafe, and it might still generate fast code.

Thanks, I'll try this out and post the results
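For comparison, a hypothetical std-only sketch of the safe alternative suggested above: a single iterator over the filter values is created once and advanced with next(), so each filter value is visited once and no unsafe block is needed. Plain bool slices stand in for the arrow types, and the function name is illustrative only.

```rust
/// Hypothetical safe variant: one iterator over the filter values is shared
/// across all runs and advanced with `next()`, so no unchecked indexing is needed.
fn filter_run_ends_safe(run_ends: &[u64], filter: &[bool]) -> (Vec<u64>, Vec<bool>) {
    let mut preds = filter.iter().copied();
    let mut new_run_ends = Vec::new();
    let mut kept = Vec::with_capacity(run_ends.len());
    let mut start = 0u64;
    let mut count = 0u64; // cumulative number of selected rows so far

    for &end in run_ends {
        let mut keep = false;
        for _ in start..end {
            // Once the filter is exhausted, `next()` returns `None` and the
            // remaining positions of this run are simply skipped.
            if let Some(pred) = preds.next() {
                count += pred as u64;
                keep |= pred;
            }
        }
        if keep {
            new_run_ends.push(count);
        }
        kept.push(keep);
        start = end;
    }
    (new_run_ends, kept)
}
```

Because next() returns None past the end of the filter, this version needs no saturating_sub clamp, at the cost of still looping over any out-of-range positions.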

delamarch3 · 2024-11-09T14:52:28Z

I've run the filter_kernel benchmark that I added for the run array in #6706 against the different approaches. Here are the results I get:

for pred in filter_values
    .iter()
    .skip(start as usize)
    .take((end - start) as usize)
{
    count += R::Native::from(pred);
    keep |= pred
}

Benchmarking filter run array (kept 1/2): Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 52.1s, or reduce sample count to 10.
filter run array (kept 1/2)
                        time:   [542.98 ms 549.50 ms 556.59 ms]
Found 10 outliers among 100 measurements (10.00%)
  7 (7.00%) high mild
  3 (3.00%) high severe

Benchmarking filter run array high selectivity (kept 1023/1024): Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 54.3s, or reduce sample count to 10.
Benchmarking filter run array high selectivity (kept 1023/1024): Collecting 100 samples in estimated 54.256 s (100 iterations
filter run array high selectivity (kept 1023/1024)
                        time:   [550.25 ms 555.80 ms 561.74 ms]
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild

Benchmarking filter run array low selectivity (kept 1/1024): Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 53.5s, or reduce sample count to 10.
filter run array low selectivity (kept 1/1024)
                        time:   [536.14 ms 540.44 ms 545.14 ms]
Found 11 outliers among 100 measurements (11.00%)
  6 (6.00%) high mild
  5 (5.00%) high severe

for _ in start..end {
    if let Some(pred) = preds.next() {
        count += R::Native::from(pred);
        keep |= pred
    }
}

filter run array (kept 1/2)
                        time:   [598.70 µs 601.93 µs 605.25 µs]
                        change: [-99.892% -99.890% -99.889%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high severe

Benchmarking filter run array high selectivity (kept 1023/1024): Collecting 100 samples in estimated 6.0573 s (15k iterations
filter run array high selectivity (kept 1023/1024)
                        time:   [386.55 µs 388.17 µs 389.91 µs]
                        change: [-99.931% -99.930% -99.929%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

filter run array low selectivity (kept 1/1024)
                        time:   [239.93 µs 240.46 µs 241.04 µs]
                        change: [-99.956% -99.955% -99.955%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  6 (6.00%) high mild
  6 (6.00%) high severe

The preds.next() version above and the value_unchecked version below perform similarly, but after several runs the low-selectivity benchmark seems slightly faster with the value_unchecked version:

end -= end.saturating_sub(filter_values.len() as u64);
for pred in (start..end).map(|i| unsafe { filter_values.value_unchecked(i as usize) }) {
    count += R::Native::from(pred);
    keep |= pred
}

filter run array (kept 1/2)
                        time:   [581.12 µs 584.01 µs 586.90 µs]
                        change: [-2.5195% -1.1178% +0.1036%] (p = 0.11 > 0.05)
                        No change in performance detected.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

Benchmarking filter run array high selectivity (kept 1023/1024): Collecting 100 samples in estimated 5.5900 s (15k iterations
filter run array high selectivity (kept 1023/1024)
                        time:   [359.79 µs 361.40 µs 363.47 µs]
                        change: [-7.7904% -5.5816% -3.1503%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  3 (3.00%) high mild
  11 (11.00%) high severe

filter run array low selectivity (kept 1/1024)
                        time:   [209.87 µs 210.45 µs 211.09 µs]
                        change: [-13.950% -13.255% -12.616%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe

Dandandan · 2024-11-09T17:25:30Z

for _ in start..end {
    if let Some(pred) = preds.next() {
        count += R::Native::from(pred);
        keep |= pred
    }
}

Maybe you could try something like this (two minor changes)?

    for pred in preds.take(end-start) {
        count += R::Native::from(pred);
    }
    let keep = count > 0;

I think you can also leave out the saturating_sub as long as you're not using unsafe? Otherwise, you could try the unchecked variants for iter as well.

delamarch3 · 2024-11-10T15:40:35Z

I'm not sure calling take on each iteration will work: take takes ownership of the iterator, so it would need to be reassigned on each run_ends iteration, but then it would always take the first n elements. Unless you had something else in mind? I did try BitIterator::new(filter_values.values(), start, end - start) to achieve something similar, but it turned out to be slower.

For the algorithm to remain correct, I had to add a new variable holding the count before the loop and assign keep after it, i.e. let keep = count > count_before, but this was also slower.
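One std-only way around the ownership problem with take is Iterator::by_ref, which lets take borrow the shared iterator mutably so that it resumes where the previous run stopped. The sketch below (hypothetical function name, plain slices instead of arrow types) combines that with the count_before check described above. It was not benchmarked in this thread, so whether it beats the next() loop is an open question.

```rust
/// Hypothetical variant: `by_ref()` lets `take` borrow the shared iterator
/// instead of consuming it, and `keep` is derived from the cumulative count.
fn filter_run_ends_by_ref(run_ends: &[u64], filter: &[bool]) -> (Vec<u64>, Vec<bool>) {
    let mut preds = filter.iter().copied();
    let mut new_run_ends = Vec::new();
    let mut kept = Vec::with_capacity(run_ends.len());
    let mut start = 0u64;
    let mut count = 0u64; // cumulative across runs, so `count > 0` would be wrong

    for &end in run_ends {
        let count_before = count;
        let len = end.saturating_sub(start) as usize;
        // `take` consumes only the `&mut` borrow, so `preds` keeps its position.
        for pred in preds.by_ref().take(len) {
            count += pred as u64;
        }
        // The run is kept if at least one of its rows was selected.
        let keep = count > count_before;
        if keep {
            new_run_ends.push(count);
        }
        kept.push(keep);
        start = end;
    }
    (new_run_ends, kept)
}
```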

Dandandan · 2024-11-10T16:39:13Z

Looks good to me. I feel there is probably still more performance left on the table.

I believe the reason BitIterator is not faster is that it currently does the same work as filter_values.value_unchecked, so using the iterator only adds the extra cost of the bounds check.

I think it should be possible to improve BitIterator (using a bitshift rather than the current index) and to use unwrap_unchecked() so that it is slightly more performant than the current solution.
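To illustrate the bitshift idea, here is a hypothetical std-only iterator. It is not arrow's BitIterator and it omits the bit offset that BitIterator::new accepts. It reads bits least-significant first from 64-bit words and shifts the current word right after each bit, instead of recomputing a word index and mask for every position. The word load keeps a safe ? where, per the suggestion above, unwrap_unchecked() could remove that check under the length invariant asserted in new.

```rust
/// Hypothetical bit iterator (not arrow's `BitIterator`): yields `len` bits,
/// least-significant bit first, shifting the current word instead of indexing.
struct ShiftBitIter<'a> {
    words: &'a [u64],       // words not yet loaded
    current: u64,           // unread bits of the word being consumed
    bits_left_in_word: u32, // how many bits of `current` are still unread
    remaining: usize,       // total bits still to yield
}

impl<'a> ShiftBitIter<'a> {
    fn new(words: &'a [u64], len: usize) -> Self {
        assert!(len <= words.len() * 64, "len exceeds available bits");
        Self { words, current: 0, bits_left_in_word: 0, remaining: len }
    }
}

impl Iterator for ShiftBitIter<'_> {
    type Item = bool;

    fn next(&mut self) -> Option<bool> {
        if self.remaining == 0 {
            return None;
        }
        if self.bits_left_in_word == 0 {
            // Load the next word; the assert in `new` guarantees it exists.
            let words = self.words;
            let (first, rest) = words.split_first()?;
            self.current = *first;
            self.words = rest;
            self.bits_left_in_word = 64;
        }
        // Read the lowest bit, then shift it out.
        let bit = (self.current & 1) == 1;
        self.current >>= 1;
        self.bits_left_in_word -= 1;
        self.remaining -= 1;
        Some(bit)
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        (self.remaining, Some(self.remaining))
    }
}
```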

Dandandan · 2024-11-10T16:40:40Z

> I'm not sure calling take on each iteration will work because it takes ownership of the Iterator so it needs to be reassigned on each run_ends iteration, but then it will always take on the first n elements, unless you had something else in mind? I did try BitIterator::new(filter_values.values(), start, end - start) to try achieve something similar but It turned out to be slower.

Of course, you're right.

Dandandan · 2024-11-10T18:17:46Z

Thanks @delamarch3

delamarch3 added 7 commits November 3, 2024 12:21

ensure predicate and values have the same length before passing on to filter_run_end_array

540c8d9

fix error wording

9c661bf

have filter_run_end_array use filter array with run_ends max value size

1f5b61c

use skip and take to iterate over filter values in filter_run_end_array

51a47fa

check array values in max_value_gt_predicate_len test

20cc93a

run end filter performance regression

ed14e58

use names consistent with other functions

20e6eeb

github-actions bot added the arrow Changes to the arrow crate label Nov 5, 2024

delamarch3 and others added 3 commits November 5, 2024 19:04

Merge branch 'master' into run-end-filter-safety

a0eb724

clippy

44d87cf

Merge branch 'run-end-filter-safety' of github.com:delamarch3/arrow-rs into run-end-filter-safety

32d6a95

delamarch3 mentioned this pull request Nov 8, 2024

Add filter_kernel benchmark for run array #6706

Merged

Dandandan reviewed Nov 8, 2024


Dandandan approved these changes Nov 10, 2024


Dandandan merged commit 24f455e into apache:master Nov 10, 2024
26 checks passed
