Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up filter_bytes #6699

Merged
merged 2 commits into from
Nov 7, 2024
Merged

Speed up filter_bytes #6699

merged 2 commits into from
Nov 7, 2024

Conversation

Dandandan
Copy link
Contributor

@Dandandan Dandandan commented Nov 6, 2024

Which issue does this PR close?

MutableBuffer doesn't always generate optimal code compared to using Vec. I think the main performance difference might be in MutableBuffer::push (but not sure if that explains all the difference).

filter context string (kept 1/2)
                        time:   [385.65 µs 394.58 µs 405.71 µs]
                        change: [-52.835% -51.497% -50.056%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 16 outliers among 100 measurements (16.00%)
  13 (13.00%) high mild
  3 (3.00%) high severe

Benchmarking filter context string high selectivity (kept 1023/1024): Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.9s, enable flat sampling, or reduce sample count to 60.
filter context string high selectivity (kept 1023/1024)
                        time:   [1.1102 ms 1.1526 ms 1.1964 ms]
                        change: [-11.666% -8.8390% -5.7644%] (p = 0.00 < 0.05)
                        Performance has improved.

filter context string low selectivity (kept 1/1024)
                        time:   [979.52 ns 987.11 ns 994.64 ns]
                        change: [-19.093% -18.289% -17.438%] (p = 0.00 < 0.05)
                        Performance has improved.

Closes #.

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the arrow Changes to the arrow crate label Nov 6, 2024
@Dandandan Dandandan changed the title Speed up Speed up filter_bytes Nov 6, 2024
Copy link
Contributor

@jayzhan211 jayzhan211 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@alamb
Copy link
Contributor

alamb commented Nov 7, 2024

I also ran the benchmarks and see the same improvements 👍

++ critcmp master speedup_filter
group                                                                         master                                 speedup_filter
-----                                                                         ------                                 --------------
filter context mixed string view (kept 1/2)                                   1.00    674.8±6.62µs        ? ?/sec    1.01    679.6±6.81µs        ? ?/sec
filter context mixed string view high selectivity (kept 1023/1024)            1.00   1196.4±2.54µs        ? ?/sec    1.02   1217.7±3.17µs        ? ?/sec
filter context mixed string view low selectivity (kept 1/1024)                1.01   1172.9±0.95ns        ? ?/sec    1.00   1164.3±2.62ns        ? ?/sec
filter context short string view (kept 1/2)                                   1.00    442.1±2.81µs        ? ?/sec    1.01    447.8±4.74µs        ? ?/sec
filter context short string view high selectivity (kept 1023/1024)            1.00    783.1±1.14µs        ? ?/sec    1.01    790.9±3.73µs        ? ?/sec
filter context short string view low selectivity (kept 1/1024)                1.00    936.5±1.46ns        ? ?/sec    1.01    947.4±5.04ns        ? ?/sec
filter context string (kept 1/2)                                              4.86      3.8±0.07ms        ? ?/sec    1.00   774.1±21.31µs        ? ?/sec
filter context string dictionary (kept 1/2)                                   1.00    244.2±1.43µs        ? ?/sec    1.00    243.6±1.24µs        ? ?/sec
filter context string dictionary high selectivity (kept 1023/1024)            1.00    248.9±1.50µs        ? ?/sec    1.00    248.1±0.92µs        ? ?/sec
filter context string dictionary low selectivity (kept 1/1024)                1.00    202.0±0.50µs        ? ?/sec    1.00    201.5±0.74µs        ? ?/sec
filter context string dictionary w NULLs (kept 1/2)                           1.00    341.8±0.75µs        ? ?/sec    1.00    342.5±0.93µs        ? ?/sec
filter context string dictionary w NULLs high selectivity (kept 1023/1024)    1.00    466.3±0.75µs        ? ?/sec    1.01    469.3±1.23µs        ? ?/sec
filter context string dictionary w NULLs low selectivity (kept 1/1024)        1.00    101.1±0.22µs        ? ?/sec    1.00    100.8±0.23µs        ? ?/sec
filter context string high selectivity (kept 1023/1024)                       1.63      2.5±0.06ms        ? ?/sec    1.00  1523.2±31.32µs        ? ?/sec
filter context string low selectivity (kept 1/1024)                           1.23   1993.4±3.93ns        ? ?/sec    1.00   1614.9±4.07ns        ? ?/sec

@alamb alamb merged commit e9bf8aa into apache:master Nov 7, 2024
26 checks passed
@alamb
Copy link
Contributor

alamb commented Nov 7, 2024

Thanks @Dandandan and @jayzhan211

@Dandandan
Copy link
Contributor Author

I also ran the benchmarks and see the same improvements 👍

++ critcmp master speedup_filter
group                                                                         master                                 speedup_filter
-----                                                                         ------                                 --------------
filter context mixed string view (kept 1/2)                                   1.00    674.8±6.62µs        ? ?/sec    1.01    679.6±6.81µs        ? ?/sec
filter context mixed string view high selectivity (kept 1023/1024)            1.00   1196.4±2.54µs        ? ?/sec    1.02   1217.7±3.17µs        ? ?/sec
filter context mixed string view low selectivity (kept 1/1024)                1.01   1172.9±0.95ns        ? ?/sec    1.00   1164.3±2.62ns        ? ?/sec
filter context short string view (kept 1/2)                                   1.00    442.1±2.81µs        ? ?/sec    1.01    447.8±4.74µs        ? ?/sec
filter context short string view high selectivity (kept 1023/1024)            1.00    783.1±1.14µs        ? ?/sec    1.01    790.9±3.73µs        ? ?/sec
filter context short string view low selectivity (kept 1/1024)                1.00    936.5±1.46ns        ? ?/sec    1.01    947.4±5.04ns        ? ?/sec
filter context string (kept 1/2)                                              4.86      3.8±0.07ms        ? ?/sec    1.00   774.1±21.31µs        ? ?/sec
filter context string dictionary (kept 1/2)                                   1.00    244.2±1.43µs        ? ?/sec    1.00    243.6±1.24µs        ? ?/sec
filter context string dictionary high selectivity (kept 1023/1024)            1.00    248.9±1.50µs        ? ?/sec    1.00    248.1±0.92µs        ? ?/sec
filter context string dictionary low selectivity (kept 1/1024)                1.00    202.0±0.50µs        ? ?/sec    1.00    201.5±0.74µs        ? ?/sec
filter context string dictionary w NULLs (kept 1/2)                           1.00    341.8±0.75µs        ? ?/sec    1.00    342.5±0.93µs        ? ?/sec
filter context string dictionary w NULLs high selectivity (kept 1023/1024)    1.00    466.3±0.75µs        ? ?/sec    1.01    469.3±1.23µs        ? ?/sec
filter context string dictionary w NULLs low selectivity (kept 1/1024)        1.00    101.1±0.22µs        ? ?/sec    1.00    100.8±0.23µs        ? ?/sec
filter context string high selectivity (kept 1023/1024)                       1.63      2.5±0.06ms        ? ?/sec    1.00  1523.2±31.32µs        ? ?/sec
filter context string low selectivity (kept 1/1024)                           1.23   1993.4±3.93ns        ? ?/sec    1.00   1614.9±4.07ns        ? ?/sec

Wow that even seems a much bigger improvement 🤔

@alamb
Copy link
Contributor

alamb commented Nov 8, 2024

Wow that even seems a much bigger improvement 🤔

Will rerun and see if I can reprodice

@alamb
Copy link
Contributor

alamb commented Nov 8, 2024

Results of rerunning

++ critcmp master speedup_filter
group                                                                         master                                 speedup_filter
-----                                                                         ------                                 --------------
filter context mixed string view (kept 1/2)                                   1.00    676.9±1.46µs        ? ?/sec    1.00    680.2±3.73µs        ? ?/sec
filter context mixed string view high selectivity (kept 1023/1024)            1.00   1195.0±3.18µs        ? ?/sec    1.02   1217.4±3.71µs        ? ?/sec
filter context mixed string view low selectivity (kept 1/1024)                1.01   1167.2±2.60ns        ? ?/sec    1.00   1156.4±1.15ns        ? ?/sec
filter context short string view (kept 1/2)                                   1.00    441.9±3.60µs        ? ?/sec    1.02    451.0±2.40µs        ? ?/sec
filter context short string view high selectivity (kept 1023/1024)            1.00    780.6±2.07µs        ? ?/sec    1.01    790.6±1.75µs        ? ?/sec
filter context short string view low selectivity (kept 1/1024)                1.01    930.2±2.89ns        ? ?/sec    1.00    918.1±4.38ns        ? ?/sec
filter context string (kept 1/2)                                              4.89      3.7±0.09ms        ? ?/sec    1.00   757.8±12.84µs        ? ?/sec
filter context string dictionary (kept 1/2)                                   1.02    247.1±1.64µs        ? ?/sec    1.00    242.5±0.56µs        ? ?/sec
filter context string dictionary high selectivity (kept 1023/1024)            1.02    252.3±1.74µs        ? ?/sec    1.00    248.4±1.13µs        ? ?/sec
filter context string dictionary low selectivity (kept 1/1024)                1.00    201.8±0.46µs        ? ?/sec    1.00    200.9±0.48µs        ? ?/sec
filter context string dictionary w NULLs (kept 1/2)                           1.00    341.7±0.77µs        ? ?/sec    1.00    341.9±0.52µs        ? ?/sec
filter context string dictionary w NULLs high selectivity (kept 1023/1024)    1.00    466.6±1.22µs        ? ?/sec    1.01    469.1±3.93µs        ? ?/sec
filter context string dictionary w NULLs low selectivity (kept 1/1024)        1.00    101.0±0.09µs        ? ?/sec    1.00    101.0±0.38µs        ? ?/sec
filter context string high selectivity (kept 1023/1024)                       1.67      2.5±0.07ms        ? ?/sec    1.00  1491.5±32.49µs        ? ?/sec
filter context string low selectivity (kept 1/1024)                           1.23   1989.6±5.27ns        ? ?/sec    1.00   1622.9±4.10ns        ? ?/sec

@Dandandan
Copy link
Contributor Author

Dandandan commented Nov 8, 2024

Interesting, so it seems to have even bigger effect on other machines (maybe when branch misses have more impact or it can generate even better code) - my result is from M1 pro

@alamb
Copy link
Contributor

alamb commented Nov 8, 2024

Interesting, so it seems to have even bigger effect on other machines (maybe when branch misses have more impact or it can generate even better code) - my result is from M1 pro

This is what I am running on:

processor       : 15
vendor_id       : GenuineIntel
cpu family      : 6
model           : 85
model name      : Intel(R) Xeon(R) CPU @ 3.10GHz
stepping        : 7
microcode       : 0xffffffff
cpu MHz         : 3100.310
cache size      : 25344 KB
physical id     : 0
siblings        : 16
core id         : 7
cpu cores       : 8
apicid          : 15
initial apicid  : 15
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid ts\
c_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 hle \
avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat avx512_vnni md_clear arch_capabilities
bugs            : spectre_v1 spectre_v2 spec_store_bypass swapgs taa mmio_stale_data retbleed eibrs_pbrsb bhi
bogomips        : 6200.62
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

@Dandandan
Copy link
Contributor Author

Ok, thanks for sharing!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants