perf: Try fast pow for `EigenStepper` step size scaling #3153

andiwand · 2024-04-28T15:23:05Z

Currently the step scaling calculation has a big impact on the `EigenStepper` performance. In this PR the `pow(x, 0.25)` is approximated with `fastPow` (similar to fast inverse square root), which relies on the bit representation of the floating point number.

The assumption is that we do not really care about the precision of this value but rather that it gives a good approximation for the step size scaling.
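
For readers unfamiliar with the trick, here is a minimal sketch of a bit-level `pow` approximation in the style of the blog post listed under References. It is not the code from this PR: the name `fastPowSketch` is hypothetical, the offset constant is the one quoted in that post, and the sketch assumes a 64-bit IEEE 754 `double`, a finite positive base, and a moderate exponent.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == 8, "sketch assumes a 64-bit double");

// Hypothetical helper (not the PR's implementation): approximates pow(a, b)
// for finite a > 0 by rescaling the upper 32 bits of the double.
inline double fastPowSketch(double a, double b) {
  assert(std::isfinite(a) && a > 0.0);

  // Offset constant as quoted in the referenced blog post.
  constexpr std::int32_t kMagic = 1072632447;

  // Copy the bit pattern out of the double without violating aliasing rules.
  std::uint64_t bits = 0;
  std::memcpy(&bits, &a, sizeof(bits));

  // The upper 32 bits hold sign, exponent, and the top 20 mantissa bits.
  // Read as an integer they behave roughly like a scaled, shifted log2(a).
  const auto high = static_cast<std::int32_t>(bits >> 32);

  // Multiplying the (offset-corrected) "logarithm" by b approximates
  // taking the power; large |b| could overflow this cast.
  const auto scaled = static_cast<std::int32_t>(b * (high - kMagic) + kMagic);

  // Write the rescaled upper word back and zero the lower 32 bits.
  bits = static_cast<std::uint64_t>(static_cast<std::uint32_t>(scaled)) << 32;
  double result = 0.0;
  std::memcpy(&result, &bits, sizeof(result));
  return result;
}
```

Calling `fastPowSketch(x, 0.25)` gives a rough estimate of the fourth root. The result is only approximate, which matches the assumption above that the step size scaling does not need full precision.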

Edit: After measuring the performance, there seems to be no measurable improvement. I made this approximation a compile-time option for now. See this comment on #3153 for more details.

References

https://martin.ankerl.com/2012/01/25/optimized-approximative-pow-in-c-and-cpp

AJPfleger · 2024-04-28T17:25:48Z

Since we are clamping anyway, we could look at the representation of `double x` and the clamping limits 0.25 and 4.

So we want to end up with a number in the range $(1 \cdot 2^{-2}, 1 \cdot 2^{2})$. For this, our initial `x` needs to be in $(1 \cdot 2^{-2 \cdot 4}, 1 \cdot 2^{2 \cdot 4})$.

We could then construct the function using `std::frexp`, maybe like this:

int exponent;
// std::frexp splits x into significand * 2^exponent
double significand = std::frexp(x, &exponent);

// -7 instead of -8, because significand is in the interval [0.5, 1)
if (exponent < -7) {
    return 0.25;
} else if (exponent > 8) {
    return 4;
} else {
    int remainder = exponent % 4;
    // TODO extra-step on `significand` using remainder and pow-estimate
    return std::ldexp(significand, exponent / 4);
}

Even with the other technique, it might be useful to do the clamping before.
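
One way the TODO in the snippet above could be filled in is sketched below. This is an illustration under stated assumptions, not code from the PR: the name `clampedFourthRootSketch` is hypothetical, non-positive input is mapped to the lower clamp by assumption, and the fourth root of the significand is taken with nested `std::sqrt` rather than a faster estimate.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: pow(x, 0.25) clamped to [0.25, 4], with the clamping
// decided from the binary exponent before any root is computed.
inline double clampedFourthRootSketch(double x) {
  assert(std::isfinite(x));

  // Assumption: zero and negative input map to the lower clamp.
  if (!(x > 0.0)) {
    return 0.25;
  }

  int exponent = 0;
  // x == significand * 2^exponent with significand in [0.5, 1)
  const double significand = std::frexp(x, &exponent);

  if (exponent < -7) {
    return 0.25;  // x < 2^-8
  }
  if (exponent > 8) {
    return 4.0;  // x >= 2^8
  }

  // Floor-style split of the exponent so that remainder is in {0, 1, 2, 3},
  // also for negative exponents (plain % truncates toward zero in C++).
  const int remainder = ((exponent % 4) + 4) % 4;
  const int quotient = (exponent - remainder) / 4;

  // 2^(remainder / 4) for remainder = 0..3
  constexpr double kRoots[4] = {1.0, 1.189207115002721, 1.4142135623730951,
                                1.681792830507429};

  // x^(1/4) = significand^(1/4) * 2^(remainder / 4) * 2^quotient
  return std::ldexp(std::sqrt(std::sqrt(significand)) * kRoots[remainder],
                    quotient);
}
```

The early returns mean that values far outside the clamping range never reach the root computation, which is the point of clamping first.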

github-actions · 2024-04-28T18:02:28Z

📊: Physics performance monitoring for `441aa99`


andiwand · 2024-04-29T07:42:15Z

Apparently `std::sqrt(std::sqrt(x))` is pretty fast.

Benchmarking a=247.842, b=0.25...
- void: 20000 runs of 10000 iteration(s), 119.2ms total, 5.8750+/-0.0005µs per run, 0.588+/-0.005ns per iteration
- std::pow: 20000 runs of 10000 iteration(s), 1134.6ms total, 56.2090+/-0.0032µs per run, 5.621+/-0.032ns per iteration
- std::exp: 20000 runs of 10000 iteration(s), 901.2ms total, 44.8340+/-0.0016µs per run, 4.483+/-0.016ns per iteration
- std::sqrt: 20000 runs of 10000 iteration(s), 197.8ms total, 9.8750+/-0.0005µs per run, 0.988+/-0.005ns per iteration
- fastPow: 20000 runs of 10000 iteration(s), 147.6ms total, 7.4590+/-0.0011µs per run, 0.746+/-0.011ns per iteration
- fastPowMorePrecise: 20000 runs of 10000 iteration(s), 209.8ms total, 10.5830+/-0.0005µs per run, 1.058+/-0.005ns per iteration
- fastPowAnother: 20000 runs of 10000 iteration(s), 178.2ms total, 9.0000+/-0.0080µs per run, 0.900+/-0.080ns per iteration

fastPow is still faster but it does not have an impact on the overall stepper performance.
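
As a point of comparison, the nested square root can be combined with the clamping suggested earlier in this thread. The following is a minimal sketch only: `stepScalingSketch` is a hypothetical name, and the limits $2^{-8}$ and $2^{8}$ are the input bounds that correspond to an output range of 0.25 to 4.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper: fourth root via two square roots, with the input
// clamped first so the result always lies in [0.25, 4].
inline double stepScalingSketch(double x) {
  assert(x >= 0.0);

  // Clamp to [2^-8, 2^8] before taking the root.
  const double clamped = std::clamp(x, 1.0 / 256.0, 256.0);

  // sqrt(sqrt(x)) == x^(1/4) up to floating point rounding.
  return std::sqrt(std::sqrt(clamped));
}
```

Unlike the bit-level approximation, this version is accurate to rounding error.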

I suspect that removing the second scaling improves performance because our current scaling strategy periodically increases and decreases the step size, which results in many attempted steps and possibly mispredicted branches.

I will make the function selection configurable for now.
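
A compile-time selection of this kind could look roughly like the sketch below, which requires C++17 for `if constexpr`. All names here (`FourthRootMethod`, `kDefaultMethod`, `fourthRootSketch`) are hypothetical and do not reflect the identifiers used in the PR. The default is the `std::sqrt` variant, in line with the commit message "make configurable and default to std::sqrt".

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical selector for how the fourth root is computed.
enum class FourthRootMethod { StdSqrt, BitTrick };

// Hypothetical compile-time default.
constexpr FourthRootMethod kDefaultMethod = FourthRootMethod::StdSqrt;

template <FourthRootMethod method = kDefaultMethod>
double fourthRootSketch(double x) {
  assert(std::isfinite(x) && x > 0.0);

  if constexpr (method == FourthRootMethod::BitTrick) {
    // Bit-level approximation of pow(x, 0.25); assumes a 64-bit IEEE 754
    // double. The constant is the one quoted in the referenced blog post.
    constexpr std::int32_t kMagic = 1072632447;
    std::uint64_t bits = 0;
    std::memcpy(&bits, &x, sizeof(bits));
    const auto high = static_cast<std::int32_t>(bits >> 32);
    const auto scaled =
        static_cast<std::int32_t>(0.25 * (high - kMagic) + kMagic);
    bits = static_cast<std::uint64_t>(static_cast<std::uint32_t>(scaled)) << 32;
    double result = 0.0;
    std::memcpy(&result, &bits, sizeof(result));
    return result;
  } else {
    // Accurate variant: two nested square roots.
    return std::sqrt(std::sqrt(x));
  }
}
```

Because the branch is resolved at compile time, only the selected variant ends up in the generated code, so the choice adds no runtime branch to the stepping loop.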

Tests/Benchmarks/QuickMathBenchmark.cpp

Tests/UnitTests/Core/Utilities/QuickMathTests.cpp

Core/include/Acts/Propagator/EigenStepper.ipp

Co-authored-by: Alexander J. Pfleger <[email protected]>

acts-project-service · 2024-04-30T00:07:19Z

✅ Athena integration test results [`620069e`]

✅ All tests successful

status	job	report
🟢	run_unit_tests
🟢	test_ActsEFTrackFit
🟢	test_ActsPersistifySeeds
🟢	test_ActsBenchmarkWithSpot
🟢	test_ActsAnalogueClustering
🟢	test_ActsConversionWorkflow
🟢	test_ActsWorkflowHeavyIons
🟢	test_ActsWorkflowFastTracking
🟢	test_ActsWorkflow
🟢	test_ActsValidateAmbiguityResolution
🟢	test_ActsValidateResolvedTracks
🟢	test_ActsValidateTracks
🟢	test_ActsValidateActsCoreSpacePoints
🟢	test_ActsValidateActsSpacePoints
🟢	test_ActsValidateSeeds
🟢	test_ActsValidateOrthogonalSeeds
🟢	test_ActsValidateClusters
🟢	test_ActsPersistifyEDM
🟢	test_ActsGSFRefitting
🟢	test_ActsKfRefitting
🟢	test_ActsExtrapolationAlgTest
🟢	test_ActsITkTest
🟢	run_workflow_tests_run4_mc
🟢	run_workflow_tests_run2_mc
🟢	run_workflow_tests_run2_data
🟢	run_workflow_tests_run3_mc
🟢	run_workflow_tests_run3_data
🟢	run_art_test: test_data18_13TeV_1000evt
🟢	run_art_test: test_ttbarPU40_reco


perf: Fast pow for EigenStepper step scaling

5c09fde

andiwand added this to the next milestone Apr 28, 2024

andiwand changed the title ~~perf: Fast pow for EigenStepper step scaling~~ perf: Fast pow for EigenStepper step size scaling Apr 28, 2024

github-actions bot added the Component - Core Affects the Core module label Apr 28, 2024

fix ci build

ebfeb8b

reinstall div 0 protection; more approx; generalize scaling

a3ddf98

andiwand marked this pull request as draft April 28, 2024 17:41

andiwand added 2 commits April 28, 2024 20:03

doc; benchmark

589ba72

fix build; remove another approx

c94dd3a

andiwand added 4 commits April 29, 2024 13:18

minor

fc92685

fix shadowing

d41e5cc

make configurable and default to std::sqrt

0589857

use constexpr after perf measure

c681875

andiwand marked this pull request as ready for review April 29, 2024 12:33

AJPfleger reviewed Apr 29, 2024


andiwand changed the title ~~perf: Fast pow for EigenStepper step size scaling~~ perf: Try fast pow for EigenStepper step size scaling Apr 29, 2024

andiwand requested a review from AJPfleger April 29, 2024 15:19

andiwand and others added 3 commits April 29, 2024 17:43

Apply suggestions from code review

d6bc3f0

Co-authored-by: Alexander J. Pfleger <[email protected]>

fixes after suggestions

d8c45a0

formatting

b90f3fa

AJPfleger approved these changes Apr 29, 2024


andiwand added the automerge label Apr 29, 2024

Merge branch 'main' into perf-eigenstepper-fast-pow

441aa99

kodiakhq bot merged commit 620069e into acts-project:main Apr 29, 2024
51 checks passed

github-actions bot removed the automerge label Apr 29, 2024

andiwand deleted the perf-eigenstepper-fast-pow branch April 29, 2024 22:25

andiwand modified the milestones: next, v35.0.0 May 17, 2024
