Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: Try fast pow for EigenStepper step size scaling #3153

Merged
merged 13 commits into from
Apr 29, 2024

Conversation

andiwand
Copy link
Contributor

@andiwand andiwand commented Apr 28, 2024

Currently the step scaling calculation has a big impact on the EigenStepper performance. In this PR the pow(x, 0.25) is approximated with fastPow (similar to fast inverse square root) which relies on the bit representation of the floating point number.

The assumption is that we do not really care about the precision of this value but rather that it gives a good approximation for the step size scaling.

Edit: After measuring the performance it seems like there is no measurable improvement. I made this approximation compile time optional for now. See this #3153 (comment) for more details

image

References

@andiwand andiwand added this to the next milestone Apr 28, 2024
@andiwand andiwand changed the title perf: Fast pow for EigenStepper step scaling perf: Fast pow for EigenStepper step size scaling Apr 28, 2024
@github-actions github-actions bot added the Component - Core Affects the Core module label Apr 28, 2024
@AJPfleger
Copy link
Contributor

AJPfleger commented Apr 28, 2024

Since we are clamping anyway, we could look at the representation of double x and the clamping limits 0.25 and 4.

so we want to end up with a number in the range $(1 \cdot 2^{-2},1 \cdot 2^{2})$ for this our initial x needs to be in $(1 \cdot 2^{-2 \cdot 4},1 \cdot 2^{2 \cdot 4})$

we could then construct the function using std::frexp, maybe like this:

int exponent;
double significant = std::frexp(x, &i);

// -7 instead of -8, because significant is in the interval [0.5, 1)
if ( i < -7) {
    return 0.25;
} else if ( i > 8) {
    return 4;
} else {
    int remainder = exponent % 4;
    // TODO extra-step on `significant` using remainder and pow-estimate
    return ldexp(significant, exponent / 4);
}

Even with the other technique, it might be useful to do the clamping before.

@andiwand andiwand marked this pull request as draft April 28, 2024 17:41
@andiwand
Copy link
Contributor Author

Apparently std::sqrt(std::sqrt(x)) is pretty fast.

Benchmarking a=247.842, b=0.25...
- void: 20000 runs of 10000 iteration(s), 119.2ms total, 5.8750+/-0.0005µs per run, 0.588+/-0.005ns per iteration
- std::pow: 20000 runs of 10000 iteration(s), 1134.6ms total, 56.2090+/-0.0032µs per run, 5.621+/-0.032ns per iteration
- std::exp: 20000 runs of 10000 iteration(s), 901.2ms total, 44.8340+/-0.0016µs per run, 4.483+/-0.016ns per iteration
- std::sqrt: 20000 runs of 10000 iteration(s), 197.8ms total, 9.8750+/-0.0005µs per run, 0.988+/-0.005ns per iteration
- fastPow: 20000 runs of 10000 iteration(s), 147.6ms total, 7.4590+/-0.0011µs per run, 0.746+/-0.011ns per iteration
- fastPowMorePrecise: 20000 runs of 10000 iteration(s), 209.8ms total, 10.5830+/-0.0005µs per run, 1.058+/-0.005ns per iteration
- fastPowAnother: 20000 runs of 10000 iteration(s), 178.2ms total, 9.0000+/-0.0080µs per run, 0.900+/-0.080ns per iteration

fastPow is still faster but it does not have an impact on the overall stepper performance.

I suspect the reason for higher performance when removing the second scaling is that our current scaling strategy increases and decreases the step size periodically which results in a lot of attempted steps and possibly mispredicted branches.

I will make the function selection configurable for now.

@andiwand andiwand marked this pull request as ready for review April 29, 2024 12:33
Tests/Benchmarks/QuickMathBenchmark.cpp Outdated Show resolved Hide resolved
Tests/Benchmarks/QuickMathBenchmark.cpp Outdated Show resolved Hide resolved
Tests/UnitTests/Core/Utilities/QuickMathTests.cpp Outdated Show resolved Hide resolved
Core/include/Acts/Propagator/EigenStepper.ipp Show resolved Hide resolved
@andiwand andiwand changed the title perf: Fast pow for EigenStepper step size scaling perf: Try fast pow for EigenStepper step size scaling Apr 29, 2024
@andiwand andiwand requested a review from AJPfleger April 29, 2024 15:19
@kodiakhq kodiakhq bot merged commit 620069e into acts-project:main Apr 29, 2024
51 checks passed
@andiwand andiwand deleted the perf-eigenstepper-fast-pow branch April 29, 2024 22:25
EleniXoch pushed a commit to EleniXoch/acts that referenced this pull request May 6, 2024
…#3153)

Currently the step scaling calculation has a big impact on the `EigenStepper` performance. In this PR the `pow(x, 0.25)` is approximated with `fastPow` (similar to fast inverse square root) which relies on the bit representation of the floating point number.

The assumption is that we do not really care about the precision of this value but rather that it gives a good approximation for the step size scaling.

Edit: After measuring the performance it seems like there is no measurable improvement. I made this approximation compile time optional for now. See this acts-project#3153 (comment) for more details

<img width="577" alt="image" src="https://github.com/acts-project/acts/assets/487211/d613caeb-991f-4a89-98fc-bc051be13b7b">

References
- https://martin.ankerl.com/2012/01/25/optimized-approximative-pow-in-c-and-cpp
@andiwand andiwand modified the milestones: next, v35.0.0 May 17, 2024
asalzburger pushed a commit to asalzburger/acts that referenced this pull request May 21, 2024
…#3153)

Currently the step scaling calculation has a big impact on the `EigenStepper` performance. In this PR the `pow(x, 0.25)` is approximated with `fastPow` (similar to fast inverse square root) which relies on the bit representation of the floating point number.

The assumption is that we do not really care about the precision of this value but rather that it gives a good approximation for the step size scaling.

Edit: After measuring the performance it seems like there is no measurable improvement. I made this approximation compile time optional for now. See this acts-project#3153 (comment) for more details

<img width="577" alt="image" src="https://github.com/acts-project/acts/assets/487211/d613caeb-991f-4a89-98fc-bc051be13b7b">

References
- https://martin.ankerl.com/2012/01/25/optimized-approximative-pow-in-c-and-cpp
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component - Core Affects the Core module
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants