Optimize Math.Pow(x, c) where c is 2, 1, -1 or 0 #31978

Closed
wants to merge 20 commits into from


EgorBo
Member

@EgorBo EgorBo commented Feb 8, 2020

Resurrects dotnet/coreclr#26552
Optimizes:

Math.Pow(x,  2) --> x*x
Math.Pow(x,  1) --> x

(same for MathF and float)

This time it's done in importer.cpp and handles any kind of first argument (it introduces a temp variable if needed, e.g. for GT_CALL).
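To illustrate why a temp is needed when the first argument has side effects, here is a minimal source-level sketch in C#. It is not the JIT's actual tree transformation; `NextValue`, `SquareNaive` and `SquareWithTemp` are hypothetical names used only for illustration.

```csharp
using System;

public static class TempDemo
{
    private static int s_calls; // how many times NextValue has run

    public static int Calls => s_calls;

    public static void Reset() => s_calls = 0;

    // Hypothetical argument with a side effect, standing in for a GT_CALL operand.
    public static double NextValue()
    {
        s_calls++;
        return 3.0;
    }

    // Wrong expansion of Math.Pow(NextValue(), 2): the call runs twice.
    public static double SquareNaive() => NextValue() * NextValue();

    // Expansion through a temp: the call runs once, like the original Math.Pow call.
    public static double SquareWithTemp()
    {
        double tmp = NextValue();
        return tmp * tmp;
    }

    public static void Main()
    {
        Reset();
        SquareNaive();
        Console.WriteLine($"naive: {Calls} call(s)");

        Reset();
        SquareWithTemp();
        Console.WriteLine($"temp:  {Calls} call(s)");
    }
}
```

The naive form evaluates the argument twice, which would change program behavior; the temp keeps it at one evaluation.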

Example:

static double Pow2(double x)  => Math.Pow(x, 2);
static double Pow1(double x)  => Math.Pow(x, 1);

Current codegen:

; Method Tests:Pow2(double):double
       vzeroupper
       vmovsd   xmm1, qword ptr [reloc @RWD00]
       jmp      System.Math:Pow(double,double):double


; Method Tests:Pow1(double):double
       vzeroupper
       vmovsd   xmm1, qword ptr [reloc @RWD00]
       jmp      System.Math:Pow(double,double):double

New codegen:

; Method Tests:Pow2(double):double
       vzeroupper
       vmulsd   xmm0, xmm0, xmm0
       ret


; Method Tests:Pow1(double):double
       vzeroupper
       ret  ; just return xmm0
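As a source-level picture of the codegen above, the fold could be sketched like this in C#. `PowFolded` is a hypothetical helper; the PR itself rewrites the call inside the JIT and only when the exponent is a constant, so the runtime branches here are just for illustration.

```csharp
using System;

public static class PowFold
{
    // Hypothetical helper: exponents 2 and 1 are rewritten,
    // everything else still calls Math.Pow.
    public static double PowFolded(double x, double c)
    {
        if (c == 2.0) return x * x; // Math.Pow(x, 2) --> x*x
        if (c == 1.0) return x;     // Math.Pow(x, 1) --> x
        return Math.Pow(x, c);
    }

    public static void Main()
    {
        foreach (double x in new[] { 3.0, -1.5, 0.0, double.PositiveInfinity })
        {
            Console.WriteLine($"x={x}: folded={PowFolded(x, 2)}, Math.Pow={Math.Pow(x, 2)}");
        }
    }
}
```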

It seems this pattern can be found in gamedev, e.g. Xenko (a game engine): https://github.com/xenko3d/xenko/search?q=Math.Pow&unscoped_q=Math.Pow
Also the dotnet/performance benchmarks use it: https://github.com/dotnet/performance/blob/8aed638c9ee65c034fe0cca4ea2bdc3a68d2a6b5/src/benchmarks/micro/runtime/Burgers/Burgers.cs
Jitdiff for bcl:

Total bytes of delta: -40 (-0.00% of base)
    diff is an improvement.

Top file improvements (bytes):
         -40 : System.Private.CoreLib.dasm (-0.00% of base)

1 total files with Code Size differences (1 improved, 0 regressed), 267 unchanged.

Top method improvements (bytes):
         -30 (-5.08% of base) : System.Private.CoreLib.dasm - CalendricalCalculationsHelper:EquationOfTime(double):double
         -10 (-3.12% of base) : System.Private.CoreLib.dasm - CalendricalCalculationsHelper:DefaultEphemerisCorrection(int):double

Top method improvements (percentages):
         -30 (-5.08% of base) : System.Private.CoreLib.dasm - CalendricalCalculationsHelper:EquationOfTime(double):double
         -10 (-3.12% of base) : System.Private.CoreLib.dasm - CalendricalCalculationsHelper:DefaultEphemerisCorrection(int):double

2 total methods with Code Size differences (2 improved, 0 regressed), 196451 unchanged.
Completed analysis in 15.23s

The optimization can be extended to handle more cases once some sort of fast-math mode appears in .NET Core.

@jkotas added the area-CodeGen-coreclr and optimization labels Feb 8, 2020
@benaadams
Member

If you can do it at import with constants, would it be worth going higher, e.g. up to 5? In your linked example, exponents 3 and 4 crop up:
[image]

For 5 I was going to suggest smootherstep

[image: smootherstep formula, 6x^5 - 15x^4 + 10x^3]

However, you'd probably write it like

x * x * x * (x * (x * 6 - 15) + 10)
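For comparison, here is a small C# sketch of both ways to write smootherstep, assuming the usual definition 6x^5 - 15x^4 + 10x^3. The class and method names are hypothetical.

```csharp
using System;

public static class Smoother
{
    // Direct form with Math.Pow: 6x^5 - 15x^4 + 10x^3.
    public static double WithPow(double x) =>
        6 * Math.Pow(x, 5) - 15 * Math.Pow(x, 4) + 10 * Math.Pow(x, 3);

    // Factored form from the comment above: multiplications and additions only.
    public static double Factored(double x) =>
        x * x * x * (x * (x * 6 - 15) + 10);

    public static void Main()
    {
        foreach (double x in new[] { 0.0, 0.25, 0.5, 0.75, 1.0 })
        {
            Console.WriteLine($"x={x}: pow={WithPow(x)}, factored={Factored(x)}");
        }
    }
}
```

The factored form never calls Pow, so it would not depend on this optimization at all.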

@EgorBo
Member Author

EgorBo commented Feb 8, 2020

@benaadams If I understand you correctly I can't optimize other constants in "safe math" mode, e.g.
Math.Pow(x, 4) can be optimized to

        vmulsd  xmm0, xmm0, xmm0
        vmulsd  xmm0, xmm0, xmm0

(a single xmm0 register!)

but it might return a slightly different value (and violate the IEEE 754 spec)
see https://godbolt.org/z/R78Ev-
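A rough way to probe this from C# is to compare the two-multiplication form against Math.Pow bit by bit. This is only a sketch: `SquareTwice` and `CountMismatches` are hypothetical names, and the number of mismatches (if any) depends on the platform's pow implementation, so no particular output is claimed.

```csharp
using System;

public static class Pow4Demo
{
    // Two multiplications, mirroring the two vmulsd instructions above.
    // Each multiplication rounds, so the result may differ from Math.Pow(x, 4).
    public static double SquareTwice(double x)
    {
        double sq = x * x;
        return sq * sq;
    }

    // Counts the inputs start, start + step, ... (count of them)
    // for which the two results differ in at least one bit.
    public static int CountMismatches(double start, double step, int count)
    {
        int mismatches = 0;
        for (int i = 0; i < count; i++)
        {
            double x = start + i * step;
            long a = BitConverter.DoubleToInt64Bits(SquareTwice(x));
            long b = BitConverter.DoubleToInt64Bits(Math.Pow(x, 4));
            if (a != b) mismatches++;
        }
        return mismatches;
    }

    public static void Main()
    {
        Console.WriteLine(CountMismatches(1.0, 0.001, 10000));
    }
}
```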

@stephentoub
Member

cc: @tannergooding

@EgorBo
Member Author

EgorBo commented Feb 9, 2020

CI failures are unrelated (#31985)

@carlossanlop
Member

What about:
x^0 = 1

@EgorBo
Member Author

EgorBo commented Mar 9, 2020

What about:
x^0 = 1

It can be added, I guess, but we should be careful with side effects. I mainly wanted to optimize Pow(x, 2) since it's quite popular.

@tannergooding
Member

For reference, the IEEE spec defines the following behavior for pow:

pow (x, ±0) is 1 if x is not a signaling NaN
pow (±0, y) is ±∞ and signals the divideByZero exception for y an odd integer < 0
pow (±0, −∞) is +∞ with no exception
pow (±0, +∞) is +0 with no exception
pow (±0, y) is ±0 for finite y > 0 an odd integer
pow (−1, ±∞) is 1 with no exception
pow (+1, y) is 1 for any y (even a quiet NaN)
pow (x, +∞) is +0 for −1 < x < 1
pow (x, +∞) is +∞ for x < −1 or for 1 < x (including ±∞)
pow (x, −∞) is +∞ for −1 < x < 1
pow (x, −∞) is +0 for x < −1 or for 1 < x (including ±∞)
pow (+∞, y) is +0 for a number y < 0
pow (+∞, y) is +∞ for a number y > 0
pow (−∞, y) is −0 for finite y < 0 an odd integer
pow (−∞, y) is −∞ for finite y > 0 an odd integer
pow (−∞, y) is +0 for finite y < 0 and not an odd integer
pow (−∞, y) is +∞ for finite y > 0 and not an odd integer
pow (±0, y) is +∞ and signals the divideByZero exception for finite y < 0 and not an odd integer
pow (±0, y) is +0 for finite y > 0 and not an odd integer
pow (x, y) signals the invalid operation exception for finite x < 0 and finite non-integer y.

A couple of the conditions aren't valid because we don't support signaling NaN nor do we support floating-point exceptions.

The C Language Standard also matches this behavior in Annex F - IEC 60559 floating-point arithmetic and I believe .NET Core is also matching this behavior and has tests validating it for these special inputs.
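A few rows of the list can be spot-checked from C# with a sketch like the one below. The expected values are copied from the list; whether a given runtime matches every row is exactly what the program reports, so no output is claimed here. `SameBits` and `Check` are hypothetical helpers.

```csharp
using System;

public static class PowSpecialCases
{
    // Bitwise comparison, so that +0 and -0 are told apart.
    public static bool SameBits(double a, double b) =>
        BitConverter.DoubleToInt64Bits(a) == BitConverter.DoubleToInt64Bits(b);

    public static bool Check(double x, double y, double expected) =>
        SameBits(Math.Pow(x, y), expected);

    public static void Main()
    {
        double inf = double.PositiveInfinity;

        // (x, y, expected) triples taken from rows of the list above.
        var rows = new (double X, double Y, double Expected)[]
        {
            (0.0, inf, 0.0),    // pow (±0, +∞) is +0
            (-0.0, 3.0, -0.0),  // pow (±0, y) is ±0 for y > 0 an odd integer
            (-0.0, 2.0, 0.0),   // pow (±0, y) is +0 for y > 0 and not an odd integer
            (inf, -1.0, 0.0),   // pow (+∞, y) is +0 for y < 0
            (-inf, 3.0, -inf),  // pow (−∞, y) is −∞ for y > 0 an odd integer
            (-inf, -3.0, -0.0), // pow (−∞, y) is −0 for y < 0 an odd integer
            (-inf, 2.0, inf),   // pow (−∞, y) is +∞ for y > 0 and not an odd integer
        };

        foreach (var (x, y, expected) in rows)
        {
            string verdict = Check(x, y, expected) ? "matches" : "differs";
            Console.WriteLine($"pow({x}, {y}) = {Math.Pow(x, y)} ({verdict})");
        }
    }
}
```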

@EgorBo
Member Author

EgorBo commented Mar 9, 2020

So I guess we'd better skip the pow(x, 0) -> 1 case.

@tannergooding
Member

No, that is fine to optimize. The point of my comment is that we don't support SNaN and it is treated identically to the QNaN case.
So, pow (x, ±0) returns 1 for all inputs in .NET Core.
If we were to ever start supporting SNaN or floating-point exceptions in the future, that would need to change; but there is a lot more work that would need to be done to support that.
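A minimal C# sketch of what folding pow(x, 0) to 1 would have to preserve: the result becomes a constant, but an argument with side effects still has to be evaluated. `Next` and `PowZeroFolded` are hypothetical names, and the NaN result relies on the .NET Core behavior described in this comment.

```csharp
using System;

public static class PowZero
{
    private static int s_calls; // how many times Next has run

    public static int Calls => s_calls;

    public static void Reset() => s_calls = 0;

    // Hypothetical argument with a side effect; it returns NaN on purpose.
    public static double Next()
    {
        s_calls++;
        return double.NaN;
    }

    // Fold of Math.Pow(Next(), 0): the value is the constant 1,
    // but Next() is still called so its side effect is kept.
    public static double PowZeroFolded()
    {
        _ = Next();
        return 1.0;
    }

    public static void Main()
    {
        // Expected to print 1 on .NET Core, per the comment above.
        Console.WriteLine(Math.Pow(double.NaN, 0));
        Console.WriteLine(Math.Pow(double.NaN, -0.0));

        Reset();
        Console.WriteLine(PowZeroFolded());
        Console.WriteLine($"Next() ran {Calls} time(s)");
    }
}
```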

@EgorBo
Member Author

EgorBo commented Mar 17, 2020

@tannergooding any idea why
Math.Pow(x, -1) != 1/x for x = double.MinValue on arm64? 🙁

On arm64, Math.Pow(double.MinValue, -1) (without the optimization) returns just -0.0 (bits: 0x8000000000000000). However, 1/double.MinValue returns -5.562684646268003E-309 (0x8004000000000000).

Should I remove the pow(x, -1) optimization?

@tannergooding
Member

tannergooding commented Mar 17, 2020

any idea why

The result is subnormal (the biased exponent is 0). Depending on the platform (such as ARM32) or hardware configuration (x86, x64, ARM64), subnormal values may be flushed to zero.
Likewise, the implementation (https://github.com/ARM-software/optimized-routines/blob/master/math/pow.c) may end up not special casing the handling or it may have a small bug in this area (I've not investigated to determine which).

@EgorBo
Member Author

EgorBo commented Mar 17, 2020

Thanks for the explanation. So should I give up on the (x, -1) optimization, work around this case for pal_pow, or just ignore this corner case?

@tannergooding
Member

Thanks for the explanation. So should I give up on the (x, -1) optimization, work around this case for pal_pow, or just ignore this corner case?

You're likely to hit the same types of issues with pow(x, 2) if x is subnormal.

However, 1/double.MinValue returns -5.562684646268003E-309 (0x8004000000000000)

It would be good to make sure this isn't C# or the JIT doing constant folding on x / y and to check what the result is for 1 / x where x isn't a constant.
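One possible way to run that check from C# is sketched below. The helper names are hypothetical, and the use of NoInlining to keep x opaque to the JIT is an assumption about how to avoid constant folding, not something taken from the PR.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class ReciprocalCheck
{
    // NoInlining keeps x non-constant inside the method, so the division
    // should happen at run time rather than being folded.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static double Reciprocal(double x) => 1 / x;

    [MethodImpl(MethodImplOptions.NoInlining)]
    public static double PowMinusOne(double x) => Math.Pow(x, -1);

    // Raw IEEE 754 bit pattern as 16 hex digits.
    public static string Bits(double d) =>
        "0x" + BitConverter.DoubleToInt64Bits(d).ToString("X16");

    public static void Main()
    {
        double x = double.MinValue;
        Console.WriteLine($"1 / x           = {Bits(Reciprocal(x))}");
        Console.WriteLine($"Math.Pow(x, -1) = {Bits(PowMinusOne(x))}");
    }
}
```

If both lines print the same bits on x64 but not on arm64, the difference would point at the pow implementation or the subnormal handling rather than at constant folding.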

@EgorBo
Member Author

EgorBo commented Apr 20, 2020

I'll get back to it later (to keep the number of active PRs smaller).

@EgorBo EgorBo closed this Apr 20, 2020
@EgorBo EgorBo reopened this Oct 24, 2020
@EgorBo
Member Author

EgorBo commented Oct 24, 2020

🤔 hm... looks like I have to do this optimization later since LICM is not fgMakeMultiUse friendly, e.g.:

for (int i = 0; i < 1000; i++)
{
    Console.WriteLine(MathF.Pow(x + 2, 2));
}

Without this PR optimization, this Pow() is hoisted.

@AndyAyersMS
Member

Right, it can't hoist assignments (see #35735 for example).
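For readers unfamiliar with hoisting, this is roughly what it means at source level, written by hand in C#. It is only an illustration with hypothetical method names; it says nothing about how LICM or fgMakeMultiUse work internally.

```csharp
using System;

public static class HoistDemo
{
    // Loop as written: the Pow call does not depend on i.
    public static float SumInLoop(float x, int n)
    {
        float sum = 0;
        for (int i = 0; i < n; i++)
        {
            sum += MathF.Pow(x + 2, 2);
        }
        return sum;
    }

    // Same loop with the invariant value computed once, before the loop.
    public static float SumHoisted(float x, int n)
    {
        float t = x + 2;
        float sq = t * t; // assumed equal to MathF.Pow(t, 2), as this PR does
        float sum = 0;
        for (int i = 0; i < n; i++)
        {
            sum += sq;
        }
        return sum;
    }

    public static void Main()
    {
        Console.WriteLine(SumInLoop(3f, 1000));
        Console.WriteLine(SumHoisted(3f, 1000));
    }
}
```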

@EgorBo EgorBo closed this Oct 27, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 10, 2020