Faster Float64^Float64 #42271

oscardssmith · 2021-09-16T04:20:27Z

I've finally managed to get an accurate version of x^y working for Float64 inputs. I think it's still missing some NaN checks and similar, but the core hard part is now done. Massive thanks to @chriselrod for hanging out with me for the 3.5 hours it took to debug the compensated arithmetic in the extended precision log (and giving me the SLEEFPirates reference implementation). Initial benchmarks:
Old:

@btime ^($(Ref(3.14))[], $(Ref((2.7))[]))
  66.043 ns (0 allocations: 0 bytes)

New:

@btime ^($(Ref(3.14))[], $(Ref((2.7))[]))
  24.237 ns (0 allocations: 0 bytes)

Theoretically, this should be accurate to 1 ULP, but I haven't done extensive testing yet.
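
One way to check the 1 ULP claim empirically is to compare against a BigFloat reference. The following is a minimal sketch, not the testing used in the PR; `ulp_error` and `max_ulp_error` are hypothetical helper names, and the sampled ranges for x and y are arbitrary assumptions.

```julia
# Error of x^y in units in the last place (ULPs), measured against BigFloat.
# Hypothetical helper for illustration; not part of the PR.
function ulp_error(x::Float64, y::Float64)
    approx = x^y
    exact = big(x)^big(y)              # BigFloat reference (256 bits by default)
    return Float64(abs(big(approx) - exact) / eps(approx))
end

# Largest error seen over n random inputs with x in [0.5, 2.5) and y in [-10, 10).
function max_ulp_error(n::Integer)
    worst = 0.0
    for _ in 1:n
        x = 0.5 + 2.0 * rand()
        y = 20.0 * rand() - 10.0
        worst = max(worst, ulp_error(x, y))
    end
    return worst
end

println(max_ulp_error(10_000))
```

Random sampling like this only gives a lower bound on the worst-case error; it does not prove the bound holds everywhere.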

chriselrod · 2021-09-16T20:46:43Z

It'd be great to get LICM (loop-invariant code motion) on our Julia implementations of "pure" functions like ^ and exp; it makes a huge difference in examples like this.
Because ^ here currently uses @llvm.pow, LLVM will hoist it out of loops. It won't anymore once we switch to a Julia implementation.
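
To make the LICM point concrete, here is a small sketch of a loop whose power does not depend on the loop variable, next to a hand-hoisted version that does what the optimizer would do. The function names `scale_naive!` and `scale_hoisted!` are made up for illustration.

```julia
# x^y does not depend on the loop variable, so it is loop-invariant.
function scale_naive!(out::Vector{Float64}, v::Vector{Float64}, x::Float64, y::Float64)
    for i in eachindex(out, v)
        out[i] = v[i] * x^y    # recomputed every iteration unless the compiler hoists it
    end
    return out
end

# Hand-hoisted version: the power is computed once, before the loop.
function scale_hoisted!(out::Vector{Float64}, v::Vector{Float64}, x::Float64, y::Float64)
    p = x^y
    for i in eachindex(out, v)
        out[i] = v[i] * p
    end
    return out
end

v = [1.0, 2.0, 3.0]
println(scale_naive!(similar(v), v, 3.14, 2.7) ≈ scale_hoisted!(similar(v), v, 3.14, 2.7))
```

Whether the first version is hoisted automatically depends on whether the compiler can prove the call is pure, which is the concern raised above.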

oscardssmith · 2021-09-16T21:05:32Z

The other thing that would be really nice is if the compiler were smart enough to precompute _log_ext(x) when a user writes something like 3.1 .^ [4.0, 6.0, 1.0].
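
The idea of sharing the logarithm across a broadcast can be sketched by hand as follows. This is only an illustration: `pow_shared_log` is a hypothetical name, and it uses the plain `log` and `exp` rather than the extended-precision `_log_ext` from the PR, so it is less accurate than the real implementation and assumes x > 0.

```julia
# Compute x .^ ys while evaluating log(x) only once.
# Naive exp(y * log(x)); loses a few bits compared with a proper pow. Assumes x > 0.
function pow_shared_log(x::Float64, ys::Vector{Float64})
    lx = log(x)                # computed once, reused for every exponent
    return [exp(y * lx) for y in ys]
end

ys = [4.0, 6.0, 1.0]
println(isapprox(pow_shared_log(3.1, ys), 3.1 .^ ys; rtol = 1e-12))
```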

chriselrod · 2021-09-16T21:11:37Z

How many people would complain if we @inline everything, or at least the _log_ext(x)?

oscardssmith · 2021-09-16T21:16:04Z

We can't inline the whole thing, but we could inline _log_ext if we don't inline pow.

KristofferC · 2021-09-17T15:16:31Z

base/special/log.jl

+    return ans_hi, ans_lo
+end
+
+# Log implimentation that returns 2 numbers which sum to give true value with about 68 bits of precision


Suggested change

# Log implimentation that returns 2 numbers which sum to give true value with about 68 bits of precision

# Log implementation that returns 2 numbers which sum to give true value with about 68 bits of precision

I don't really understand "which sum to give true value". Also, is SleefPirates.jl the original source of this code or was it adapted from somewhere else originally?

What I mean is that it outputs yhi, ylo such that abs(log(big(x)) - yhi - ylo) < 2^-68, but the output is not normalized (ylo is not guaranteed to be smaller than eps(yhi)). In a true double-double implementation you would typically guarantee that all the bits in yhi are correct, but I'm not doing that because it's not necessary for pow.
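
As an illustration of a hi/lo pair, the sketch below splits log(x) into two Float64 values using BigFloat. Note the assumptions: `log_hi_lo` is a hypothetical helper, it produces a normalized pair (unlike the PR's output), and it is far too slow to be a real implementation; it only shows why two Float64 values can carry more precision than one.

```julia
# Split log(x) into a Float64 pair (hi, lo) whose sum approximates log(x)
# much more closely than a single Float64 can. Illustration only.
function log_hi_lo(x::Float64)
    exact = log(big(x))
    hi = Float64(exact)            # nearest Float64 to the true value
    lo = Float64(exact - hi)       # rounding error of hi, itself rounded to Float64
    return hi, lo
end

x = 3.14
hi, lo = log_hi_lo(x)
err_single = abs(log(big(x)) - log(x))      # error of the ordinary Float64 log
err_pair = abs(log(big(x)) - hi - lo)       # error of the hi + lo pair
println((err_single, err_pair))
```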

@chriselrod where did the SLEEFPirates code come from again?

SLEEF: https://github.com/shibatch/sleef
@musm 's port: https://github.com/musm/SLEEF.jl
SLEEFPirates, a fork of the port: https://github.com/JuliaSIMD/SLEEFPirates.jl

Given that, does this need to include some sort of license?

Probably?
The SLEEF.jl license mentions it's a port of SLEEF and includes a link, but has a different license and doesn't include a copy of SLEEF's.

I don't care about SLEEFPirates/its license, so probably just SLEEF.jl's is fine.

chriselrod · 2021-09-26T23:50:36Z

We can't inline the whole thing, but we could inline _log_ext if we don't inline pow.

Actually, given the addition of callsite inlining, I think these functions should take the approach of not being inlined, but having all code within them inlined so that someone can do @inline a ^ b to get a fully inlined function.
Then it's also worth testing to ensure that the implementations are in fact SIMD-able, whenever the algorithm allows/isn't too branchy.
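
A rough sketch of the call-site inlining pattern described above, assuming Julia 1.8 or later (where `@inline` can annotate a call). The functions `_inner` and `outer` are hypothetical stand-ins, not the actual pow or _log_ext code.

```julia
# Requires Julia 1.8+ for call-site @inline. Names are hypothetical stand-ins.
@inline _inner(x::Float64) = x * x + 1.0     # internals are marked for inlining

# No annotation on the outer function: the compiler decides by default.
function outer(x::Float64)
    return 2.0 * _inner(x)
end

caller_default(x) = outer(x)            # ordinary call, inlining left to the compiler
caller_inlined(x) = @inline outer(x)    # caller requests inlining at this call site

println(caller_default(2.0) == caller_inlined(2.0))
```

The annotation changes only how the call is compiled, not the result, so both callers return the same value.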

oscardssmith · 2021-10-02T00:08:41Z

After a long period, this is finally ready to go! My plan is to merge this and #40620 after #42031 is merged so that we don't have a regression for integer powers.

Co-authored-by: Kristoffer Carlsson <[email protected]>

oscardssmith · 2021-10-06T01:11:44Z

Merging tomorrow sans objection.

barucden · 2021-10-06T07:12:46Z

I am sorry -- is it intentional to remove the rem implementation? It seems that this PR, besides improving ^, basically reverts #42380.

oscardssmith · 2021-10-06T12:55:50Z

Good catch. I got sloppy with the rebase.

oscardssmith · 2021-10-06T21:03:15Z

@KristofferC can you run PackageEval on this?

KristofferC · 2021-10-07T20:39:18Z

Sure, but I think you can do it too.

@nanosoldier runtests(ALL, vs = ":master")

oscardssmith · 2021-10-07T20:47:22Z

Oh, didn't realize I could. Good to know for the future.

KristofferC · 2021-10-08T06:40:45Z

@nanosoldier runtests(ALL, vs = ":master")

nanosoldier · 2021-10-09T00:29:53Z

Your package evaluation job has completed - possible new issues were detected. A full report can be found here.

oscardssmith · 2021-10-09T03:37:53Z

TL;DR: a few packages break, but the only ones that do are testing against explicit floating-point values and are off by a few ULPs (and they are testing more complicated functions), so we appear to be good to merge.
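
For packages affected by this kind of change, comparing within a few ULPs instead of with `==` avoids the failures. A minimal sketch, where `within_ulps` is a hypothetical helper and the tolerance of 4 ULPs is an arbitrary choice:

```julia
# True if a and b differ by at most n ULPs, measured at the larger magnitude.
# Hypothetical helper for illustration.
within_ulps(a::Float64, b::Float64, n::Integer) =
    abs(a - b) <= n * eps(max(abs(a), abs(b)))

computed = 3.14^2.7
reference = Float64(big(3.14)^big(2.7))   # BigFloat result rounded to Float64
println(within_ulps(computed, reference, 4))
```

A tolerance-based check like this keeps passing when the last bit or two of a result changes between Julia versions.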

oscardssmith · 2021-10-11T05:55:45Z

Merging on Tuesday if there aren't objections.

Keno · 2021-11-05T23:19:52Z

This blew up compile times in MTK. cc @JeffBezanson

Keno · 2021-11-05T23:20:41Z

base/math.jl

@@ -947,11 +947,15 @@ end
 @inline function ^(x::Float64, y::Float64)


This @inline seems potentially problematic now that this function is no longer small.

oscardssmith added the performance (Must go faster) and maths (Mathematical functions) labels Sep 16, 2021

KristofferC reviewed Sep 17, 2021

stevengj mentioned this pull request Sep 25, 2021

Performance regression with non-integer powers (v1.6 and newer) #39976

Closed

mauro3 referenced this pull request in pohlan/SheetModel.jl Oct 1, 2021

h update: removed ^ operations

95ecad6

oscardssmith and others added 9 commits October 5, 2021 17:01

It's alive!

92e54fb

minor cleanup

ca788d9

less broken

8118972

Update base/special/log.jl

ba588f4

Co-authored-by: Kristoffer Carlsson <[email protected]>

fix Inf and 0

fd5a4da

fix inf/nan

88660e2

fix doctests

5041bb8

0^x

550ab42

rebase on master

0808d83

oscardssmith force-pushed the native-pow64 branch from 308b9c3 to 0808d83 Compare October 5, 2021 22:07

oscardssmith added 3 commits October 5, 2021 20:13

fix removed compat from rebase

4d7335f

fix removed compat from rebase

44f3366

minor whitespace fixes

36b4057

KristofferC added the needs pkgeval (Tests for all registered packages should be run with this change) label Oct 6, 2021

oscardssmith added 2 commits October 6, 2021 07:57

fix broken rebase

2661f6d

fix broken rebase

da8da62

oscardssmith mentioned this pull request Oct 8, 2021

Julia 1.7 exp(NaN16) gives wrong answer #42554

Closed

oscardssmith merged commit 1389c2f into JuliaLang:master Oct 12, 2021

oscardssmith deleted the native-pow64 branch October 12, 2021 16:34

Keno reviewed Nov 5, 2021

lbenet mentioned this pull request Jan 7, 2022

Add methods to output local Taylor polynomials for dense output PerezHz/TaylorIntegration.jl#134

Merged

lbenet mentioned this pull request Jan 15, 2022

Some tests in nightly (1.8.0-DEV) are broken PerezHz/TaylorIntegration.jl#136

Closed

LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Feb 22, 2022

Faster Float64^Float64 (JuliaLang#42271)

bf1aa2b

Co-authored-by: Kristoffer Carlsson <[email protected]>

LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Mar 8, 2022

Faster Float64^Float64 (JuliaLang#42271)

37556d9

Co-authored-by: Kristoffer Carlsson <[email protected]>

oscardssmith mentioned this pull request Mar 8, 2022

fix precision issue in Float64^Float64. #44529

Merged

KristofferC mentioned this pull request Apr 3, 2022

Small change in floating point exponentiation causes a test failure on Julia 1.8. MineralsCloud/EquationsOfStateOfSolids.jl#146

Closed
