Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster Float64^Float64 #42271

Merged
merged 14 commits into from
Oct 12, 2021
Merged

Conversation

oscardssmith
Copy link
Member

I've finally managed to get an accurate version of x^y working for Float64 inputs. I think it's still missing some NaN checks and similar, but the core hard part is now done. Massive thanks to @chriselrod for hanging out with me for the 3.5 hours it took to debug the compensated arithmetic in the extended precision log (and giving me the SLEEFPirates reference implementation). Initial benchmarks:
Old:

@btime ^($(Ref(3.14))[], $(Ref((2.7))[]))
  66.043 ns (0 allocations: 0 bytes)

New:

@btime ^($(Ref(3.14))[], $(Ref((2.7))[]))
  24.237 ns (0 allocations: 0 bytes)

Theoretically, this should be accurate to 1 ULP, but I haven't done extensive testing yet.

@oscardssmith oscardssmith added performance Must go faster maths Mathematical functions labels Sep 16, 2021
@chriselrod
Copy link
Contributor

It'd be great to get LICM on our Julia implementations of "pure" functions like ^ and exp, it makes a huge difference in examples like this.
Because ^ here is using @llvm.pow, LLVM will licm it. It won't anymore once we switch to a Julia implementation.

@oscardssmith
Copy link
Member Author

The other thing that would be really nice is if the compiler was smart enough to precompute _log_ext(x) if a user writes something like 3.1^[4.0,6.0,1.0]

@chriselrod
Copy link
Contributor

How many people would complain if we @inline everything, or at least the _log_ext(x)?

@oscardssmith
Copy link
Member Author

We can't inline the whole thing, but we could inline _log_ext if we don't inline pow.

return ans_hi, ans_lo
end

# Log implimentation that returns 2 numbers which sum to give true value with about 68 bits of precision
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Log implimentation that returns 2 numbers which sum to give true value with about 68 bits of precision
# Log implementation that returns 2 numbers which sum to give true value with about 68 bits of precision

I don't really understand "which sum to give true value". Also, is SleefPirates.jl the original source of this code or was it adapted from somewhere else originally?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what I mean by this is that it outputs yhi,ylo such that log(big(x))-yh-ylo <2^-68, but the output is not normalized (ylo !< eps(yhi)). Basically for true double-double implementations you would typically guarantee that all the bits in yhi are correct, but I'm not doing that because it's not necessary for pow

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chriselrod where did the SLEEFPirates code come from again?

Copy link
Contributor

@chriselrod chriselrod Sep 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that, does this need to include some sort of license?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably?
The SLEEF.jl license mentions it's a port of SLEEF and includes a link, but has a different license and doesn't include a copy of SLEEF's.

I don't care about SLEEFPirates/its license, so probably just SLEEF.jl's is fine.

@chriselrod
Copy link
Contributor

chriselrod commented Sep 26, 2021

We can't inline the whole thing, but we could inline _log_ext if we don't inline pow.

Actually, given the addition of callsite inlining, I think these functions should take the approach of not being inlined, but having all code within them inlined so that someone can do @inline a ^ b to get a fully inlined function.
Then it's also worth testing to ensure that the implementations are in fact SIMD-able, whenever the algorithm allows/isn't too branchy.

mauro3 referenced this pull request in pohlan/SheetModel.jl Oct 1, 2021
@oscardssmith
Copy link
Member Author

After a long period, this is finally ready to go! My plan is to merge this and #40620 after #42031 is merged so that we don't have a regression for integer powers.

@oscardssmith
Copy link
Member Author

Merging tomorrow Sans objection.

@barucden
Copy link
Contributor

barucden commented Oct 6, 2021

I am sorry -- is it intentional to remove the rem implementation? It seems that this PR, besides improving ^, basically reverts #42380.

@KristofferC KristofferC added the needs pkgeval Tests for all registered packages should be run with this change label Oct 6, 2021
@oscardssmith
Copy link
Member Author

Good catch. I got sloppy with the rebase.

@oscardssmith
Copy link
Member Author

@KristofferC can you run PackageEval on this?

@KristofferC
Copy link
Member

Sure, but I think you can do it too.

@nanosoldier runtests(ALL, vs = ":master")

@oscardssmith
Copy link
Member Author

Oh, didn't realize I could. Good to know for the future.

@KristofferC
Copy link
Member

@nanosoldier runtests(ALL, vs = ":master")

@nanosoldier
Copy link
Collaborator

Your package evaluation job has completed - possible new issues were detected. A full report can be found here.

@oscardssmith
Copy link
Member Author

TLDR is there are a few packages that break, but only ones that do are testing with explicit floating point values, and are off by a few ULPs (and they are testing more complicated functions), so we appear to be good to merge.

@oscardssmith
Copy link
Member Author

Merging on Tuesday if there aren't objections.

@oscardssmith oscardssmith merged commit 1389c2f into JuliaLang:master Oct 12, 2021
@oscardssmith oscardssmith deleted the native-pow64 branch October 12, 2021 16:34
@Keno
Copy link
Member

Keno commented Nov 5, 2021

This blew up compile times in MTK. cc @JeffBezanson

@@ -947,11 +947,15 @@ end
@inline function ^(x::Float64, y::Float64)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This @inline seems potentially problematic now that this function is no longer small.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
maths Mathematical functions needs pkgeval Tests for all registered packages should be run with this change performance Must go faster
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants