
speed up Float16 conversions a bit #29891

Merged: 1 commit into master from jb/float16conv, Nov 6, 2018


JeffBezanson
Member

For me this speeds up the first benchmark in #29889 by about 20% by removing all the unnecessary error checks in the Float16 conversion code. Hardly game-changing, but might as well.
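The comment doesn't show which checks were removed. As an illustration only, assuming they are the kind Julia emits for checked integer conversions: a constructor call such as `UInt16(x)` must verify that `x` fits and throw an `InexactError` otherwise, whereas `x % UInt16` simply keeps the low 16 bits. The helpers below are hypothetical, not code from this PR.

```julia
# Hypothetical helpers (not the code in this PR): both return the high
# 16 bits of a Float32's bit pattern (sign, exponent, top of the mantissa).

# Checked conversion: UInt16(::UInt32) checks that the value fits and throws
# InexactError if not; the compiler can drop the check only if it proves it redundant.
high_bits_checked(x::Float32) = UInt16(reinterpret(UInt32, x) >> 16)

# Truncating conversion: `% UInt16` keeps the low 16 bits unconditionally,
# so there is no error path to compile.
high_bits_trunc(x::Float32) = (reinterpret(UInt32, x) >> 16) % UInt16

# Both agree whenever the value is known to fit, as it is here.
for x in (0.0f0, -1.5f0, 1.0f0, Inf32, floatmax(Float32))
    @assert high_bits_checked(x) == high_bits_trunc(x)
end
```

When the programmer knows a value is in range, as with bit fields extracted by shifting and masking, the truncating form gives the same answer without an error branch.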

JeffBezanson added the performance label (Must go faster) on Nov 1, 2018
@JeffBezanson
Member Author

Before:

julia> @time value_16 .*= mult;
  2.315576 seconds (6 allocations: 208 bytes)

After:

julia> @time value_16 .*= mult;
  1.801816 seconds (6 allocations: 208 bytes)

@JeffBezanson
Member Author

The tables are quite small, and can be made immutable and inlined, leading to

julia> @time value_16 .*= mult;
  1.206085 seconds (6 allocations: 208 bytes)

Almost 2x speedup. Now we're cooking!
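A minimal sketch of what such tables can look like, assuming the classic table-driven Float32-to-Float16 scheme from Jeroen van der Zijp's "Fast Half Float Conversion" (the names and layout here are ours, not Julia's Base). Two 512-entry tables are indexed by the sign bit and the 8 exponent bits of the Float32, and the result is a base value plus the mantissa shifted right. Storing the tables as `const` tuples instead of `Vector`s makes them immutable, so the compiler is free to treat their contents as constants. This sketch truncates the mantissa rather than rounding to nearest even and does not special-case NaN, so unlike Julia's `Float16(::Float32)` it is only guaranteed exact for inputs already representable in Float16.

```julia
# Sketch of a table-driven Float32 -> Float16 conversion (van der Zijp style).
# Hypothetical names; truncates instead of rounding, no special NaN handling.

# (base, shift) for one biased Float32 exponent e in 0:255, ignoring sign.
function half_entry(e::Int)
    ee = e - 127                               # unbiased exponent
    if ee < -24                                # too small: signed zero
        return (0x0000, 24)
    elseif ee < -14                            # Float16 subnormal range
        return (UInt16(0x0400 >> (-ee - 14)), -ee - 1)
    elseif ee <= 15                            # normal range: re-bias exponent
        return (UInt16((ee + 15) << 10), 13)
    elseif ee < 128                            # too large: overflow to Inf
        return (0x7c00, 24)
    else                                       # e == 255: Inf or NaN
        return (0x7c00, 13)
    end
end

# 512 entries indexed by sign bit + 8 exponent bits. Tuples are immutable,
# so the compiler can treat their contents as constants.
const BASE_TABLE  = ntuple(k -> half_entry((k - 1) % 256)[1] | (k > 256 ? 0x8000 : 0x0000), 512)
const SHIFT_TABLE = ntuple(k -> half_entry((k - 1) % 256)[2], 512)

function half_trunc(x::Float32)
    f = reinterpret(UInt32, x)
    i = Int(f >> 23) + 1                       # 1-based sign+exponent index
    base  = @inbounds BASE_TABLE[i]
    shift = @inbounds SHIFT_TABLE[i]
    mant  = ((f & 0x007fffff) >> shift) % UInt16   # truncating, no InexactError
    return reinterpret(Float16, base + mant)
end
```

A correctly rounded version would additionally inspect the bits shifted out of the mantissa to round to nearest even.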

@PeterJacko


Many thanks! Are you seeing the same speed-up for mult = 1.0 rather than mult = 1? (See the second benchmark in #29889.) It seems that it might make sense to convert an integer multiplier to float to perform the multiplication.

@StefanKarpinski
Member

> It seems that it might make sense to convert an integer multiplier to float to perform the multiplication.

This has to happen internally anyway—there's no way to directly multiply an int and a float (except for special cases with constants like 2x which can get implemented as x+x instead of a multiplication). The thing to avoid is converting the integer to a Float16 first via promotion and then back to Float32 for the multiplication. It will be really nice when LLVM supports Float16 directly.
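The two conversion paths can be made concrete with a pair of hypothetical helpers, assuming (as described above) that a Float16 product is computed in Float32 and rounded back. For an integer that Float16 represents exactly, such as 1, the paths agree; for one it cannot, such as 2049, the intermediate rounding to Float16 changes the result, so a special method that skips that step could return a different (more accurate) answer for large integers.

```julia
# Hypothetical helpers, assuming Float16 multiplication is done in Float32.

# Promotion path: Int -> Float16 (may round), then Float16 -> Float32 for the multiply.
mul_via_promotion(x::Float16, n::Int) = Float16(Float32(x) * Float32(Float16(n)))

# Direct path: Int -> Float32, skipping the intermediate Float16 conversion.
mul_direct(x::Float16, n::Int) = Float16(Float32(x) * Float32(n))

x = Float16(0.75)
mul_via_promotion(x, 1) == mul_direct(x, 1)  # true: 1 is exact in Float16
mul_via_promotion(x, 2049)                    # == Float16(1536): Float16(2049) rounds to 2048
mul_direct(x, 2049)                           # == Float16(1537): 0.75 * 2049 = 1536.75 rounds up
```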

@JeffBezanson
Member Author

There should also be some speedup for mult = 1.0 since it also requires Float16 conversions. But it needs fewer (which is why it's faster than the integer case), since:

> The thing to avoid is converting the integer to a Float16 first via promotion and then back to Float32 for the multiplication.

Indeed we seem to be doing that. I guess we'll need a bunch of special methods to handle it better.
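Julia's promotion rules account for the difference between `mult = 1` and `mult = 1.0`: `Float16 * Int` promotes to `Float16`, so each product first converts the integer to Float16, while `Float16 * Float64` promotes to `Float64`, and the in-place `.*=` converts each result back to Float16 when storing it. A small check, using a hypothetical three-element array in place of the benchmark data from #29889:

```julia
# Hypothetical stand-in for the benchmark array; all values are exact in Float16.
v_int   = Float16[0.5, 1.5, 2.5]
v_float = copy(v_int)

# Type each product is computed in, per Julia's promotion rules.
@show promote_type(Float16, Int)      # Float16: the Int is converted to Float16
@show promote_type(Float16, Float64)  # Float64: the Float16 is widened instead

v_int   .*= 1      # Float16 * Int for each element
v_float .*= 1.0    # Float16 * Float64, converted back to Float16 on store

# In-place broadcasting keeps the destination's element type.
@show eltype(v_float)                 # Float16
@show v_int == v_float                # true
```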

JeffBezanson merged commit 0d4edb3 into master on Nov 6, 2018
JeffBezanson deleted the jb/float16conv branch on Nov 6, 2018, 23:15
tkf pushed a commit to tkf/julia that referenced this pull request on Nov 21, 2018
3 participants