
Imprecise floating point operations (fast-math) #21690

Closed
Tracked by #2
mpdn opened this issue Jan 27, 2015 · 89 comments
Labels
  • A-floating-point — Area: Floating point numbers and arithmetic
  • A-LLVM — Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
  • C-enhancement — Category: An issue proposing an enhancement or a PR with one.
  • C-feature-request — Category: A feature request, i.e: not implemented / a PR.
  • I-slow — Issue: Problems and improvements with respect to performance of generated code.
  • T-lang — Relevant to the language team, which will review and decide on the PR/issue.

Comments

mpdn (Contributor) commented Jan 27, 2015

There should be a way to use imprecise floating point operations like GCC's and Clang's -ffast-math. The simplest approach would be a command-line flag, as GCC and Clang do it, but I think a better way would be to create f32fast and f64fast types that call the fast LLVM math functions. This way you could easily mix fast and "slow" floating point operations.

I think this could be implemented as a library if LLVM assembly could be used in the asm macro.
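The newtype idea proposed above might look roughly like the following minimal sketch. The `FastF32` name is hypothetical, and since the `fadd_fast`/`fmul_fast` intrinsics are nightly-only, plain `+` and `*` stand in where the intrinsic calls would go:

```rust
use std::ops::{Add, Mul};

/// Hypothetical fast-math wrapper type. On nightly, the operator
/// bodies below would call core::intrinsics::fadd_fast / fmul_fast
/// instead of the ordinary IEEE operations used here as placeholders.
#[derive(Clone, Copy, Debug, PartialEq)]
struct FastF32(f32);

impl Add for FastF32 {
    type Output = FastF32;
    fn add(self, rhs: FastF32) -> FastF32 {
        // Placeholder for fadd_fast(self.0, rhs.0)
        FastF32(self.0 + rhs.0)
    }
}

impl Mul for FastF32 {
    type Output = FastF32;
    fn mul(self, rhs: FastF32) -> FastF32 {
        // Placeholder for fmul_fast(self.0, rhs.0)
        FastF32(self.0 * rhs.0)
    }
}

fn main() {
    let (a, b) = (FastF32(2.0), FastF32(3.0));
    println!("{:?}", a * b + a); // FastF32(8.0)
}
```

The appeal is exactly the mixing described above: only values wrapped in the fast type get the relaxed semantics, while ordinary `f32` keeps IEEE behavior.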

@kmcallister kmcallister added the A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. label Jan 28, 2015
@kmcallister (Contributor)

Inline IR was discussed on #15180. Another option is extern "llvm-intrinsic" { ... } which I vaguely think we had at some point. If we added more intrinsics to std::intrinsics would that be sufficient?

@huonw huonw added the I-slow Issue: Problems and improvements with respect to performance of generated code. label Jan 28, 2015
mpdn (Contributor, Author) commented Jan 28, 2015

Yeah, adding it as a function in std::intrinsics could definitely work as well.

There are a few different fast math flags, but the fast flag is probably the most important as it implies all the other flags. Adding all of them would be unreasonable if using intrinsic functions, but I don't think all of them are necessary.

@emberian emberian self-assigned this Mar 25, 2015
bluss (Member) commented Aug 17, 2015

This forum thread has examples of loops that LLVM can vectorize well for integers, but not for floats (a dot product).

@bluss bluss changed the title Imprecise floating point operations Imprecise floating point operations (fast-math) Dec 20, 2015
@emberian emberian removed their assignment Jan 5, 2016
kornelski (Contributor) commented Jun 8, 2017

I've prototyped it using a newtype: https://gitlab.com/kornelski/ffast-math (https://play.rust-lang.org/?gist=d516771d1d002f740cc9bf6eb5cacdf0&version=nightly&backtrace=0)

It works in simple cases, but the newtype solution is insufficient:

  • It doesn't work with floating-point literals, which is a huge pain when converting programs to the newtype.
  • It doesn't work with the as operator, and a trait to make that possible has been rejected before.
  • The wrapper type and extra level of indirection affect inlining of code using it. I've found some large functions where the newtype was slower than regular floats, not because of the float math itself, but because the structs and calls around it weren't optimized as well. I couldn't reproduce this in simple cases, so I'm not sure exactly what is going on.

So I'm very keen on seeing it supported natively in Rust.

bluss (Member) commented Jun 8, 2017

@pornel The issue #24963 had a test case where a newtype impacted vectorization. That example was fixed (great!), but it sounds like the bug is probably still visible in similar code.

pedrocr commented Jun 8, 2017

I've tried -ffast-math in my C vs Rust benchmark of some graphics code:

https://github.com/pedrocr/rustc-math-bench

In the C code it's a ~20% improvement with Clang but no benefit with GCC. In both cases it returns a wrong result, and the math is extremely simple (multiplying a vector by a matrix). According to this:

https://stackoverflow.com/questions/38978951/can-ffast-math-be-safely-used-on-a-typical-project#38981307

-ffast-math is generally too unsafe for normal usage, as it implies some strange things (e.g., NaN checks always return false). So it seems sensible to have a way to opt in to only the more benign optimizations.
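The "NaN checks always return false" hazard is the classic failure mode: with -ffast-math (specifically its finite-math-only part) the compiler may assume NaN never occurs and fold `is_nan()` to false, silently deleting a guard. A small hypothetical Rust illustration of the pattern at risk (this compiles and runs with normal IEEE semantics; the comment marks what fast-math semantics would break):

```rust
/// Hypothetical guard: mean of a slice, rejecting NaN input.
fn checked_mean(xs: &[f32]) -> Option<f32> {
    let sum: f32 = xs.iter().sum();
    // Under fast-math, a compiler is allowed to assume NaN cannot
    // occur and remove this branch entirely, so the function would
    // return Some(NaN) instead of None for bad input.
    if sum.is_nan() {
        return None;
    }
    Some(sum / xs.len() as f32)
}

fn main() {
    assert_eq!(checked_mean(&[1.0, 2.0, 3.0]), Some(2.0));
    assert_eq!(checked_mean(&[1.0, f32::NAN]), None);
    println!("ok");
}
```

Code like this is exactly why applying fast-math blindly to a whole program is risky: the author of `checked_mean` relied on IEEE NaN propagation for correctness.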

kornelski (Contributor) commented Jun 8, 2017

@pedrocr Your benchmark has a loss of precision in the sum regardless of fast-math mode. Both the slow and fast versions give a wrong result compared to summation using a double sum.

With a double for the sum you'll get the correct result, even with -ffast-math.

You get a significantly different sum with a float accumulator because fast-math introduces a small systematic rounding error, which accumulates over 100 million additions.

All values from the matrix multiplication are the same to at least 6 digits (I've diffed printf("%f", out[i]) of all values and they're all identical).
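The accumulation effect described above is easy to reproduce even without fast-math: summing 0.1 many times in an f32 accumulator drifts far from the true value, while an f64 accumulator stays close. A small self-contained demonstration (iteration count reduced from the benchmark's 100 million to 10 million to keep it quick):

```rust
fn main() {
    const N: usize = 10_000_000;
    let mut sum32 = 0.0f32;
    let mut sum64 = 0.0f64;
    for _ in 0..N {
        sum32 += 0.1f32; // per-step rounding error accumulates
        sum64 += 0.1f64;
    }
    // The exact answer is 1_000_000. The f32 accumulator drifts by a
    // large margin; the f64 accumulator stays within a tiny fraction.
    println!("f32 sum: {sum32}, f64 sum: {sum64}");
    assert!((sum64 - 1.0e6).abs() < 1.0);
    assert!((sum32 as f64 - 1.0e6).abs() > 100.0);
}
```

This is the same mechanism as in the benchmark: once the running sum is large, each 0.1 added in f32 rounds to a nearby representable value, and the per-step bias compounds.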

pedrocr commented Jun 8, 2017

@pornel thanks, fixed here:

pedrocr/rustc-math-bench@8169fa3

The benchmark results are fine though; the sum is only used as a checksum. Here are the averages of three runs, in ms/megapixel:

Compiler                   | -O3 -march=native | -O3 -march=native -ffast-math
clang 3.8.0-2ubuntu4       | 6.91              | 5.40 (-22%)
gcc 5.4.0-6ubuntu1~16.04.4 | 5.71              | 5.85 (+2%)

So, as I mentioned before, clang/LLVM gets a good benefit from -ffast-math but GCC does not. I'd say making sure things like is_normal() still work is very important, but at least on LLVM it clearly helps to be able to enable -ffast-math.

pedrocr commented Jun 8, 2017

I've suggested it would make sense to expose -ffast-math using the target-feature mechanisms:

https://internals.rust-lang.org/t/pre-rfc-stabilization-of-target-feature/5176/23

@kornelski (Contributor)

Rust has fast math intrinsics, so the fast math behavior could be limited to a specific type or selected functions, without forcing the whole program into it.

pedrocr commented Jun 9, 2017

A usable solution for my use cases would probably be to make the vector types in the simd crate the types that opt in to ffast-math. That way there's only one type I need to consciously convert the code to for speedups. As a general solution, though, having to swap types in normal code seems cumbersome. But maybe just doing return val as f32 when val is an f32fast type isn't that bad.

@Mark-Simulacrum Mark-Simulacrum added the C-feature-request Category: A feature request, i.e: not implemented / a PR. label Jul 22, 2017
@Mark-Simulacrum Mark-Simulacrum added C-enhancement Category: An issue proposing an enhancement or a PR with one. and removed C-enhancement Category: An issue proposing an enhancement or a PR with one. labels Jul 26, 2017
pedrocr commented Aug 10, 2017

Created a pre-RFC discussion on internals to try and get a discussion on the best way to do this:

https://internals.rust-lang.org/t/pre-rfc-whats-the-best-way-to-implement-ffast-math/5740

@robsmith11

Is there a currently recommended approach to using fast-math optimizations in Rust nightly?

jeffvandyke commented Oct 24, 2019

If it helps, a good benchmark comparison between C++ and Rust floating point optimizations inside loops (link) was written recently (Oct 19), with a good Hacker News discussion exploring this topic.

Personally, I think the key point is that without specifying any (EDIT: floating-point-specific) flags (and after using iterators), clang and gcc by default do more optimizations on float math than Rust currently does.

(EDIT: It seems that -fvectorize -Ofast was specified for clang to get gcc-comparable results; see the following comment.)

Any discussion of optimized float math should keep in mind that vectorization isn't always less precise: a commenter pointed out that a vectorized floating point sum is actually more accurate than the un-vectorized version. Also see Stack Overflow: https://stackoverflow.com/a/7455442

I'm curious what criteria for vectorization clang (or gcc) uses for figuring out floating point optimization. I'm not enough of an expert in these areas to know specifics though. I'm also not sure what precision guarantees Rust makes for floating point math.
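The "vectorized sums can be more accurate" point above can be checked directly: splitting the work across several independent accumulators (as a vectorizing compiler does with SIMD lanes) keeps each partial sum smaller, so its rounding error is smaller than that of one long sequential f32 sum. A small sketch, with eight lanes chosen arbitrarily to mimic an 8-wide vector register:

```rust
/// One sequential f32 accumulator, as the naive scalar loop would run.
fn naive_sum(xs: &[f32]) -> f32 {
    let mut s = 0.0f32;
    for &x in xs {
        s += x;
    }
    s
}

/// Eight partial sums, mimicking an 8-lane vector accumulator that is
/// reduced once at the end.
fn lane_sum(xs: &[f32]) -> f32 {
    let mut lanes = [0.0f32; 8];
    for (i, &x) in xs.iter().enumerate() {
        lanes[i % 8] += x;
    }
    lanes.iter().sum()
}

fn main() {
    let xs = vec![0.1f32; 1_000_000];
    let exact = 100_000.0f64; // 1_000_000 * 0.1
    let naive_err = (naive_sum(&xs) as f64 - exact).abs();
    let lane_err = (lane_sum(&xs) as f64 - exact).abs();
    println!("naive error: {naive_err}, lane error: {lane_err}");
    // Each lane's partial sum stays ~8x smaller than the full sum,
    // so its rounding steps are finer and the total error shrinks.
    assert!(lane_err < naive_err);
}
```

This is why a reassociating compiler can produce results that are both faster and closer to the exact answer than the strictly sequential IEEE evaluation order.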

pedrocr commented Oct 24, 2019

Personally, I think the key is that without specifying any flags (and after using iterators), by default clang and gcc do more optimizations on float math than Rust currently does.

That's not the case in the article. The clang compilation was using -Ofast, which apparently enables -ffast-math.

@Centril Centril added the T-lang Relevant to the language team, which will review and decide on the PR/issue. label Oct 24, 2019
StHagel commented Jun 14, 2024

While I do think that adding an option to enable fast-math in Rust is definitely desirable, I don't like the idea of introducing a new type for it.

I would rather make it an optional compiler flag that is not set by default in --release. That way I can run my existing code with fast-math enabled if I want to, and without it if I don't. Adding a new type would require me either to change every f64 to f64fast across my entire codebase, or to go through every function, think about whether f64fast makes sense there, and add var as f64 and var as f64fast all over the place.

@NobodyXu (Contributor)

Putting it in the profile, allowing each crate to set it, and allowing the binary crate to override it per-crate seems to make sense.

You could then enable it for your library crate if you know it is safe, and binary crates can disable it if it turns out not to work, or enable it if they know what they are doing.

RalfJung (Member) commented Jun 14, 2024

Making it a compile flag that applies to other crates sounds like a terrible idea. When you download a crate from the internet, you can't know whether it was written in a way that is compatible with fast-math semantics. It is very important not to apply fast-math semantics to code that assumes IEEE semantics.

We could have a crate-level attribute meaning "all floating-point ops in this crate are fast-math", but under no circumstances should you be able to force fast-math on other people's code. That would ultimately even undermine Rust's safety promise.

StHagel commented Jun 14, 2024

We could have a crate-level attribute meaning "all floating-point ops in this crate are fast-math", but under no circumstances should you be able to force fast-math on other people's code.

That sounds like a good path forward in my eyes: being able to set an attribute in the Cargo.toml which basically means "this crate is fast-math-safe". Compiling your code with fast-math enabled would then check every dependency for whether it is fast-math-safe and compile it accordingly.

usamoi (Contributor) commented Jun 14, 2024

I use core::intrinsics::f*_fast or core::intrinsics::f*_algebraic to hint the compiler toward auto-vectorization, and it works well. The only thing I care about is that these functions are gated behind core_intrinsics, which seems quite awkward.

calder commented Feb 2, 2025

What's preventing us from stabilizing core::intrinsics::f*_algebraic today? Those are probably sufficient for 90% of cases where you're optimizing an inner loop and are fine with individual ops being reassociated.

Here's a simple example where stable Rust is 8x slower than C++ (a dot product of two 100,000-element f32 vectors) because there's no way to tell the compiler that it's OK to reorder ops to enable vectorization:

  • C++ with #pragma clang fp reassociate(on): 10us
  • Rust nightly with core::intrinsics::f[add|mul]_fast: 10us
  • Rust stable: 84us

https://github.com/calder/dot-bench

EDIT: Confirmed this would work great and close the performance gap with nightly: calder@c3c7fab
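For comparison, the usual stable-Rust workaround today is to do the reassociation by hand with several independent accumulators, which gives the optimizer the instruction-level parallelism it needs without any fast-math semantics. A sketch (the lane count of 8 is an arbitrary choice for illustration, not taken from the benchmark):

```rust
/// Dot product with eight manual accumulators. The reassociation is
/// written explicitly in source, so no fast-math flags are required
/// for the compiler to keep the lanes independent.
fn dot_unrolled(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    let mut acc = [0.0f32; 8];
    let chunks = a.len() / 8;
    for c in 0..chunks {
        for l in 0..8 {
            let i = c * 8 + l;
            acc[l] += a[i] * b[i];
        }
    }
    let mut sum: f32 = acc.iter().sum();
    for i in chunks * 8..a.len() {
        sum += a[i] * b[i]; // handle the remainder elements
    }
    sum
}

fn main() {
    let a: Vec<f32> = (0..1003).map(|i| i as f32).collect();
    let b = vec![2.0f32; 1003];
    // Sum of 2*i for i in 0..1003 is 1002 * 1003 = 1_005_006,
    // exactly representable in f32, so the result is exact.
    assert_eq!(dot_unrolled(&a, &b), 1_005_006.0);
    println!("ok");
}
```

The downside, of course, is exactly what this thread is about: the reordering has to be spelled out manually instead of letting the compiler pick the best width for the target.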

StHagel commented Feb 3, 2025

EDIT: Confirmed this would work great and close the performance gap with nightly: calder@c3c7fab

That looks awesome, thanks for putting in the work!
How would one use algebraic operations in practice, then? Only with a.add_algebraic(b), or would there be an option to override the usual operators (+ - * / %) with algebraic ones, so one doesn't have to rewrite lots of code?

calder commented Feb 4, 2025

Only a.algebraic_*(b) functions for now, to unblock 90% of use cases with as little controversy as possible (this issue has been open for 10 years and there's still no way to tell stable Rust to allow reordering); other people / library authors can follow up with more.

@RReverser (Contributor)

We could have a crate-level attribute meaning "all floating-point ops in this crate are fast-math", but under no circumstances should you be able to force fast-math on other people's code. That would ultimately even undermine Rust's safety promise.

It sounds like an even better fit could be a custom target_feature, although I'm not sure if we are allowed to extend it with something that is not, in fact, a "target" CPU feature.

The syntax seems like a very good fit though:

#[target_feature(enable = "fast_math")]
unsafe fn i_promise_its_ok_to_reorder_math_in_me() { ... }

jgarvin commented Feb 17, 2025

Making it a compile flag that applies to other crates sounds like a terrible idea. When you download a crate from the internet, you can't know whether it was written in a way that is compatible with fast-math semantics. It is very important not to apply fast-math semantics to code that assumes IEEE semantics.

Most float code doesn't assume or even consider IEEE semantics, though. Most users of floating point don't know anything about the rounding guarantees, numerical stability, or how the encoding works. There is likely a lot of code that works just fine with -ffast-math or equivalent and gets a significant performance benefit. I don't know the history, but I suspect this is one of the reasons the flag exists in GCC. I have definitely seen it make auto-vectorization work when it otherwise didn't.

We could have a crate-level attribute meaning "all floating-point ops in this crate are fast-math", but under no circumstances should you be able to force fast-math on other people's code. That would ultimately even undermine Rust's safety promise.

I think it would be consistent with the rest of Rust if you were allowed to, but it required using unsafe in Cargo.toml and when invoking an associated rustc flag. I don't think there's a concept of unsafe compile options currently, but I can think of others that would be interesting, like disabling bounds checks (even if you intend to keep them, it's useful for measuring, to make the case that they're not a big perf impact).

RalfJung (Member) commented Feb 17, 2025

No, a target feature is definitely wrong as that can be set via -Ctarget-feature globally, but you must never be able to apply this flag to other people's code without them opting-in. This is definitely true for the UB-inducing -ffast-math; Rust will not compromise its memory safety over floating-point performance concerns. "Most code would still be sound" is not good enough. The chances of Rust adopting an approach that can break soundness are 0, no matter how many flags GCC has (which are equally unsound but being C, that's fine for them).

A blanket global explicitly unsafe flag is also not going to fly; what would the safety comment for that even look like? "I audited all the code in my entire dependency tree"? The point of unsafe is to make things locally auditable; this flag cannot achieve that.

IMO the same goes even for the "just produces wrong results" variant (not sure if that has a standard name, it corresponds to the operations described in rust-lang/libs-team#532). We don't just go and alter the semantics of other people's code. In particular, this would be a breaking change as we'd have to take back what we say in https://doc.rust-lang.org/nightly/std/primitive.f32.html.

So, I would recommend the discussion to focus on ways to opt-in to these semantics locally, in a way controlled by the author of the respective code. Everything else is either an outright no-go (if it can break soundness) or at least highly unlikely to lead anywhere (if it breaks stable semantic promises).

@RReverser (Contributor)

No, a target feature is definitely wrong as that can be set via -Ctarget-feature globally, but you must never be able to apply this flag to other people's code without them opting-in.

Ah, true enough; I forgot about the global switch. To me, the primary appeal of this syntax is the locality: the ability to turn it on on a per-function basis, and that sounds like something we agree on. I'm equally happy if it's a different attribute that can still be applied at per-function granularity.

@kornelski (Contributor)

Per-function attributes have a hard-to-resolve issue of how "deep" they apply. If the attribute applied to all code called or inlined in the function body, it could affect code from other crates, which is a no-no. If it applied only in a shallow way, not through method calls, it could be frustrating that methods on f32 won't be affected (since they're in core), and there would be a visible difference between f32 + f32 lowered to a dedicated MIR instruction and a call to f32::add. Attributes on closures aren't fully supported either. Specifying how to control the exact scope of such attributes may turn out to be a big task.

Previously I've proposed having a fast float type built-in into Rust, like r32, but there are many different guarantees/optimizations that users could theoretically want (finite, non-NaN, no subnormals, rounding), so this proposal died in bikeshedding.

Therefore, stabilization of fast-math intrinsics seems like the most realistic path forward: #21690 (comment)

@RReverser (Contributor)

If the attributes applied to all code called or inlined in the function body, then they could affect code of other crates, which is a no-no

I'm not sure I agree with that. I agree with the sentiment above that a global toggle affecting arbitrary crates is a no-no, but I'd say anything invoked from a function marked with such an attribute is fair game: same as when you mark a function with target_feature, the autovectorizer can roam free over both the direct function body and any code indirectly inlined into it. (This is also the reason why, as with target_feature, I think this should require unsafe.)

If anything, not being able to get the same optimisations for std and existing third-party APIs that work on floats, despite an explicit attribute on my function, would make this near-useless except for very niche micro-optimisations. At that point, if I have to do those micro-optimisations manually anyway, I might as well reorder the code and use third-party crates myself instead of using new intrinsics.

It's specifically the "automatic optimisation" that makes fast-math so valuable.

@hanna-kruppe (Contributor)

Target feature is only unsafe because you have to make sure the CPU you’re running on will recognize the instructions that the compiler may emit in that function. It doesn’t otherwise affect the semantics of the code and especially not of any code that happens to get inlined into it. If it did, we couldn’t have it apply to inlined code because it would face the same problem as fast math flags: the caller is generally not able to judge whether the callee could handle the change in semantics gracefully or not. Yes, this makes it very hard to use any third party code in loops that you want to see auto-vectorized. The proper solution is to get the authors of the third party loop to opt into it (this is already possible today), or to declare that they’re fine with either semantics (needs language design).

RalfJung (Member) commented Feb 17, 2025 via email

@tgross35 (Contributor)

Agreeing with Ralf, I am going to close this. The reassociation part of fast math is unstably available at #136469. If anybody still has a need for FTZ/DAZ variants or unsafe variants that poison NaN/inf, feel free to open a fresh issue or propose directly as an ACP.

jgarvin commented Feb 17, 2025

Rust will not compromise its memory safety over floating-point performance concerns. "Most code would still be sound" is not good enough.

Rust compromises its memory safety all the time when you use unsafe; that's why I only suggested putting it behind a use of unsafe.

A blanket global explicitly unsafe flag is also not going to fly; what would the safety comment for that even look like? "I audited all the code in my entire dependency tree"? The point of unsafe is to make things locally auditable; this flag cannot achieve that.

No, you would attach it to specific dependencies. It's a similar procedure to any time you have optional unsafe code: when you suspect unsafety is causing a problem, you turn off the unsafe flag everywhere, and if that fixes the issue you start bisecting your dependencies.

Everything else is either an outright no-go (if it can break soundness) or at least highly unlikely to lead anywhere (if it breaks stable semantic promises).

There is no promise being broken unless a library author has specifically advertised their library as being safe to use with it (and some probably would).

Zalathar added a commit to Zalathar/rust that referenced this issue Feb 18, 2025
Expose algebraic floating point intrinsics

# Problem

A stable Rust implementation of a simple dot product is 8x slower than C++ on modern x86-64 CPUs. The root cause is an inability to let the compiler reorder floating point operations for better vectorization.

See https://github.com/calder/dot-bench for benchmarks. Measurements below were performed on an i7-10875H.

### C++: 10us ✅

With Clang 18.1.3 and `-O2 -march=haswell`:
<table>
<tr>
    <th>C++</th>
    <th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="cc">
float dot(float *a, float *b, size_t len) {
    #pragma clang fp reassociate(on)
    float sum = 0.0;
    for (size_t i = 0; i < len; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/739573c0-380a-4d84-9fd9-141343ce7e68" />
</td>
</tr>
</table>

### Nightly Rust: 10us ✅

With rustc 1.86.0-nightly (8239a37) and `-C opt-level=3 -C target-feature=+avx2,+fma`:
<table>
<tr>
    <th>Rust</th>
    <th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="rust">
fn dot(a: &[f32], b: &[f32]) -> f32 {
    let mut sum = 0.0;
    for i in 0..a.len() {
        sum = fadd_algebraic(sum, fmul_algebraic(a[i], b[i]));
    }
    sum
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/9dcf953a-2cd7-42f3-bc34-7117de4c5fb9" />
</td>
</tr>
</table>

### Stable Rust: 84us ❌

With rustc 1.84.1 (e71f9a9) and `-C opt-level=3 -C target-feature=+avx2,+fma`:
<table>
<tr>
    <th>Rust</th>
    <th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="rust">
fn dot(a: &[f32], b: &[f32]) -> f32 {
    let mut sum = 0.0;
    for i in 0..a.len() {
        sum += a[i] * b[i];
    }
    sum
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/936a1f7e-33e4-4ff8-a732-c3cdfe068dca" />
</td>
</tr>
</table>

# Proposed Change

Add `core::intrinsics::f*_algebraic` wrappers to `f16`, `f32`, `f64`, and `f128` gated on a new `float_algebraic` feature.

# Alternatives Considered

rust-lang#21690 has a lot of good discussion of various options for supporting fast math in Rust, but is still open a decade later because any choice that opts in more than individual operations is ultimately contrary to Rust's design principles.

In the meantime, processors have evolved, and we're leaving major performance on the table by not supporting vectorization. We shouldn't make users choose between an unstable compiler and an 8x performance hit.

# References

* rust-lang#21690
* rust-lang/libs-team#532
* rust-lang#136469
* https://github.com/calder/dot-bench
* https://www.felixcloutier.com/x86/vfmadd132ps:vfmadd213ps:vfmadd231ps
RalfJung (Member) commented Feb 18, 2025

No, you would attach it to specific dependencies. It's a similar procedure to any time you have optional unsafe code.

No, this is not even remotely similar to anything we have currently. Nothing in Rust currently lets you unilaterally alter the behavior of code in your dependencies. Even if you did audit that entire crate, it's totally permitted under semver for a minor version bump of the crate to start using floating-point operations in a new way, voiding your audit.

Rust is generally carefully designed so that the only thing you ever can or have to know about another crate is its public API surface and the associated documentation. This would completely undermine that.

Everything about this flag fundamentally clashes with the idea of robust, compositional system design. It was a terrible mistake to ever add it to C compilers, and Rust should not repeat that mistake. Rust should instead explore alternative, less fragile ways of exposing those semantics. (That's just my personal opinion, I am not speaking for any team here. But I consider it quite likely that many t-lang and t-opsem people will agree with this sentiment.) A first step has been done and is tracked in #136469. Maybe a second step is a per-function / per-module / per-crate attribute, though that will have to explain convincingly which issue is solved by this that the algebraic operations do not solve yet. A sort of pre-RFC on IRLO would likely be a good starting point here; please do not try to design something from scratch in this issue.

Also please stop rehashing points that have already been made many times above. I get that not everyone agrees with the Rust design philosophy, but it's not going to change without demonstrating that all conceivable alternatives have been explored and they are all clearly inferior.

bors added a commit to rust-lang-ci/rust that referenced this issue Feb 19, 2025
Expose algebraic floating point intrinsics

bors added a commit to rust-lang-ci/rust that referenced this issue Feb 26, 2025
Expose algebraic floating point intrinsics

fmease added a commit to fmease/rust that referenced this issue Feb 26, 2025
Expose algebraic floating point intrinsics

bors added a commit to rust-lang-ci/rust that referenced this issue Feb 26, 2025
Expose algebraic floating point intrinsics
