Imprecise floating point operations (fast-math) #21690
Inline IR was discussed in #15180. Another option is …
Yeah, adding it as a function in … There are a few different fast math flags, but the …
This forum thread has examples of loops that LLVM can vectorize well for integers, but doesn't for floats (a dot product).
I've prototyped it using a newtype: https://gitlab.com/kornelski/ffast-math (https://play.rust-lang.org/?gist=d516771d1d002f740cc9bf6eb5cacdf0&version=nightly&backtrace=0) It works in simple cases, but the newtype solution is insufficient: …
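The newtype approach can be sketched as below. The `FastF32` name and the plain-ops bodies are illustrative stand-ins (the linked prototype calls nightly fast-math intrinsics instead); even this small sketch shows the ergonomic cost: every operator must be reimplemented by hand, and the wrapper cannot reach into `std` or third-party code that takes plain `f32`.

```rust
use std::ops::{Add, Mul};

// Hypothetical newtype sketch; a real implementation would call
// fadd_fast/fadd_algebraic-style intrinsics instead of plain ops.
#[derive(Clone, Copy, Debug, PartialEq)]
struct FastF32(f32);

impl Add for FastF32 {
    type Output = FastF32;
    fn add(self, rhs: FastF32) -> FastF32 {
        // Stand-in for a relaxed-semantics addition.
        FastF32(self.0 + rhs.0)
    }
}

impl Mul for FastF32 {
    type Output = FastF32;
    fn mul(self, rhs: FastF32) -> FastF32 {
        // Stand-in for a relaxed-semantics multiplication.
        FastF32(self.0 * rhs.0)
    }
}

fn dot(a: &[FastF32], b: &[FastF32]) -> FastF32 {
    a.iter().zip(b).fold(FastF32(0.0), |s, (&x, &y)| s + x * y)
}

fn main() {
    let a = [FastF32(1.0); 4];
    let b = [FastF32(2.0); 4];
    println!("{:?}", dot(&a, &b)); // FastF32(8.0)
}
```

Every slice, constant, and helper has to be converted to the wrapper type before any of this pays off, which is the "insufficient" part.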
So I'm very keen on seeing it supported natively in Rust.
I've tried -ffast-math in my C vs Rust benchmark of some graphics code: https://github.com/pedrocr/rustc-math-bench In the C code it's a ~20% improvement with clang but no benefit with GCC. In both cases it returns a wrong result, and the math is extremely simple (multiplying a vector by a matrix). According to this: …
@pedrocr Your benchmark has a loss of precision in … With … you get a significantly different sum with … All values from the matrix multiplication are the same to at least 6 digits (I've diffed …).
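The accumulator effect described here can be reproduced in a few lines (a standalone illustration, not the benchmark's actual code): an `f32` running sum of 0.1 stops growing once it reaches 2^21, because from there each addend rounds away entirely, while an `f64` accumulator stays close to the true total.

```rust
// Sum `n` copies of 0.1 into both an f32 and an f64 accumulator.
fn sums(n: u32) -> (f32, f64) {
    let x = 0.1f32;
    let mut sum32 = 0.0f32;
    let mut sum64 = 0.0f64;
    for _ in 0..n {
        sum32 += x;
        sum64 += x as f64;
    }
    (sum32, sum64)
}

fn main() {
    let (sum32, sum64) = sums(50_000_000);
    // The f32 accumulator sticks at 2^21 = 2097152.0: adding 0.1 there
    // rounds back to 2097152.0, so the sum never grows past it, while
    // the f64 accumulator lands near the true total of ~5,000,000.
    println!("f32: {sum32}  f64: {sum64}");
}
```

This is why using the final sum as anything more than a checksum is risky when it is accumulated in `f32`.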
@pornel thanks, fixed here: pedrocr/rustc-math-bench@8169fa3 The benchmark results are fine though; the sum is only used as a checksum. Here are the averages of three runs in ms/megapixel: …
So as I mentioned before, clang/llvm gets a good benefit from ffast-math but gcc doesn't. I'd say making sure things like …
I've suggested it would make sense to expose this: https://internals.rust-lang.org/t/pre-rfc-stabilization-of-target-feature/5176/23
Rust has fast math intrinsics, so the fast math behavior could be limited to a specific type or selected functions, without forcing the whole program into it.
A usable solution for my use cases would probably be to have the vector types in the simd crate be the types that allow opting in to ffast-math. That way there's only one type I need to consciously convert the code to for speedups. But as a general solution, having to swap types in normal code seems cumbersome. Then again, maybe just doing …
Created a pre-RFC discussion on internals to try to get a discussion going on the best way to do this: https://internals.rust-lang.org/t/pre-rfc-whats-the-best-way-to-implement-ffast-math/5740
Is there a current recommended approach to using fast-math optimizations in Rust nightly?
If it helps, a good benchmark comparison article between C++ and Rust floating point optimizations (link) inside loops was written recently (Oct 19), with a good Hacker News discussion exploring this concept. Personally, I think the key is that without specifying any (EDIT: floating-point-specific) flags (and after using iterators), by default clang and gcc do more optimizations on float math than Rust currently does. (EDIT: It seems that …) An important point for any discussion of optimized float math to keep in mind: vectorization isn't always less precise. A commenter pointed out that a vectorized floating point sum is actually more accurate than the un-vectorized version. Also see Stack Overflow: https://stackoverflow.com/a/7455442 I'm curious what criteria clang (or gcc) uses for deciding when floating point optimization is worthwhile. I'm not enough of an expert in these areas to know specifics, though. I'm also not sure what precision guarantees Rust makes for floating point math.
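The "vectorization can be more accurate" point can be demonstrated directly. A pairwise (tree-shaped) sum, which is essentially what a vectorized reduction with multiple lanes computes, accumulates rounding error proportional to log n instead of n. This is a sketch with assumed helper names, not code from the linked article:

```rust
// Strict left-to-right f32 summation, as IEEE semantics require for `+`.
fn naive_sum(xs: &[f32]) -> f32 {
    xs.iter().copied().fold(0.0, |a, b| a + b)
}

// Pairwise summation: split in half, sum each half, add the results.
// Error grows like O(log n) rather than O(n).
fn pairwise_sum(xs: &[f32]) -> f32 {
    if xs.len() <= 8 {
        naive_sum(xs)
    } else {
        let mid = xs.len() / 2;
        pairwise_sum(&xs[..mid]) + pairwise_sum(&xs[mid..])
    }
}

fn main() {
    let xs = vec![0.1f32; 1 << 20];
    // f64 reference total for comparison.
    let reference: f64 = xs.iter().map(|&x| x as f64).sum();
    let naive_err = (naive_sum(&xs) as f64 - reference).abs();
    let pairwise_err = (pairwise_sum(&xs) as f64 - reference).abs();
    // The tree-shaped sum ends up far closer to the reference.
    println!("naive err: {naive_err}  pairwise err: {pairwise_err}");
}
```

So a reassociating optimizer that turns the left-to-right loop into a lane-wise reduction is changing the result, but not necessarily for the worse.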
That's not the case in the article. The clang compilation was using …
While I do think that adding an option to enable fast-math in Rust is definitely desirable, I don't like the idea of making a new type for it. I would rather make it an optional compiler flag that is not set by default in …
Putting it in the profile, allowing each crate to set it, and allowing the binary crate to override it per-crate seems to make sense. You could then enable it for your library crate if you know it is safe, and the binary crate can disable it if it turns out not to work, or enable it if they know what they are doing.
Making it a compile flag that applies to other crates sounds like a terrible idea. When you download a crate from the internet, you can't know whether it was written in a way that is compatible with fast-math semantics. It is very important not to apply fast-math semantics to code that assumes IEEE semantics. We could have a crate-level attribute meaning "all floating-point ops in this crate are fast-math", but under no circumstances should you be able to force fast-math on other people's code. That would ultimately even undermine Rust's safety promise.
That sounds like a good path to take, in my eyes: being able to set an attribute in the Cargo.toml which basically means "this crate is fast-math-safe". Compiling your code with fast-math on would then check every dependency for whether it is fast-math-safe and compile it accordingly.
I use …
What's preventing us from stabilizing …? Here's a simple example where stable Rust is 8x slower than C++ (a dot product of two 100,000-element … vectors):
https://github.com/calder/dot-bench EDIT: Confirmed this would work great and close the performance gap with nightly: calder@c3c7fab
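For reference, there is a purely stable workaround for the dot-product case (a sketch, not code from the linked repository): do the reassociation by hand with several independent accumulators. Since the programmer, not the compiler, reorders the additions, this is valid under strict IEEE semantics, and it gives the backend independent dependency chains to vectorize:

```rust
// Dot product with 8 independent accumulators. The explicit unrolling
// performs the reassociation manually, so no fast-math flags are needed.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    let mut acc = [0.0f32; 8];
    let chunks = a.len() / 8;
    for i in 0..chunks {
        for lane in 0..8 {
            let j = i * 8 + lane;
            acc[lane] += a[j] * b[j];
        }
    }
    // Combine the partial sums, then handle any tail elements.
    let mut sum: f32 = acc.iter().sum();
    for j in chunks * 8..a.len() {
        sum += a[j] * b[j];
    }
    sum
}

fn main() {
    let a: Vec<f32> = (0..1000).map(|i| i as f32 * 0.001).collect();
    let b: Vec<f32> = (0..1000).map(|i| (999 - i) as f32 * 0.001).collect();
    println!("{}", dot(&a, &b));
}
```

The downside, as discussed throughout this thread, is that every hot loop needs this treatment by hand, which is exactly what an opt-in reassociation mechanism would automate.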
That looks awesome, thanks for putting in the work!
Only …
It sounds like an even better fit could be a custom … The syntax seems like a very good fit though:

```rust
#[target_feature(enable = "fast_math")]
unsafe fn i_promise_its_ok_to_reorder_math_in_me() { ... }
```
Most float code doesn't assume or even consider IEEE semantics though. Most users of floating point don't know anything about the rounding guarantees or numerical stability or how the encoding works. Unfortunately, likely a lot of code that works just fine with …
I think it would be consistent with the rest of Rust if you were allowed to, but it required using …
No, a target feature is definitely wrong, as that can be set via …

A blanket global explicitly unsafe flag is also not going to fly; what would the safety comment for that even look like? "I audited all the code in my entire dependency tree"? The point of …

IMO the same goes even for the "just produces wrong results" variant (not sure if that has a standard name; it corresponds to the operations described in rust-lang/libs-team#532). We don't just go and alter the semantics of other people's code. In particular, this would be a breaking change, as we'd have to take back what we say in https://doc.rust-lang.org/nightly/std/primitive.f32.html.

So I would recommend that the discussion focus on ways to opt in to these semantics locally, in a way controlled by the author of the respective code. Everything else is either an outright no-go (if it can break soundness) or at least highly unlikely to lead anywhere (if it breaks stable semantic promises).
Ah, true enough; forgot about the global switch. To me, the primary appeal of this syntax is the locality: the ability to turn it on on a per-function basis, and that sounds like something we agree on. I'm equally happy if it was a different attribute that can still be applied at per-function granularity.
Per-function attributes have a hard-to-resolve issue of how "deep" they apply. If the attribute applied to all code called or inlined in the function body, it could affect code of other crates, which is a no-no. If it applied only in a shallow way, not through method calls, it could be frustrating that methods on f32 won't be affected (since they're in core), and there would be a visible difference between …

Previously I've proposed having a fast float type built into Rust, like …

Therefore, stabilization of the fast-math intrinsics seems like the most realistic path forward: #21690 (comment)
I'm not sure I agree with that. I agree with the sentiment above that a global toggle affecting arbitrary crates is a no-no, but I'd say anything invoked from a function marked with such an attribute is fair game: the same as with target_feature, where the autovectorizer can roam free over both the direct function body and any code indirectly inlined into it. (This is also the reason why, like target_feature, I think this should require …)

If anything, not being able to leverage the same optimisations for std and existing third-party APIs that work on floats, despite an explicit attribute on my function, would make this near-useless except for very niche microoptimisations. At that point, if I have to do those microoptimisations manually anyway, I might as well reorder code and use third-party crates myself instead of using new intrinsics. It's specifically the "automatic optimisation" that makes fast-math so valuable.
Target feature is only unsafe because you have to make sure the CPU you're running on will recognize the instructions that the compiler may emit in that function. It doesn't otherwise affect the semantics of the code, and especially not of any code that happens to get inlined into it. If it did, we couldn't have it apply to inlined code, because it would face the same problem as fast-math flags: the caller is generally not able to judge whether the callee could handle the change in semantics gracefully or not. Yes, this makes it very hard to use any third-party code in loops that you want to see auto-vectorized. The proper solution is to get the authors of the third-party loop to opt into it (this is already possible today), or to declare that they're fine with either semantics (needs language design).
The autovectorizer does not change program semantics, so these situations are not comparable. Fast-math is not an "optimization"; it is asking for different program behavior.
Exposing (a version of) the intrinsics is tracked at <#136469>. Thanks for taking the initiative on that! It is not clear to me if this long-winded thread still serves any purpose. There is more design space to explore here, but keeping open an issue with 100 comments (many outdated) is not useful for that purpose. Is there anything reasonably concrete / actionable we still want to track here?
Agreeing with Ralf, I am going to close this. The reassociation part of fast math is unstably available at #136469. If anybody still has a need for FTZ/DAZ variants or unsafe variants that poison NaN/inf, feel free to open a fresh issue or propose directly as an ACP. |
Rust compromises its memory safety all the time when you use an …
No, you would attach it to specific dependencies. It's a similar procedure to any time you have optional unsafe code. When you suspect unsafety is causing a problem you turn off the unsafe flag everywhere and if that fixes the issue you begin binary searching your dependencies.
There is no promise being broken unless a library author has specifically advertised their library as being safe to use with it (and some probably would).
Expose algebraic floating point intrinsics

# Problem

A stable Rust implementation of a simple dot product is 8x slower than C++ on modern x86-64 CPUs. The root cause is an inability to let the compiler reorder floating point operations for better vectorization.

See https://github.com/calder/dot-bench for benchmarks. Measurements below were performed on an i7-10875H.

### C++: 10us ✅

With Clang 18.1.3 and `-O2 -march=haswell`:

<table>
<tr>
<th>C++</th>
<th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="cc">
float dot(float *a, float *b, size_t len) {
    #pragma clang fp reassociate(on)
    float sum = 0.0;
    for (size_t i = 0; i < len; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/739573c0-380a-4d84-9fd9-141343ce7e68" />
</td>
</tr>
</table>

### Nightly Rust: 10us ✅

With rustc 1.86.0-nightly (8239a37) and `-C opt-level=3 -C target-feature=+avx2,+fma`:

<table>
<tr>
<th>Rust</th>
<th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="rust">
fn dot(a: &[f32], b: &[f32]) -> f32 {
    let mut sum = 0.0;
    for i in 0..a.len() {
        sum = fadd_algebraic(sum, fmul_algebraic(a[i], b[i]));
    }
    sum
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/9dcf953a-2cd7-42f3-bc34-7117de4c5fb9" />
</td>
</tr>
</table>

### Stable Rust: 84us ❌

With rustc 1.84.1 (e71f9a9) and `-C opt-level=3 -C target-feature=+avx2,+fma`:

<table>
<tr>
<th>Rust</th>
<th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="rust">
fn dot(a: &[f32], b: &[f32]) -> f32 {
    let mut sum = 0.0;
    for i in 0..a.len() {
        sum += a[i] * b[i];
    }
    sum
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/936a1f7e-33e4-4ff8-a732-c3cdfe068dca" />
</td>
</tr>
</table>

# Proposed Change

Add `core::intrinsics::f*_algebraic` wrappers to `f16`, `f32`, `f64`, and `f128` gated on a new `float_algebraic` feature.
# Alternatives Considered

rust-lang#21690 has a lot of good discussion of various options for supporting fast math in Rust, but is still open a decade later because any choice that opts in more than individual operations is ultimately contrary to Rust's design principles. In the meantime, processors have evolved and we're leaving major performance on the table by not supporting vectorization. We shouldn't make users choose between an unstable compiler and an 8x performance hit.

# References

* rust-lang#21690
* rust-lang/libs-team#532
* rust-lang#136469
* https://github.com/calder/dot-bench
* https://www.felixcloutier.com/x86/vfmadd132ps:vfmadd213ps:vfmadd231ps
No, this is not even remotely similar to anything we have currently. Nothing in Rust currently lets you unilaterally alter the behavior of code in your dependencies. Even if you did audit that entire crate, it's totally permitted under semver for a minor version bump of the crate to start using floating-point operations in a new way, voiding your audit. Rust is generally carefully designed so that the only thing you ever can or have to know about another crate is its public API surface and the associated documentation. This would completely undermine that. Everything about this flag fundamentally clashes with the idea of robust, compositional system design. It was a terrible mistake to ever add it to C compilers, and Rust should not repeat that mistake. Rust should instead explore alternative, less fragile ways of exposing those semantics. (That's just my personal opinion, I am not speaking for any team here. But I consider it quite likely that many t-lang and t-opsem people will agree with this sentiment.) A first step has been done and is tracked in #136469. Maybe a second step is a per-function / per-module / per-crate attribute, though that will have to explain convincingly which issue is solved by this that the algebraic operations do not solve yet. A sort of pre-RFC on IRLO would likely be a good starting point here; please do not try to design something from scratch in this issue. Also please stop rehashing points that have already been made many times above. I get that not everyone agrees with the Rust design philosophy, but it's not going to change without demonstrating that all conceivable alternatives have been explored and they are all clearly inferior.
There should be a way to use imprecise floating point operations like GCC's and Clang's `-ffast-math`. The simplest way to do this would be to do like GCC and Clang and implement a command line flag, but I think a better way would be to create `f32fast` and `f64fast` types that would then call the fast LLVM math functions. This way you can easily mix fast and "slow" floating point operations. I think this could be implemented as a library if LLVM assembly could be used in the `asm` macro.