Fix fast path of float parsing on x87 #33429
(rust_highfive has picked a reviewer for you, use r? to override)
@rkruppe we might finally be able to fix float parsing on x87 :) #31632 (comment) I am not sure if the
Unfortunately, I suspect trying to control the x87 rounding stuff (indeed, any rounding stuff) is going to be an endless battle: AFAIK, LLVM doesn't pay attention to it at all (e.g. it'll happily constant fold in its own rounding mode). I'm all for incremental improvements, though!
Yay!
To be honest, I am a little bit uncomfortable with fiddling with the control word, especially if questions such as "should the asm block be volatile" are unclear. I would get over this quicker if there was data indicating a significant performance benefit vs. just skipping the fast path in favor of algorithm Bellerophon (which is perfectly correct and much simpler to implement). I have no idea how cheap or costly a control word change is.
RE: other failures: I believe most float tests are careful enough to permit slight rounding error. Perhaps the tests that are failing now should get the same treatment. Issue #21634 was about type inference, not float precision, so I assume this won't change the intent of the test.
As a matter of longer-term strategy, I think Rust needs to decide on a policy on floating point precision and IEEE 754 conformance. x87 is just one example, there's also varying quality of libm implementations, the slight IEEE 754 non-conformance of some co- and mobile processors, LLVM's optimizations, and probably more. I would really like a consensus about:
Regarding the worries about the FPU control word, it looks like LLVM is quite careful in restoring it to whatever value it had when it changes it. @rkruppe I think that the code should be marked Regarding IEEE 754 conformance, it would be good to also have a clear view about what LLVM guarantees: sometimes float operations are said to conform to IEEE (in some ML threads), but in most cases the specification does not require the IEEE behaviour.
cc @lifthrasiir. float parsing matters
Thanks @ranma42! Could you also add some comments to the code itself? As someone who basically has no idea what x87 is or how SSE affects this, it'd be nice to have an explanation as to what's going on here.
The fast path of the float parser relies on the rounding to happen exactly and directly to the correct number of bits. On x87, instead, double rounding would occur, as the FPU stack defaults to 80 bits of precision. This can be fixed by setting the precision of the FPU stack before performing the int to float conversion, which is achieved by changing the value of the x87 control word. This is a somewhat common operation that is in fact performed whenever a float needs to be truncated to an integer, but it is undesirable to add its overhead for code that does not rely on x87 for computations (i.e. on non-x86 architectures, or x86 architectures which perform FPU computations using SSE). Fixes `num::dec2flt::fast_path_correct` (on x87).
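For readers unfamiliar with the fast path, here is a minimal sketch of the idea for `f64`. It is not the code in this PR: the function name `fast_path_f64` and the constants are hypothetical, and the sketch assumes each floating point operation rounds directly to 53 bits (as it does with SSE2). When the decimal mantissa fits in 53 bits and the power of ten is at most 10^22, both operands are exact, so a single multiplication or division performs the only rounding. On x87 with the default 80-bit precision that operation would round once to 64 bits and again on store, which is the double rounding described above.

```rust
/// Hypothetical sketch of a dec2flt-style fast path for `f64`
/// (names and structure are illustrative, not taken from the PR).
/// Returns `None` when the shortcut is not known to be exact.
fn fast_path_f64(mantissa: u64, exp10: i32) -> Option<f64> {
    // Integers up to 2^53 convert to f64 without rounding.
    const MAX_EXACT_MANTISSA: u64 = 1 << 53;
    // Powers of ten up to 10^22 are exactly representable in an f64.
    const MAX_EXACT_EXP: u32 = 22;

    let abs_exp = exp10.unsigned_abs();
    if mantissa > MAX_EXACT_MANTISSA || abs_exp > MAX_EXACT_EXP {
        return None; // the caller would fall back to a slower algorithm
    }

    let m = mantissa as f64; // exact conversion
    let mut pow = 1.0f64;
    for _ in 0..abs_exp {
        pow *= 10.0; // every intermediate power of ten is exact
    }

    // One correctly rounded operation on two exact operands.
    Some(if exp10 < 0 { m / pow } else { m * pow })
}

fn main() {
    let fast = fast_path_f64(123, -2);
    let reference: f64 = "1.23".parse().unwrap();
    assert_eq!(fast, Some(reference));
    println!("fast path: {:?}, std parser: {}", fast, reference);
}
```

Inputs outside the exact range return `None`, so the sketch never produces a value it cannot guarantee.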
Explain the meaning of the fields of the control word and provide more details about how the relevant one (Precision Control) is updated in the fast path.
I added a (possibly too detailed?) description of the x87 FPU control word and around the save/update part of the code.
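As a rough illustration of the Precision Control update, the sketch below manipulates a control word value in plain Rust without touching the hardware. The names `Precision`, `with_precision` and `precision_of` are hypothetical and not taken from the PR. It relies on the documented x87 layout, where bits 8-9 hold the Precision Control field and `0x037F` is the control word after `FINIT`. Real code would read and write the register with `fnstcw`/`fldcw` and restore the saved value afterwards.

```rust
/// Bits 8-9 of the x87 control word hold the Precision Control (PC) field.
const PC_MASK: u16 = 0x0300;

/// Hypothetical helper type naming the three valid PC settings.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Precision {
    Single,   // 24-bit significand (f32)
    Double,   // 53-bit significand (f64)
    Extended, // 64-bit significand (80-bit x87 format)
}

/// Returns `cw` with only the PC field replaced; all other bits are kept.
fn with_precision(cw: u16, p: Precision) -> u16 {
    let pc = match p {
        Precision::Single => 0x0000,
        Precision::Double => 0x0200,
        Precision::Extended => 0x0300,
    };
    (cw & !PC_MASK) | pc
}

/// Decodes the PC field; the encoding 0b01 is reserved, hence `Option`.
fn precision_of(cw: u16) -> Option<Precision> {
    match cw & PC_MASK {
        0x0000 => Some(Precision::Single),
        0x0200 => Some(Precision::Double),
        0x0300 => Some(Precision::Extended),
        _ => None,
    }
}

fn main() {
    let default_cw: u16 = 0x037F; // control word after FINIT
    assert_eq!(precision_of(default_cw), Some(Precision::Extended));

    let double_cw = with_precision(default_cw, Precision::Double);
    assert_eq!(double_cw, 0x027F);
    println!("{:#06x} -> {:#06x}", default_cw, double_cw);
}
```

Masking with `!PC_MASK` before OR-ing in the new bits leaves the exception masks and the rounding mode untouched, which matters because the fast path should only change the precision.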
@@ -32,19 +32,101 @@ fn power_of_ten(e: i16) -> Fp {
     Fp { f: sig, e: exp }
 }

// Most architectures floating point operations with explicit bit size, therefore the precision of |
This sentence seems to be missing a word.
Remove irrelevant information (and instead provide a pointer to reference documentation), replace the ASCII-art table with the corresponding Markdown one, and make minor fixes.
The `volatile` modifier was incorrectly written outside of the `asm!` blocks.
I tried to improve the documentation as suggested by @rkruppe (with a minor difference: instead of a code block I used markdown for the table) and fixed the wrong positioning of the parentheses of the
Nice 👍