Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix fast path of float parsing on x87 #33429

Merged
merged 4 commits into from
May 16, 2016
Merged

Conversation

ranma42
Copy link
Contributor

@ranma42 ranma42 commented May 5, 2016

The fast path of the float parser relies on the rounding to happen
exactly and directly to the correct number of bits. On x87, instead,
double rounding would occour as the FPU stack defaults to 80 bits of
precision.

This can be fixed by setting the precision of the FPU stack before
performing the int to float conversion. This can be achieved by
changing the value of the x87 control word. This is a somewhat common
operation that is in fact performed whenever a float needs to be
truncated to an integer, but it is undesirable to add its overhead for
code that does not rely on x87 for computations (i.e. on non-x86
architectures, or x86 architectures which perform FPU computations on
using SSE).

Fixes num::dec2flt::fast_path_correct (on x87).

@rust-highfive
Copy link
Collaborator

r? @alexcrichton

(rust_highfive has picked a reviewer for you, use r? to override)

@ranma42
Copy link
Contributor Author

ranma42 commented May 5, 2016

@rkruppe we might finally be able to fix float parsing on x87 :) #31632 (comment)
Do you know of any other issue regarding the use of x87 arithmetic in Rust?

I am not sure if the asm blocks should be volatile. I tested it building without SSE and found that num::dec2flt::fast_path_correct is fixed by this change, but there are some other failures in the test suite (see ranma42@0a9fbb0):

  • src/test/run-pass/sse2.rs fails as expected (it assumes that SSE2 is always available on x86)
  • src/libstd/num/f32.rs and src/test/run-pass/issue-21634.rs seem to fail because of the different rounding (even though the parsing is now fixed, most of the arithmetic is performed with 80 bits of precision, which can lead to different results)

@huonw
Copy link
Member

huonw commented May 5, 2016

Unfortunately, I suspect trying to control the x87 rounding stuff (indeed, any rounding stuff) is going to be an endless battle: AFAIK, LLVM doesn't pay attention to it at all (e.g. it'll happily constant fold in its own rounding mode). I'm all for incremental improvements, though!

@hanna-kruppe
Copy link
Contributor

Yay!

To be honest, I am a little bit uncomfortable with fiddling with the control word, especially if questions such as "should the asm block be volatile" are unclear. I would get over this quicker if there was data indicating a significant performance benefit vs. just skipping the fast path in favor of algorithm Bellerophon (which is perfectly correct and much simpler to implement). I have no idea how cheap or costly a control word change is.

RE: other failures: I believe most float tests are careful enough to permit slight rounding error. Perhaps the tests that are failing now should get the same treatment. Issue #21634 was about type inference, not float precision, so I assume this won't change the intent of the test.

As a matter of longer-term strategy, I think Rust needs to decide on a policy on floating point precision and IEEE 754 conformance. x87 is just one example, there's also varying quality of libm implementations, the slight IEEE 754 non-conformance of some co- and mobile processors, LLVM's optimizations, and probably more. I would really like a consensus about:

  • to what degree Rust should fight these problems, and
  • where it should draw the line and accept that the world is not perfect

@ranma42
Copy link
Contributor Author

ranma42 commented May 5, 2016

Regarding the worries about the FPU control word, it looks like LLVM is quite careful in restoring it to whatever value it had when it changes it.
I was unable to find any other place where LLVM fiddles with the control word besides lib/Target/X86/X86ISelLowering.cpp (searched using git grep -i fldcw), where the CW is changed for truncating float to integers and then immediately restored.

@rkruppe I think that the code should be marked volatile, because the CW needs to be changed before the conversion is done and restored after and the compiler should not be allowed to move the fldcw instructions around. I would love if somebody more knowledgeable than me about the Rust asm syntax could tell me if there is anything else that should be done (for example, I declared cw as non-mut and set it in from assembly because setting it mutable caused warnings about it not being modified).

Regarding IEEE 754 conformance, it would be good to also have a clear view about what LLVM guarantees: sometimes float operations are said to conform to IEEE (in some ML threads), but in most cases the specification does not require the IEEE behaviour.

@brson
Copy link
Contributor

brson commented May 5, 2016

cc @lifthrasiir. float parsing matters

@alexcrichton
Copy link
Member

Thanks @ranma42! Could you also add some comments to the code itself? As someone who basically has no idea what x87 is or how SSE affects this, it'd be nice to have an explanation as to what's going on here.

ranma42 added 2 commits May 13, 2016 15:18
The fast path of the float parser relies on the rounding to happen
exactly and directly to the correct number of bits. On x87, instead,
double rounding would occour as the FPU stack defaults to 80 bits of
precision.

This can be fixed by setting the precision of the FPU stack before
performing the int to float conversion. This can be achieved by
changing the value of the x87 control word. This is a somewhat common
operation that is in fact performed whenever a float needs to be
truncated to an integer, but it is undesirable to add its overhead for
code that does not rely on x87 for computations (i.e. on non-x86
architectures, or x86 architectures which perform FPU computations on
using SSE).

Fixes `num::dec2flt::fast_path_correct` (on x87).
Explain the meaning of the fields of the control word and provide more
details about how the relevant one (Precision Control) is updated in
the fast path.
@ranma42 ranma42 force-pushed the fix-x87-parsing branch from ced40bf to f96864d Compare May 13, 2016 13:21
@ranma42
Copy link
Contributor Author

ranma42 commented May 13, 2016

I added a (possibly too detailed?) description of the x87 FPU control word and around the save/update part of the code.
I also tried to rephrase the explanation of why it sometimes (but not always) used and what will cause problems (double rounding).

@@ -32,19 +32,101 @@ fn power_of_ten(e: i16) -> Fp {
Fp { f: sig, e: exp }
}

// Most architectures floating point operations with explicit bit size, therefore the precision of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sentence seems to be missing a word.

ranma42 added 2 commits May 16, 2016 15:37
Remove irrelevant information (and instead provide pointer to
reference documentation), replace ASCII-art table with the
corresponding MarkDown one, and minor fixes.
The `volatile` modifier was incorrectly written outside of the `asm!`
blocks.
@ranma42
Copy link
Contributor Author

ranma42 commented May 16, 2016

I tried to improve the documentation as suggested by @rkruppe (with a minor difference: instead of a code block I used markdown for the table) and fixed the wrong positioning of the parentheses of the asm! blocks.

@hanna-kruppe
Copy link
Contributor

Nice 👍

@alexcrichton
Copy link
Member

@bors: r+ 4ec1f8d

Thanks @ranma42!

@bors
Copy link
Contributor

bors commented May 16, 2016

⌛ Testing commit 4ec1f8d with merge 8310de8...

bors added a commit that referenced this pull request May 16, 2016
Fix fast path of float parsing on x87

The fast path of the float parser relies on the rounding to happen
exactly and directly to the correct number of bits. On x87, instead,
double rounding would occour as the FPU stack defaults to 80 bits of
precision.

This can be fixed by setting the precision of the FPU stack before
performing the int to float conversion. This can be achieved by
changing the value of the x87 control word. This is a somewhat common
operation that is in fact performed whenever a float needs to be
truncated to an integer, but it is undesirable to add its overhead for
code that does not rely on x87 for computations (i.e. on non-x86
architectures, or x86 architectures which perform FPU computations on
using SSE).

Fixes `num::dec2flt::fast_path_correct` (on x87).
@bors bors merged commit 4ec1f8d into rust-lang:master May 16, 2016
@ranma42 ranma42 deleted the fix-x87-parsing branch May 19, 2016 09:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants