Fix fast path of float parsing on x87 #33429

ranma42 · 2016-05-05T11:49:34Z

The fast path of the float parser relies on the rounding to happen
exactly and directly to the correct number of bits. On x87, instead,
double rounding would occour as the FPU stack defaults to 80 bits of
precision.

This can be fixed by setting the precision of the FPU stack before
performing the int to float conversion. This can be achieved by
changing the value of the x87 control word. This is a somewhat common
operation that is in fact performed whenever a float needs to be
truncated to an integer, but it is undesirable to add its overhead for
code that does not rely on x87 for computations (i.e. on non-x86
architectures, or x86 architectures which perform FPU computations on
using SSE).

Fixes num::dec2flt::fast_path_correct (on x87).

rust-highfive · 2016-05-05T11:49:46Z

r? @alexcrichton

(rust_highfive has picked a reviewer for you, use r? to override)

ranma42 · 2016-05-05T11:58:05Z

@rkruppe we might finally be able to fix float parsing on x87 :) #31632 (comment)
Do you know of any other issue regarding the use of x87 arithmetic in Rust?

I am not sure if the asm blocks should be volatile. I tested it building without SSE and found that num::dec2flt::fast_path_correct is fixed by this change, but there are some other failures in the test suite (see ranma42@0a9fbb0):

src/test/run-pass/sse2.rs fails as expected (it assumes that SSE2 is always available on x86)
src/libstd/num/f32.rs and src/test/run-pass/issue-21634.rs seem to fail because of the different rounding (even though the parsing is now fixed, most of the arithmetic is performed with 80 bits of precision, which can lead to different results)

huonw · 2016-05-05T12:17:03Z

Unfortunately, I suspect trying to control the x87 rounding stuff (indeed, any rounding stuff) is going to be an endless battle: AFAIK, LLVM doesn't pay attention to it at all (e.g. it'll happily constant fold in its own rounding mode). I'm all for incremental improvements, though!

hanna-kruppe · 2016-05-05T12:25:21Z

Yay!

To be honest, I am a little bit uncomfortable with fiddling with the control word, especially if questions such as "should the asm block be volatile" are unclear. I would get over this quicker if there was data indicating a significant performance benefit vs. just skipping the fast path in favor of algorithm Bellerophon (which is perfectly correct and much simpler to implement). I have no idea how cheap or costly a control word change is.

RE: other failures: I believe most float tests are careful enough to permit slight rounding error. Perhaps the tests that are failing now should get the same treatment. Issue #21634 was about type inference, not float precision, so I assume this won't change the intent of the test.

As a matter of longer-term strategy, I think Rust needs to decide on a policy on floating point precision and IEEE 754 conformance. x87 is just one example, there's also varying quality of libm implementations, the slight IEEE 754 non-conformance of some co- and mobile processors, LLVM's optimizations, and probably more. I would really like a consensus about:

to what degree Rust should fight these problems, and
where it should draw the line and accept that the world is not perfect

ranma42 · 2016-05-05T13:41:55Z

Regarding the worries about the FPU control word, it looks like LLVM is quite careful in restoring it to whatever value it had when it changes it.
I was unable to find any other place where LLVM fiddles with the control word besides lib/Target/X86/X86ISelLowering.cpp (searched using git grep -i fldcw), where the CW is changed for truncating float to integers and then immediately restored.

@rkruppe I think that the code should be marked volatile, because the CW needs to be changed before the conversion is done and restored after and the compiler should not be allowed to move the fldcw instructions around. I would love if somebody more knowledgeable than me about the Rust asm syntax could tell me if there is anything else that should be done (for example, I declared cw as non-mut and set it in from assembly because setting it mutable caused warnings about it not being modified).

Regarding IEEE 754 conformance, it would be good to also have a clear view about what LLVM guarantees: sometimes float operations are said to conform to IEEE (in some ML threads), but in most cases the specification does not require the IEEE behaviour.

brson · 2016-05-05T18:33:32Z

cc @lifthrasiir. float parsing matters

alexcrichton · 2016-05-05T22:00:06Z

Thanks @ranma42! Could you also add some comments to the code itself? As someone who basically has no idea what x87 is or how SSE affects this, it'd be nice to have an explanation as to what's going on here.

The fast path of the float parser relies on the rounding to happen exactly and directly to the correct number of bits. On x87, instead, double rounding would occour as the FPU stack defaults to 80 bits of precision. This can be fixed by setting the precision of the FPU stack before performing the int to float conversion. This can be achieved by changing the value of the x87 control word. This is a somewhat common operation that is in fact performed whenever a float needs to be truncated to an integer, but it is undesirable to add its overhead for code that does not rely on x87 for computations (i.e. on non-x86 architectures, or x86 architectures which perform FPU computations on using SSE). Fixes `num::dec2flt::fast_path_correct` (on x87).

Explain the meaning of the fields of the control word and provide more details about how the relevant one (Precision Control) is updated in the fast path.

ranma42 · 2016-05-13T13:25:43Z

I added a (possibly too detailed?) description of the x87 FPU control word and around the save/update part of the code.
I also tried to rephrase the explanation of why it sometimes (but not always) used and what will cause problems (double rounding).

hanna-kruppe · 2016-05-13T13:35:16Z

src/libcore/num/dec2flt/algorithm.rs

@@ -32,19 +32,101 @@ fn power_of_ten(e: i16) -> Fp {
    Fp { f: sig, e: exp }
 }

+// Most architectures floating point operations with explicit bit size, therefore the precision of


This sentence seems to be missing a word.

Remove irrelevant information (and instead provide pointer to reference documentation), replace ASCII-art table with the corresponding MarkDown one, and minor fixes.

The `volatile` modifier was incorrectly written outside of the `asm!` blocks.

ranma42 · 2016-05-16T13:42:59Z

I tried to improve the documentation as suggested by @rkruppe (with a minor difference: instead of a code block I used markdown for the table) and fixed the wrong positioning of the parentheses of the asm! blocks.

hanna-kruppe · 2016-05-16T13:54:54Z

Nice 👍

alexcrichton · 2016-05-16T16:34:58Z

@bors: r+ 4ec1f8d

Thanks @ranma42!

bors · 2016-05-16T16:46:09Z

⌛ Testing commit 4ec1f8d with merge 8310de8...

Fix fast path of float parsing on x87 The fast path of the float parser relies on the rounding to happen exactly and directly to the correct number of bits. On x87, instead, double rounding would occour as the FPU stack defaults to 80 bits of precision. This can be fixed by setting the precision of the FPU stack before performing the int to float conversion. This can be achieved by changing the value of the x87 control word. This is a somewhat common operation that is in fact performed whenever a float needs to be truncated to an integer, but it is undesirable to add its overhead for code that does not rely on x87 for computations (i.e. on non-x86 architectures, or x86 architectures which perform FPU computations on using SSE). Fixes `num::dec2flt::fast_path_correct` (on x87).

bors · 2016-05-16T19:15:07Z

☀️ Test successful - auto-linux-32-nopt-t, auto-linux-32-opt, auto-linux-32cross-opt, auto-linux-64-cargotest, auto-linux-64-cross-armhf, auto-linux-64-cross-armsf, auto-linux-64-cross-freebsd, auto-linux-64-cross-netbsd, auto-linux-64-debug-opt, auto-linux-64-nopt-t, auto-linux-64-opt, auto-linux-64-opt-mir, auto-linux-64-opt-rustbuild, auto-linux-64-x-android-t, auto-linux-cross-opt, auto-linux-musl-64-opt, auto-mac-32-opt, auto-mac-64-nopt-t, auto-mac-64-opt, auto-mac-64-opt-rustbuild, auto-mac-cross-ios-opt, auto-win-gnu-32-nopt-t, auto-win-gnu-32-opt, auto-win-gnu-32-opt-rustbuild, auto-win-gnu-64-nopt-t, auto-win-gnu-64-opt, auto-win-msvc-32-cross-opt, auto-win-msvc-32-opt, auto-win-msvc-64-cargotest, auto-win-msvc-64-opt, auto-win-msvc-64-opt-mir, auto-win-msvc-64-opt-rustbuild

rust-highfive assigned alexcrichton May 5, 2016

ranma42 added 2 commits May 13, 2016 15:18

Document the x87 control word

f96864d

Explain the meaning of the fields of the control word and provide more details about how the relevant one (Precision Control) is updated in the fast path.

ranma42 force-pushed the fix-x87-parsing branch from ced40bf to f96864d Compare May 13, 2016 13:21

hanna-kruppe reviewed May 13, 2016
View reviewed changes

ranma42 added 2 commits May 16, 2016 15:37

Cleanup documentation

88afeb9

Remove irrelevant information (and instead provide pointer to reference documentation), replace ASCII-art table with the corresponding MarkDown one, and minor fixes.

Fix asm! blocks

4ec1f8d

The `volatile` modifier was incorrectly written outside of the `asm!` blocks.

bors merged commit 4ec1f8d into rust-lang:master May 16, 2016

ranma42 deleted the fix-x87-parsing branch May 19, 2016 09:39

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix fast path of float parsing on x87 #33429

Fix fast path of float parsing on x87 #33429

ranma42 commented May 5, 2016

rust-highfive commented May 5, 2016

ranma42 commented May 5, 2016

huonw commented May 5, 2016

hanna-kruppe commented May 5, 2016

ranma42 commented May 5, 2016 •

edited

Loading

brson commented May 5, 2016

alexcrichton commented May 5, 2016

ranma42 commented May 13, 2016

hanna-kruppe May 13, 2016

ranma42 commented May 16, 2016

hanna-kruppe commented May 16, 2016

alexcrichton commented May 16, 2016

bors commented May 16, 2016

bors commented May 16, 2016

Fix fast path of float parsing on x87 #33429

Fix fast path of float parsing on x87 #33429

Conversation

ranma42 commented May 5, 2016

rust-highfive commented May 5, 2016

ranma42 commented May 5, 2016

huonw commented May 5, 2016

hanna-kruppe commented May 5, 2016

ranma42 commented May 5, 2016 • edited Loading

brson commented May 5, 2016

alexcrichton commented May 5, 2016

ranma42 commented May 13, 2016

hanna-kruppe May 13, 2016

Choose a reason for hiding this comment

ranma42 commented May 16, 2016

hanna-kruppe commented May 16, 2016

alexcrichton commented May 16, 2016

bors commented May 16, 2016

bors commented May 16, 2016

ranma42 commented May 5, 2016 •

edited

Loading