Added Grisu3 algorithm support for double.ToString(). #14646
Conversation
- Implemented the Grisu3 algorithm.
- When calling double.ToString(), try Grisu3 first; if it fails, fall back to Dragon4.

Fix #14478
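For context, the dispatch described above can be sketched roughly as follows. RunGrisu3, RunDragon4, and DoubleToDigits are placeholder names, not the PR's actual entry points, and the signatures are illustrative only:

// Placeholder declarations standing in for the native Grisu3 and Dragon4
// digit generators; the real names and signatures in the PR may differ.
bool RunGrisu3(double value, int count, char* digits, int* decimalExponent);
void RunDragon4(double value, int count, char* digits, int* decimalExponent);

void DoubleToDigits(double value, int count, char* digits, int* decimalExponent)
{
    // Fast path: Grisu3 either produces all requested digits, correctly
    // rounded, or reports that it cannot guarantee the result.
    if (RunGrisu3(value, count, digits, decimalExponent))
    {
        return;
    }

    // Slow path: Dragon4 always succeeds, so it covers every case where
    // Grisu3 bails out.
    RunDragon4(value, count, digits, decimalExponent);
}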
Overview:
Following are the benchmarks. Average runtime comparison:
Following are the details of each performance test result. Before change (with only Dragon4):
After change (Grisu3 + Dragon4):
The above results were generated by this code: https://github.com/dotnet/corefx/blob/master/src/System.Runtime/tests/Performance/Perf.Double.cs
@tarekgh @tannergooding @jkotas PTAL.
Do we know how much slower we are when we hit such cases? Is there a way we can detect such numbers early and call Dragon directly? I am trying to find a mitigation for the cases where we are slower here.
@dotnet-bot test Windows_NT x64 corefx_baseline
Following are the test results for the cases that fail in Grisu3. The minimum regression is around 4% and the maximum regression is around 22%. The regression depends on the requested precision: we can exit Grisu3 more quickly when the requested precision is smaller (if we can exhaust the requested digit count, then Grisu3 succeeds).
AFAIK we do not have an existing algorithm to do this (except the check built into Grisu3 itself, which I'm using in the implementation). I may try to find a way to jump to Dragon4 earlier, but we may need to set a bottom line for this: what is the acceptable regression rate for the numbers that fail in Grisu3? Note that even when we fall back to Dragon4, we still have a great performance improvement compared to the current (2.0.0) implementation.
BTW, the CI failure seems to be caused by a JIT compile error, which should not be introduced by my code.
I care about the common scenarios (e.g. Double.ToString() and Double.ToString("R")). From your data, we'll have around a 14% regression with "E" and around 4% with "R". What is the regression with "G"? I think the regression with "R" is acceptable. If the regression with "G" is small too (around 5%), that will be acceptable. Also, if the regression occurs only for a small set/range of numbers, that can mitigate the regression too. In general, the perf gain with Grisu3 is encouraging and worth having.
Right, I looked earlier at the failures and I am not seeing anything related to your changes.
@tarekgh Following are the tests for the failing numbers. I tested two numbers, 1 and 250:
The regression varies across numbers and basically depends on the digits of the number and the requested precision. We need to make a tradeoff: is it worthwhile to make 99.5% of numbers 90% faster but sacrifice the performance of the remaining 0.5% (up to a 24% regression)? Meanwhile, I'm trying to mitigate the regression. Currently I don't have a good approach yet. Caching the failed numbers and precisions in a limited memory space may be an easy way to go (e.g., a small LRU cache). Or we could design a new pre-checking algorithm for the numbers, separate from Grisu3's self-check during digit generation. Or we could combine the two approaches.
If we are sure that the failing numbers are within that 0.5%, then I think it is worth it.
I think we don't need to create that complication. If we don't have a simple way to check whether a number will fail, we shouldn't add any complexity, and we may just accept the regression.
@mazong1123, @tarekgh: I still think it would be useful to know where these failing numbers fall. Floating-point values that are normalized between -1.0 and +1.0 are fairly common in a number of different applications. If a large portion of the failing 0.5% (which is still 92.2 quadrillion values) falls in that range, then it may actually be more common than the percentage would lead you to believe.
@tannergooding Sure, I'm going to collect the number range of that 0.5% as the next step. The paper didn't give us this information; I need to collect it and prove it myself, so it may take some time. But I agree it's worth doing.
@tannergooding @tarekgh I've spent some time investigating what kinds of numbers fail in Grisu3 with fixed precision. Here is a brief summary. There are three factors that can impact the result:

The algorithm first produces digits according to the requested digit count, so it is easy to generate numbers that fail in Grisu3: just make the requested precision large enough. With 17-digit precision, a large set of double numbers without fractional parts will fail in Grisu3 (1, 2, ..., 100, 1000 will fail; 1.1, 1.2, 2.1 will succeed). Of course, if the requested precision is small enough, those numbers can succeed in Grisu3. Also, I think the 0.5% miss rate is probably for the free format (generating the shortest representation), not for the fixed format, which is our case. I didn't do a comparison test, but you can think of it this way: if I request a very large digit count, all the double values without a fractional part will fail.
I'm collecting the hit rate over all double numbers for 17-digit precision. Hopefully this can give us a rough idea. I'd like to collect the hit rate from 1-digit precision to 30-digit precision later on.
I've calculated up to 109500000000 so far and the hit rate is 99.27%. My machine does not have the power to go through all double numbers, I guess:
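As a side note, this kind of scan could be sketched roughly as below. Grisu3FixedSucceeds is a hypothetical hook for the PR's Grisu3 code (stubbed out here so the sketch compiles), and the starting point and sample size are only illustrative:

#include <cstdint>
#include <cstdio>
#include <cstring>

// Hypothetical hook into the Grisu3 implementation: returns true when all
// `precision` digits can be produced by Grisu3, false when the caller would
// have to fall back to Dragon4. Stubbed out here as a placeholder.
static bool Grisu3FixedSucceeds(double value, int precision)
{
    (void)value;
    (void)precision;
    return true; // placeholder
}

int main()
{
    // Walk consecutive doubles by incrementing the raw bit pattern.
    double start = 1.0;
    uint64_t bits;
    memcpy(&bits, &start, sizeof(bits));

    uint64_t hits = 0;
    const uint64_t total = 1000000000ULL; // sample size; adjust as needed

    for (uint64_t i = 0; i < total; ++i, ++bits)
    {
        double value;
        memcpy(&value, &bits, sizeof(value));

        if (Grisu3FixedSucceeds(value, 17))
        {
            ++hits;
        }
    }

    printf("hit rate: %.2f%%\n", 100.0 * hits / total);
    return 0;
}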
Wrap up:
In my opinion, it's worth having, not only because of the 90% performance gain for 99.5% of numbers, but also because it has been widely adopted by other frameworks/libraries (V8, Rust, Go). Additionally, Python uses Errol3, which claims an improvement over Grisu3, but there are some issues with that algorithm, so I still think Grisu3 is a better choice for now. Maybe we can upgrade to Errol3 later on, but that's another topic. @tarekgh @tannergooding thoughts?
@mazong1123 does Grisu3 always succeed with lower precisions? I mean, you mentioned before that it always fails at 17-digit precision for numbers that don't have fractions. Is that true for 15-digit precision? What I am trying to get at is whether we can have Grisu3 on by default when a lower precision is used and use Dragon for higher precisions. I am inclined to have Grisu3 enabled, but at the same time I want to reduce the regression in the failing cases.
It depends on the number; it can fail at 15-digit precision as well.
Unfortunately, it is not linear. Estimating whether a number will succeed in Grisu3 is part of Errol3's job. Although Errol3 always succeeds, it is sometimes slower than Grisu3. See the paper: https://cseweb.ucsd.edu/~mandrysc/pub/dtoa.pdf One possible optimization I can think of is to extract the error analysis from Errol3 and apply it to Grisu3. Someone had the same idea: https://news.ycombinator.com/item?id=10922347
@mazong1123 thanks for the details, it is very helpful. Is it possible to detect the numbers that don't have a fractional part and then use Dragon? Do you think this would reduce the chance of Grisu3 failing when it is used? As you can see, I am trying to reduce the possibility of regressions as much as possible.
@tarekgh Yes, I'm trying to do that. The best option is to extract the error analysis (quickly determining whether a number will fail in Grisu3) from Errol3 and apply it to Grisu3. That should increase the success rate from 99.5% to 99.95% in theory.
I finally have time to work on this issue. I found it's too complicated to extract the error analysis from Errol3 and apply it to Grisu3. So I just made a shortcut:

// Bail out early (so the caller falls back to Dragon4) when there is no
// fractional part left (p2 == 0) and either the requested digit count is
// large (>= 12) or the integer part p1 has fewer than `count` digits.
if (p2 == 0 && (count >= 12 || p1 < GetPowerOfTen(count - 1)))
{
    return false;
}
@tarekgh good news. The shortcut does mitigate the regression. Please see the following results for Grisu3's failing cases.
The shortcut has little effect on the other, successful cases, so I think this optimization should be good to go.
@mazong1123 I looked at the changes and they look good to me. Let's wait for @tannergooding's review, and then we can start consulting legal again regarding the usage of the Grisu algorithm.
src/classlibnative/bcltype/diyfp.h
// An extended floating-point data structure.
// It defines a 64-bit significand and a 32-bit exponent,
// which is EXTENDED compared to an IEEE double-precision floating-point number.
class DiyFp
It would be helpful if this were documented more thoroughly.
What is it used for, why it is needed, and the reasoning for the specific format (96-bits, 64-bit significand, 32-bit exponent).
Most of it is not immediately obvious and will be initially confusing to anyone else coming to touch the code later.
OK will do.
src/classlibnative/bcltype/diyfp.h
static const int SIGNIFICAND_LENGTH = 64;

private:
UINT64 m_f;
Just using f and e as the names is confusing without additional context indicating that they are the significand and exponent. It also isn't clear if this supports a sign and whether or not the exponent is biased.
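For readers following along, a rough sketch of the DiyFp idea from the Grisu paper (illustrative only, not the PR's code): the represented value is f * 2^e, there is no sign, and the exponent is not biased.

#include <cstdint>

// Illustrative "do-it-yourself floating point" from the Grisu paper:
// the value represented is m_f * 2^m_e. No sign, no exponent bias.
class DiyFp
{
public:
    static const int SIGNIFICAND_LENGTH = 64;

    DiyFp(uint64_t f, int e)
        : m_f(f), m_e(e)
    {
    }

    // Subtraction assumes both operands share the same exponent and that
    // the result does not underflow, as in the paper.
    DiyFp operator-(const DiyFp& rhs) const
    {
        return DiyFp(m_f - rhs.m_f, m_e);
    }

    uint64_t f() const { return m_f; }
    int e() const { return m_e; }

private:
    uint64_t m_f; // 64-bit significand
    int m_e;      // binary (power-of-two) exponent, unbiased
};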
{
if (((FPDOUBLE*)&value)->exp != 0)
{
// For normalized value, according to https://en.wikipedia.org/wiki/Double-precision_floating-point_format
This is missing the sign handling and the case where the implicit significand bit is 0.
Edit: Never mind, I see the case where the implicit significand bit is 0 is handled below.
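To illustrate what that branch is doing, here is a sketch under the usual IEEE 754 binary64 layout (not the PR's exact code, and ignoring the sign bit):

#include <cstdint>
#include <cstring>

// Decompose a positive IEEE 754 double into the DiyFp form f * 2^e.
void DoubleToDiyFp(double value, uint64_t* f, int* e)
{
    uint64_t bits;
    memcpy(&bits, &value, sizeof(bits));

    uint64_t fraction = bits & 0x000FFFFFFFFFFFFFULL;  // low 52 bits
    int biasedExponent = (int)((bits >> 52) & 0x7FF);  // 11 exponent bits

    if (biasedExponent != 0)
    {
        // Normalized: restore the implicit leading significand bit and
        // remove the exponent bias (1023) plus the 52 fraction bits.
        *f = fraction | 0x0010000000000000ULL;
        *e = biasedExponent - 1075;
    }
    else
    {
        // Subnormal: the implicit bit is 0 and the exponent is fixed.
        *f = fraction;
        *e = -1074;
    }
}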
#include <math.h>

// 1/lg(10)
const double Grisu3::D_1_LOG2_10 = 0.30102999566398114;
This is incorrect. 1 / log2(10) is 0.301029995663981195..., which (when rounded to binary64) is 0.30102999566398120.
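For reference, 1 / log2(10) is the same constant as log10(2); standard tables give log10(2) = 0.30102999566398119521..., consistent with the digits quoted above.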
// This implementation is based on the paper: http://www.cs.tufts.edu/~nr/cs257/archive/florian-loitsch/printf.pdf
// You must read this paper to fully understand the code.
//
// Note: Instead of generating shortest digits, we generate the digits according to the input count.
You should probably list this as a deviation, rather than a note.
I did a quick walkthrough of the new code and the paper and this looks to be generally correct. I still want to dig into this more in depth, but we can probably start the other process while I do that.
@dotnet-bot test Ubuntu x64 Checked Build and Test (Jit - CoreFx)
Seems like I'm not able to restart the test.
@dotnet-bot test Ubuntu x64 Checked corefx_baseline
CC @joperezr
I have asked @joperezr to start following up with the LCA so we can proceed with the changes. Meanwhile, @tannergooding can finish his code review so we'll be ready.
I have talked to LCA on our side and we are good to go with taking this, as long as @tannergooding approves the implementation, which looks good to me.
It looks correct to me as well, having dug into it more in depth. I'd like to see the requested comments/documentation added to the source code and the minor error with the D_1_LOG2_10 constant fixed.
@mazong1123 could you please address the remaining comments on the code review so we can proceed to merge it?
Great! Will address the code review comments within the next week :)
@mazong1123 did you have a chance to finish this one?
@tarekgh Will do this weekend. Sorry for the delay; I've been busy onboarding the new team...
Added more comments. Changed the value of D_1_LOG2_10
@tarekgh @tannergooding I just updated the code according to the review comments. Please take a look. Thanks!
@tannergooding if you are ok with the latest changes, we can merge this one.
@joperezr please follow up if there are any license docs that need to be updated. Thanks @mazong1123 for getting this done.