Slightly improve accuracy of stod in to_floats #10622

davidwendt · 2022-04-08T14:55:14Z

Reference #10599

Provides a slight improvement in accuracy for the internal stod device function used by the cudf::strings::to_floats() API.

Reduces the number of floating-point operations by one and applies the exponent by either multiplying or dividing, depending on whether the exponent is positive or negative. This slightly improves the accuracy of the result, since multiplying by decimal fractions in floating point can compound rounding errors.
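As a rough illustration of the multiply-or-divide idea, here is a minimal Python sketch. It is not the libcudf `stod` device code. `stod_sketch` is a hypothetical name, and the sketch assumes short inputs (at most 15 significant digits and a small decimal exponent) so that both the integer mantissa and the power of ten are exact doubles. Under those assumptions a single IEEE multiplication or division is correctly rounded, whereas multiplying by an inexact fraction such as `0.1` can land one ULP away.

```python
def stod_sketch(s: str) -> float:
    """Hypothetical, simplified string-to-double; NOT the libcudf stod device code.

    Accepts an optional sign, digits with an optional '.', and an optional
    e/E exponent. Assumes at most 15 significant digits and a small decimal
    exponent (|exp10| <= 22), so the mantissa and 10**k are exact doubles.
    """
    s = s.strip()
    sign = 1.0
    if s and s[0] in "+-":
        sign = -1.0 if s[0] == "-" else 1.0
        s = s[1:]
    mant_str, _, exp_str = s.lower().partition("e")
    int_part, _, frac_part = mant_str.partition(".")
    # Fold all digits into one integer mantissa; exact while below 2**53.
    mantissa = float(int((int_part + frac_part) or "0"))
    # Each fractional digit shifts the decimal exponent down by one.
    exp10 = (int(exp_str) if exp_str else 0) - len(frac_part)
    # Apply the exponent with a single operation on an exact power of ten:
    # multiply when it is non-negative, divide when it is negative.
    if exp10 >= 0:
        return sign * mantissa * float(10 ** exp10)
    return sign * mantissa / float(10 ** -exp10)


for text in ["1.0", "2.0", "0.1", "0.2", "0.3"]:
    print(text, stod_sketch(text))

# Multiplying by the inexact fraction 0.1 rounds differently than dividing by 10:
print(3 * 0.1)  # 0.30000000000000004
print(3 / 10)   # 0.3
```

Dividing by `10**k` instead of multiplying by `10**-k` matters because `10**-k` is itself not exactly representable, so the product incurs two roundings instead of one.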

>>> s = cudf.Series(['1.0','2.0','0.1','0.2','0.3'])
>>> x = cudf.to_numeric(s)
>>> x[0]
1.0        previously 0.9999999999999999
>>> x[1]
2.0        previously 1.9999999999999998
>>> x[2]
0.1        previously 0.09999999999999999
>>> x[3]
0.2        previously 0.19999999999999998
>>> x[4]
0.3        same

In bits, the 1.0 result was previously 3FEFFFFFFFFFFFFF and now computes to 3FF0000000000000, which is exactly 1.0.
The 0.1 result was previously 3FB9999999999999 and now computes to 3FB999999999999A, which prints as 0.10000000000000001 to 17 significant digits. The new value is the double nearest to 0.1, one ULP above the old result; both are within the expected epsilon.
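The bit patterns quoted above can be checked from Python with the standard `struct` module. `bits` is a hypothetical helper that packs a float as a big-endian IEEE-754 binary64 value and returns its hex digits.

```python
import struct


def bits(x: float) -> str:
    """Return the IEEE-754 binary64 bit pattern of x as 16 uppercase hex digits."""
    return struct.pack(">d", x).hex().upper()


print(bits(0.9999999999999999))   # 3FEFFFFFFFFFFFFF  (old result for "1.0")
print(bits(1.0))                  # 3FF0000000000000  (new result)
print(bits(0.09999999999999999))  # 3FB9999999999999  (old result for "0.1")
print(bits(0.1))                  # 3FB999999999999A  (new result)
print(f"{0.1:.17g}")              # 0.10000000000000001
```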

Since the overall error is within the std::numeric_limits<T>::epsilon() error threshold, no tests had to be modified.
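One way to read "within epsilon" is as a relative-error bound. The sketch below assumes that interpretation; `within_epsilon` is a hypothetical helper, and the libcudf tests may compare results differently. Python's `sys.float_info.epsilon` is the same value as `std::numeric_limits<double>::epsilon()` (2**-52).

```python
import sys

EPS = sys.float_info.epsilon  # 2**-52, same as std::numeric_limits<double>::epsilon()


def within_epsilon(actual: float, expected: float) -> bool:
    """Hypothetical relative-error check: |actual - expected| <= eps * |expected|."""
    return abs(actual - expected) <= EPS * abs(expected)


# (old result, new result) pairs taken from the examples in the description.
results = {
    "1.0": (0.9999999999999999, 1.0),
    "2.0": (1.9999999999999998, 2.0),
    "0.1": (0.09999999999999999, 0.1),
    "0.2": (0.19999999999999998, 0.2),
}
for text, (old, new) in results.items():
    expected = float(text)  # Python's correctly rounded parse of the same string
    print(text, within_epsilon(old, expected), within_epsilon(new, expected))
```

Every old result differs from the correctly rounded value by one ULP, which is at most `EPS * |expected|`, so both the old and new outputs pass this check.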

codecov · 2022-04-08T16:10:17Z

Codecov Report

Merging #10622 (18a565b) into branch-22.06 (956c7b5) will increase coverage by 0.03%.
The diff coverage is 88.97%.

@@               Coverage Diff                @@
##           branch-22.06   #10622      +/-   ##
================================================
+ Coverage         86.30%   86.34%   +0.03%     
================================================
  Files               140      140              
  Lines             22255    22280      +25     
================================================
+ Hits              19207    19237      +30     
+ Misses             3048     3043       -5

| Impacted Files | Coverage Δ | |
|---|---|---|
| python/cudf/cudf/core/frame.py | `94.75% <ø> (+1.02%)` | ⬆️ |
| python/dask_cudf/dask_cudf/tests/test_accessor.py | `98.41% <ø> (ø)` | |
| python/cudf/cudf/core/indexed_frame.py | `91.77% <87.93%> (-0.87%)` | ⬇️ |
| python/cudf/cudf/core/column/lists.py | `90.62% <100.00%> (+0.57%)` | ⬆️ |
| python/cudf/cudf/core/dataframe.py | `93.59% <100.00%> (ø)` | |
| python/cudf/cudf/core/series.py | `95.28% <100.00%> (-0.01%)` | ⬇️ |
| python/cudf/cudf/core/column/column.py | `89.45% <0.00%> (+0.10%)` | ⬆️ |
| python/cudf/cudf/core/column/string.py | `89.10% <0.00%> (+0.12%)` | ⬆️ |
| python/cudf/cudf/core/groupby/groupby.py | `91.72% <0.00%> (+0.22%)` | ⬆️ |
| python/cudf/cudf/core/tools/datetimes.py | `84.49% <0.00%> (+0.30%)` | ⬆️ |

... and 1 more

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fb03c8b...18a565b.

vyasr

Nice find, LGTM. I assume unifying the different methods in #10599 is pending a longer discussion, perhaps about porting to libcu++ eventually?

davidwendt · 2022-04-08T18:12:05Z

> Nice find, LGTM. I assume unifying the different methods in #10599 is pending a longer discussion, perhaps about porting to libcu++ eventually?

Yes. The current discussion is about having cuIO try to reuse this stod function. The original complaint was that the results differed, so Spark switched their code from to_floats() back to cuIO. I thought this change might allow them to move back to to_floats() until #10599 is resolved. @andygrove

karthikeyann

LGTM 👍

davidwendt · 2022-04-11T19:45:05Z

@gpucibot merge

Fixes a rounding error on extremely small floating-point numbers in the range `1E-287` to `1E-307`. These values were incorrectly rounded to zero as a result of the change in #10622: the extra floating-point operation removed there is necessary to keep values in this range from being converted to zero. The fix adds a check so that the extra operation is used only when the overall exponent falls below `std::numeric_limits<double>::min_exponent10` (which is `-307`). The `ToFloat64` gtest was also updated to include values in this range so that this error does not recur. Additionally, the conversion now supports subnormal numbers in the range of `E-307` to `E-324`.

Authors:
- David Wendt (https://github.com/davidwendt)

Approvers:
- Bradley Dice (https://github.com/bdice)
- Yunsong Wang (https://github.com/PointKernel)
- Mike Wilson (https://github.com/hyperbolic2346)

URL: #10672
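A hedged sketch of why an extra operation can matter for tiny values. It assumes one plausible mechanism that the description does not spell out: when the mantissa carries many digits, the required power of ten exceeds the double range, overflows to infinity, and the quotient collapses to zero. This is Python, not the CUDA code from #10672; `pow10`, `scale_single_step`, and `scale_two_step` are hypothetical, and `pow10` returns infinity on overflow to mimic IEEE hardware, because Python raises `OverflowError` instead. `sys.float_info.min_10_exp` is `-307`, matching `std::numeric_limits<double>::min_exponent10`.

```python
import math
import sys

# Equals std::numeric_limits<double>::min_exponent10 (-307).
MIN_EXP10 = sys.float_info.min_10_exp


def pow10(k: int) -> float:
    """10**k as a double (k >= 0), returning +inf on overflow as IEEE hardware would."""
    return float(10 ** k) if k <= 308 else math.inf


def scale_single_step(mantissa: float, exp10: int) -> float:
    """mantissa * 10**exp10 for exp10 < 0, using one division."""
    # Once -exp10 > 308 the divisor is inf and the quotient collapses to 0.0.
    return mantissa / pow10(-exp10)


def scale_two_step(mantissa: float, exp10: int) -> float:
    """Same, but spends one extra division only when exp10 < MIN_EXP10."""
    if exp10 < MIN_EXP10:
        # Shrink the mantissa first so the remaining power of ten stays finite.
        mantissa /= pow10(MIN_EXP10 - exp10)
        exp10 = MIN_EXP10
    return mantissa / pow10(-exp10)


# A 20-digit mantissa with exp10 = -309 represents about 1.2345678901234568e-290.
m = float(12345678901234567890)
print(scale_single_step(m, -309))  # 0.0
print(scale_two_step(m, -309))
# Subnormal range: 1 * 10**-320 stays nonzero with the two-step path.
print(scale_two_step(1.0, -320) > 0.0)  # True
```

Gating the extra division on the exponent keeps the common case at a single rounding, which matches the intent of #10622, while still handling the very small and subnormal inputs.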

Slightly improve accuracy of stod in to_floats

18a565b

davidwendt added the 3 - Ready for Review, libcudf, improvement, and non-breaking labels Apr 8, 2022

davidwendt requested a review from a team as a code owner April 8, 2022 14:55

davidwendt self-assigned this Apr 8, 2022

davidwendt requested review from robertmaynard and karthikeyann April 8, 2022 14:55

vyasr approved these changes Apr 8, 2022


karthikeyann approved these changes Apr 11, 2022


rapids-bot bot merged commit 012af64 into rapidsai:branch-22.06 Apr 11, 2022

davidwendt deleted the stod-accuracy-improvement branch April 11, 2022 19:45

davidwendt mentioned this pull request Apr 15, 2022

Fix rounding to zero error in stod on very small float numbers #10672

Merged
