Fix string to double conversion and row equivalent comparison #7410

ttnghia · 2021-02-18T17:54:06Z

This PR fixes #5225 and fixes #5731 along with several improvements. In particular:

- Fixes the function stod in convert_floats.cu, which used max_mantissa incorrectly and therefore produced the incorrect results reported in [BUG] discrepency in casting String to double #5225. Note that this PR still cannot produce results that perfectly match std::atof, since handling float values near inf is very difficult.
- Fixes the class corresponding_rows_not_equivalent, which did not handle inf and nan and therefore returned "equivalent" when comparing a valid float number with inf.
- Adds a test case for those fixes.
- Rewrites the test cases for string=>float conversion to compare against std::atof results.
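To illustrate the second item, here is a minimal sketch of how a relative-tolerance check can report a finite value and inf as "equivalent", and one way to guard against it. The helper names `naive_equivalent` and `safe_equivalent`, the `ulps` parameter, and the choice to treat two NaNs as equivalent are assumptions made for this example; this is not the code in column_utilities.cu.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>

// Hypothetical helpers for illustration only; not the cudf test utilities.

// Naive relative check. If exactly one side is infinite, both `delta` and
// `bound` evaluate to +inf, `inf > inf` is false, and the two values are
// reported as equivalent.
inline bool naive_equivalent(double lhs, double rhs, int ulps = 4)
{
  double const delta = std::abs(lhs - rhs);
  double const bound = std::numeric_limits<double>::epsilon() * std::abs(lhs + rhs) * ulps;
  return !(delta > bound);
}

// Guarded check: NaN and inf are classified before any arithmetic is done.
inline bool safe_equivalent(double lhs, double rhs, int ulps = 4)
{
  // Assumption: two NaNs count as equivalent; a NaN never matches a number.
  if (std::isnan(lhs) || std::isnan(rhs)) { return std::isnan(lhs) && std::isnan(rhs); }
  // An infinity is equivalent only to an infinity of the same sign.
  if (std::isinf(lhs) || std::isinf(rhs)) { return lhs == rhs; }
  if (lhs == rhs) { return true; }
  double const delta = std::abs(lhs - rhs);
  // Scale by the larger magnitude so the bound stays finite for finite inputs.
  double const scale = std::max(std::abs(lhs), std::abs(rhs));
  return delta <= std::numeric_limits<double>::epsilon() * scale * ulps;
}
```

With these definitions, `naive_equivalent(1.0, inf)` returns true while `safe_equivalent(1.0, inf)` returns false, which is the kind of distinction the fix is meant to restore.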

davidwendt

This is excellent.
Looks like you reformatted the CMakeLists.txt file.
You should change that back please.

cpp/tests/strings/floats_tests.cpp

cpp/tests/utilities/column_utilities.cu

kkraus14

cmake lgtm

cpp/benchmarks/string/convert_floats_benchmark.cpp

davidwendt

Looks good. Thanks for doing this.

vuule

Couple suggestions, mostly related to data generation.

cpp/src/strings/convert/convert_floats.cu

cpp/benchmarks/string/convert_floats_benchmark.cpp

ttnghia · 2021-03-02T03:37:40Z

@gpucibot merge.

harrism · 2021-03-02T03:42:13Z

@ttnghia you can't merge if CI is not passing...

harrism · 2021-03-02T03:43:24Z

@gpucibot rerun tests

ttnghia · 2021-03-02T03:44:11Z

I see. Thanks Mark.
It seems that there are some issues with CI accounts so the server could not run.

ttnghia · 2021-03-02T21:44:39Z

@gpucibot merge.

kkraus14 · 2021-03-02T21:54:52Z

@gpucibot merge

@revans2

This is a part of #7410 and depends on it. When the Java tests were first written they, for whatever reason, did not use the correct string representation for several values, including double max. They ended up testing that the code was broken, which is not the right thing to do. This updates those tests to use the correct values, and adds a comment explaining why the value used to test overflow into `Inf` and `-Inf` is not the ideal one: it works around small issues that remain in the parsing code.

Authors:
- Robert (Bobby) Evans (@revans2)
- Nghia Truong (@ttnghia)

Approvers:
- Jason Lowe (@jlowe)

URL: #7473

ttnghia added 4 commits February 17, 2021 21:06

Add a benchmark for string <=> floats conversion.

7a93e07

Fix error and improve the function converting from string to float types

3ea06fb

Fix the equivalent check function for floating point numbers that incorrectly handles inf and nan.

4bfd906

Add a test for converting string to double number

0cfdbcd

ttnghia requested review from a team as code owners February 18, 2021 17:54

ttnghia requested review from karthikeyann and vuule February 18, 2021 17:54

github-actions bot added the CMake and libcudf labels Feb 18, 2021

Merge remote-tracking branch 'origin/branch-0.19' into branch-0.19-issue-5225 (conflicts: cpp/benchmarks/CMakeLists.txt)

7778f85

github-actions bot added the CMake label Feb 18, 2021

ttnghia removed the CMake label Feb 18, 2021

ttnghia changed the title ~~Fix string to double conversion~~ Fix string to double conversion and row equivalent comparison Feb 18, 2021

Update copyright header

a5ede5d

github-actions bot added the CMake label Feb 18, 2021

davidwendt self-requested a review February 18, 2021 18:31

davidwendt requested changes Feb 18, 2021

cpp/tests/strings/floats_tests.cpp

jrhemstad reviewed Feb 18, 2021

cpp/tests/utilities/column_utilities.cu (outdated)

Fix const qualifier position and change CMakeLists.txt

2584840

vuule removed the improvement label Feb 18, 2021

kkraus14 approved these changes Feb 18, 2021

revans2 mentioned this pull request Feb 22, 2021

Better float/double cases for casting tests NVIDIA/spark-rapids#1781

Merged

Merge remote-tracking branch 'origin/branch-0.19' into branch-0.19-issue-5225 (conflicts: cpp/benchmarks/CMakeLists.txt)

6b1af3b

ttnghia added the 5 - Ready to Merge label Feb 23, 2021

davidwendt requested changes Feb 23, 2021

cpp/benchmarks/string/convert_floats_benchmark.cpp (outdated)

Update header format for cpp/benchmarks/string/convert_floats_benchmark.cpp (Co-authored-by: David)

3a7c52e

davidwendt approved these changes Feb 23, 2021

vuule requested changes Feb 26, 2021

revans2 mentioned this pull request Mar 1, 2021

Fix java float/double parsing tests [skip ci] #7473

Merged

ttnghia added 2 commits March 1, 2021 11:09

Add a comment to the stod function

4fbd4f6

Generate random float numbers by calling to create_random_table, and generate columns of strings by calling to cudf::strings::to_floats

0b8633b

vuule self-requested a review March 1, 2021 22:09

Fix format check

886fc57

vuule approved these changes Mar 1, 2021

Merge branch 'branch-0.19' into branch-0.19-issue-5225 (conflicts: cpp/benchmarks/CMakeLists.txt)

da19661

rapids-bot bot merged commit f8fa481 into rapidsai:branch-0.19 Mar 2, 2021

ttnghia deleted the branch-0.19-issue-5225 branch March 11, 2021 18:33

ttnghia self-assigned this Apr 25, 2021

GregoryKimball mentioned this pull request Apr 5, 2022

[BUG] Parsing string to float is inconsistent between CSV reader and to_numeric #10599

Open
