Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix string to double conversion and row equivalent comparison #7410

Merged
merged 24 commits into from
Mar 2, 2021

Conversation

ttnghia
Copy link
Contributor

@ttnghia ttnghia commented Feb 18, 2021

This PR fixes #5225 and fixes #5731 along with several improvements. In particular:

  • Fixes the function stod in convert_floats.cu, which incorrectly uses max_mantissa, thus produces incorrect results as mentioned in the issue [BUG] discrepency in casting String to double #5225. Note that this PR still cannot produce perfectly matched results with std::atof, since handling float values near inf is very difficult.
  • Fixes the class corresponding_rows_not_equivalent which didn't handle inf and nan, thus it returned "equivalent" when comparing a valid float number with inf.
  • Adds a test case for those fixes.
  • Rewrite test cases for string=>float coversion to use std::atof results for comparison.

@ttnghia ttnghia requested review from a team as code owners February 18, 2021 17:54
@github-actions github-actions bot added CMake CMake build issue libcudf Affects libcudf (C++/CUDA) code. labels Feb 18, 2021
@ttnghia ttnghia added 3 - Ready for Review Ready for review by team 4 - Needs Review Waiting for reviewer to review or respond bug Something isn't working improvement Improvement / enhancement to an existing function strings strings issues (C++ and Python) tests Unit testing for project and removed CMake CMake build issue labels Feb 18, 2021
…sue-5225

# Conflicts:
#	cpp/benchmarks/CMakeLists.txt
@github-actions github-actions bot added the CMake CMake build issue label Feb 18, 2021
@ttnghia ttnghia removed the CMake CMake build issue label Feb 18, 2021
@ttnghia ttnghia changed the title Fix string to double conversion Fix string to double conversion and row equivalent comparison Feb 18, 2021
@github-actions github-actions bot added the CMake CMake build issue label Feb 18, 2021
@davidwendt davidwendt self-requested a review February 18, 2021 18:31
Copy link
Contributor

@davidwendt davidwendt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is excellent.
Looks like you reformatted the CMakeLists.txt file.
You should change that back please.

cpp/tests/strings/floats_tests.cpp Show resolved Hide resolved
@vuule vuule removed the improvement Improvement / enhancement to an existing function label Feb 18, 2021
Copy link
Collaborator

@kkraus14 kkraus14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cmake lgtm

…sue-5225

# Conflicts:
#	cpp/benchmarks/CMakeLists.txt
@ttnghia ttnghia added the 5 - Ready to Merge Testing and reviews complete, ready to merge label Feb 23, 2021
Copy link
Contributor

@davidwendt davidwendt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Thanks for doing this.

Copy link
Contributor

@vuule vuule left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple suggestions, mostly related to data generation.

cpp/src/strings/convert/convert_floats.cu Show resolved Hide resolved
cpp/benchmarks/string/convert_floats_benchmark.cpp Outdated Show resolved Hide resolved
cpp/benchmarks/string/convert_floats_benchmark.cpp Outdated Show resolved Hide resolved
cpp/benchmarks/string/convert_floats_benchmark.cpp Outdated Show resolved Hide resolved
cpp/benchmarks/string/convert_floats_benchmark.cpp Outdated Show resolved Hide resolved
@vuule vuule self-requested a review March 1, 2021 22:09
@ttnghia
Copy link
Contributor Author

ttnghia commented Mar 2, 2021

@gpucibot merge.

@harrism
Copy link
Member

harrism commented Mar 2, 2021

@ttnghia you can't merge if CI is not passing...

@harrism
Copy link
Member

harrism commented Mar 2, 2021

@gpucibot rerun tests

@ttnghia
Copy link
Contributor Author

ttnghia commented Mar 2, 2021

I see. Thanks Mark.
It seems that there are some issues with CI accounts so the server could not run.

# Conflicts:
#	cpp/benchmarks/CMakeLists.txt
@ttnghia
Copy link
Contributor Author

ttnghia commented Mar 2, 2021

@gpucibot merge.

@kkraus14
Copy link
Collaborator

kkraus14 commented Mar 2, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit f8fa481 into rapidsai:branch-0.19 Mar 2, 2021
rapids-bot bot pushed a commit that referenced this pull request Mar 2, 2021
This is a part of #7410 and depends on it.  The java tests when they were first written, for what ever reason, did not use the correct string representation for several values including double max.  They ended up testing that the code was broken, which is not the right thing to do.  This updates those to have the correct values, but also has a comment about why the ideal value to test for overflow into `Inf` and `-Inf` is not ideal, but is a work around for small issues with the parsing code still.

Authors:
  - Robert (Bobby) Evans (@revans2)
  - Nghia Truong (@ttnghia)

Approvers:
  - Jason Lowe (@jlowe)

URL: #7473
@ttnghia ttnghia deleted the branch-0.19-issue-5225 branch March 11, 2021 18:33
@ttnghia ttnghia self-assigned this Apr 25, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team 4 - Needs Review Waiting for reviewer to review or respond 5 - Ready to Merge Testing and reviews complete, ready to merge bug Something isn't working CMake CMake build issue libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change strings strings issues (C++ and Python) tests Unit testing for project
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] expect_columns_equivalent handles infinity incorrectly. [BUG] discrepency in casting String to double
7 participants