Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix floating point data generation in benchmarks #10372

Merged
merged 5 commits into from
Mar 7, 2022

Conversation

vuule
Copy link
Contributor

@vuule vuule commented Mar 1, 2022

numeric_limits::lowest and numeric_limits::max are used as bounds for numeric type generation. However, for normal generators, bounds are shifted to [0, upper_bound - lower_bound], and the random value is shifted back by lower_bound.
with lowest and max, upper_bound - lower_bound is out of range for floats and generated values are nan and inf.
This PR halves the ranges so that upper_bound - lower_bound is still within the type range.

Expected to affect benchmarks that use floating point columns (e.g. Parquet reader benchmarks).

@vuule vuule added tests Unit testing for project cuIO cuIO issue Performance Performance related issue non-breaking Non-breaking change labels Mar 1, 2022
@vuule vuule self-assigned this Mar 1, 2022
@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Mar 1, 2022
@vuule vuule added the bug Something isn't working label Mar 1, 2022
@codecov
Copy link

codecov bot commented Mar 1, 2022

Codecov Report

Merging #10372 (e20e5fa) into branch-22.04 (a7d88cd) will increase coverage by 0.15%.
The diff coverage is n/a.

Impacted file tree graph

@@               Coverage Diff                @@
##           branch-22.04   #10372      +/-   ##
================================================
+ Coverage         10.42%   10.58%   +0.15%     
================================================
  Files               119      125       +6     
  Lines             20603    21058     +455     
================================================
+ Hits               2148     2228      +80     
- Misses            18455    18830     +375     
Impacted Files Coverage Δ
...ython/custreamz/custreamz/tests/test_dataframes.py 99.39% <0.00%> (-0.01%) ⬇️
python/cudf/cudf/errors.py 0.00% <0.00%> (ø)
python/cudf/cudf/io/orc.py 0.00% <0.00%> (ø)
python/cudf/cudf/_version.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/ops.py 0.00% <0.00%> (ø)
python/cudf/cudf/datasets.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/frame.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/index.py 0.00% <0.00%> (ø)
python/cudf/cudf/io/parquet.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/series.py 0.00% <0.00%> (ø)
... and 43 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 78b316c...e20e5fa. Read the comment docs.

@vuule vuule marked this pull request as ready for review March 1, 2022 19:27
@vuule vuule requested a review from a team as a code owner March 1, 2022 19:27
@vuule vuule requested review from bdice and nvdbaranec March 1, 2022 19:27
Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. Thank you for including the comment.

@bdice
Copy link
Contributor

bdice commented Mar 1, 2022

Tests appear to be failing with an unrelated error? #10150?

ImportError: /opt/conda/envs/rapids/lib/python3.9/site-packages/cudf/_lib/text.cpython-39-x86_64-linux-gnu.so: undefined symbol:
_ZN4cudf2io4text15multibyte_splitERKNS1_17data_chunk_sourceERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt8optionalINS1_15byte_range_infoEEPN3rmm2mr22device_memory_resourceE

@vuule
Copy link
Contributor Author

vuule commented Mar 2, 2022

Tests appear to be failing with an unrelated error? #10150?

ImportError: /opt/conda/envs/rapids/lib/python3.9/site-packages/cudf/_lib/text.cpython-39-x86_64-linux-gnu.so: undefined symbol:
_ZN4cudf2io4text15multibyte_splitERKNS1_17data_chunk_sourceERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt8optionalINS1_15byte_range_infoEEPN3rmm2mr22device_memory_resourceE

I think it's a CI weirdness caused by API changes in #10150. Merged latest, that usually helps in such cases.

@vuule
Copy link
Contributor Author

vuule commented Mar 7, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 7d67093 into rapidsai:branch-22.04 Mar 7, 2022
@vuule vuule deleted the bug-data_gen-limits branch March 7, 2022 22:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working cuIO cuIO issue libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change Performance Performance related issue tests Unit testing for project
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants