Fix floating point data generation in benchmarks #10372

vuule · 2022-03-01T02:39:44Z

numeric_limits::lowest and numeric_limits::max are used as bounds for numeric type generation. However, for normal generators, the bounds are shifted to [0, upper_bound - lower_bound], and the random value is then shifted back by lower_bound.
With lowest and max as bounds, upper_bound - lower_bound is out of range for floating point types, so the generated values are NaN and inf.
This PR halves the ranges so that upper_bound - lower_bound stays within the type's range.

Expected to affect benchmarks that use floating point columns (e.g. Parquet reader benchmarks).

codecov · 2022-03-01T07:46:30Z

Codecov Report

Merging #10372 (e20e5fa) into branch-22.04 (a7d88cd) will increase coverage by 0.15%.
The diff coverage is n/a.

@@               Coverage Diff                @@
##           branch-22.04   #10372      +/-   ##
================================================
+ Coverage         10.42%   10.58%   +0.15%     
================================================
  Files               119      125       +6     
  Lines             20603    21058     +455     
================================================
+ Hits               2148     2228      +80     
- Misses            18455    18830     +375

Impacted Files	Coverage Δ
...ython/custreamz/custreamz/tests/test_dataframes.py	`99.39% <0.00%> (-0.01%)`	⬇️
python/cudf/cudf/errors.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/io/orc.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/_version.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/core/ops.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/datasets.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/core/frame.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/core/index.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/io/parquet.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/core/series.py	`0.00% <0.00%> (ø)`
... and 43 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 78b316c...e20e5fa.

bdice

This looks good. Thank you for including the comment.

bdice · 2022-03-01T21:24:47Z

Tests appear to be failing with an unrelated error? #10150?

ImportError: /opt/conda/envs/rapids/lib/python3.9/site-packages/cudf/_lib/text.cpython-39-x86_64-linux-gnu.so: undefined symbol:
_ZN4cudf2io4text15multibyte_splitERKNS1_17data_chunk_sourceERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt8optionalINS1_15byte_range_infoEEPN3rmm2mr22device_memory_resourceE

vuule · 2022-03-02T01:05:18Z

I think it's a CI glitch caused by the API changes in #10150. I merged the latest branch-22.04, which usually helps in such cases.

vuule · 2022-03-07T22:11:59Z

@gpucibot merge

reduce random data range to avoid overflow in `upper_bound - lower_bound`

eaf5ff8

vuule added the tests, cuIO, Performance, and non-breaking labels Mar 1, 2022

vuule self-assigned this Mar 1, 2022

github-actions bot added the libcudf label Mar 1, 2022

vuule added the bug label Mar 1, 2022

style

4ee7037

add comment

6af58ba

vuule marked this pull request as ready for review March 1, 2022 19:27

vuule requested a review from a team as a code owner March 1, 2022 19:27

vuule requested review from bdice and nvdbaranec March 1, 2022 19:27

bdice approved these changes Mar 1, 2022

vuule added 2 commits March 1, 2022 17:03

Merge branch 'branch-22.04' of https://github.com/rapidsai/cudf into bug-data_gen-limits

7aa752d

Merge branch 'bug-data_gen-limits' of https://github.com/vuule/cudf into bug-data_gen-limits

e20e5fa

davidwendt mentioned this pull request Mar 2, 2022

generate benchmark input in device #10109

Merged

nvdbaranec approved these changes Mar 7, 2022

rapids-bot bot merged commit 7d67093 into rapidsai:branch-22.04 Mar 7, 2022

vuule deleted the bug-data_gen-limits branch March 7, 2022 22:13
