Add check for negative stripe index in ORC reader #10074

Merged
2 commits merged into rapidsai:branch-22.02 from bug-orc-stripe-idx-check
Jan 19, 2022

@vuule (Contributor) commented Jan 19, 2022

Fixes CI failure

@vuule added the cuIO label Jan 19, 2022
@vuule self-assigned this Jan 19, 2022
@github-actions bot added the libcudf label Jan 19, 2022
@vuule added the bug and non-breaking labels Jan 19, 2022
@vuule marked this pull request as ready for review January 19, 2022 00:49
@vuule requested a review from a team as a code owner January 19, 2022 00:49
Comment on lines +175 to +178
    CUDF_EXPECTS(
      stripe_idx >= 0 and stripe_idx < static_cast<decltype(stripe_idx)>(
                                         per_file_metadata[src_file_idx].ff.stripes.size()),
      "Invalid stripe index");
Contributor

@bdice bdice Jan 19, 2022

I suggest casting stripe_idx (auto = cudf::size_type) to match the size_t that size() returns, rather than playing type gymnastics by casting the result of size() (which requires a decltype()) and also checking >= 0. The end result appears to be shorter and safer. (Excuse the lack of clang-format.)

Suggested change
- CUDF_EXPECTS(
-   stripe_idx >= 0 and stripe_idx < static_cast<decltype(stripe_idx)>(
-                                      per_file_metadata[src_file_idx].ff.stripes.size()),
-   "Invalid stripe index");
+ CUDF_EXPECTS(static_cast<std::size_t>(stripe_idx) < per_file_metadata[src_file_idx].ff.stripes.size(), "Invalid stripe index");

This doesn't touch the loop, so I don't think it will break the necessary change for GCC 11 in #10045.
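
For readers comparing the two forms, here is a minimal standalone sketch, not the libcudf code itself. It assumes `size_type` is a 32-bit signed integer standing in for cudf::size_type; the function names `valid_explicit` and `valid_unsigned` are hypothetical and exist only for illustration.

```cpp
#include <cstddef>
#include <cstdint>

using size_type = std::int32_t;  // assumption: stand-in for cudf::size_type

// Form used in the PR: explicit lower bound, then compare in the signed type.
bool valid_explicit(size_type idx, std::size_t num_stripes)
{
  return idx >= 0 && idx < static_cast<size_type>(num_stripes);
}

// Suggested form: convert the index to std::size_t. A negative value wraps
// around to a very large unsigned value, so the single comparison rejects it.
bool valid_unsigned(size_type idx, std::size_t num_stripes)
{
  return static_cast<std::size_t>(idx) < num_stripes;
}
```

Both functions agree for any stripe count that fits in `size_type`. The difference is whether the rejection of negative indices is spelled out or follows from unsigned wrap-around.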

Contributor Author

But then we would be static casting to size_t a value that can be negative. IMO this option works "by accident"; the explicit check here is preferable to me.

Contributor

@bdice bdice Jan 19, 2022

Can a stripe index be negative? A size_type==int can be negative but that doesn't mean that stripe_idx should ever take on negative values (unless I'm unaware of how it's used). In the same way, we would need to know in the current snippet that the size() call returning a size_t won't overflow a decltype(stripe_idx)==size_type when being cast. I would argue that the safety/correctness of either option is conditional on prior knowledge of the values, and is not an intrinsic guarantee of the choice of type/casting.

Contributor Author

Stripe index is a value passed by the user, so we need to check that it's in the valid range. It is impossible for size() to exceed the maximum size_type value, since the number of stripes is always (way) smaller than the number of rows.
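
The narrowing concern raised above can be made explicit. This is a hedged sketch: `size_type` is again assumed to be a 32-bit signed stand-in for cudf::size_type, and `fits_in_size_type` and `in_range_checked` are hypothetical helpers, not part of libcudf.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

using size_type = std::int32_t;  // assumption: stand-in for cudf::size_type

// True when a container size can be converted to size_type without loss.
constexpr bool fits_in_size_type(std::size_t n)
{
  return n <= static_cast<std::size_t>(std::numeric_limits<size_type>::max());
}

// Range check that does not rely on the size fitting in size_type.
bool in_range_checked(size_type idx, std::size_t num_stripes)
{
  if (idx < 0) { return false; }
  // If the size exceeds the signed range, every non-negative index is below it.
  if (!fits_in_size_type(num_stripes)) { return true; }
  return idx < static_cast<size_type>(num_stripes);
}
```

In the reader, the stripe count is bounded by the row count, so the second branch should never be taken. The sketch only shows what the signed cast silently assumes.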

Contributor

> Stripe index is a value passed by the user

Okay, that's what I was missing. I assumed it was a loop index or something where we had stronger guarantees about its potential values.

Contributor Author

Yeah, this check is just input validation, which is what the failing test is validating.
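
As an illustration of this kind of input validation, the sketch below checks a list of user-supplied stripe indices and throws on the first invalid one. It makes several assumptions: `validate_stripe_indices` is a hypothetical function, `size_type` is a 32-bit signed stand-in for cudf::size_type, and `std::invalid_argument` is used in place of whatever CUDF_EXPECTS raises in the real reader.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

using size_type = std::int32_t;  // assumption: stand-in for cudf::size_type

// Hypothetical validator: throws if any user-supplied stripe index is
// negative or not smaller than the number of stripes in the file.
void validate_stripe_indices(std::vector<size_type> const& stripes, std::size_t num_stripes)
{
  for (auto const idx : stripes) {
    // The cast is safe here because idx is known to be non-negative.
    if (idx < 0 || static_cast<std::size_t>(idx) >= num_stripes) {
      throw std::invalid_argument("Invalid stripe index");
    }
  }
}
```

A test for such a check would pass a negative or too-large index and expect the exception, which is roughly what the failing CI test described in this thread is said to validate.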

Contributor

@bdice bdice left a comment

See comment.

Contributor

@bdice bdice left a comment

Please feel free to merge this PR if it's fixing CI -- I don't want to hold up CI for this suggestion so close to code freeze -- but let's discuss my suggestion further in the comment thread. edit: LGTM

codecov bot commented Jan 19, 2022

Codecov Report

Merging #10074 (28506bf) into branch-22.02 (967a333) will decrease coverage by 0.07%.
The diff coverage is n/a.

@@               Coverage Diff                @@
##           branch-22.02   #10074      +/-   ##
================================================
- Coverage         10.49%   10.41%   -0.08%     
================================================
  Files               119      119              
  Lines             20305    20541     +236     
================================================
+ Hits               2130     2139       +9     
- Misses            18175    18402     +227     

Impacted Files                           Coverage Δ
python/custreamz/custreamz/kafka.py      29.16% <0.00%> (-0.63%) ⬇️
python/dask_cudf/dask_cudf/sorting.py    92.66% <0.00%> (-0.25%) ⬇️
python/dask_cudf/dask_cudf/core.py       70.85% <0.00%> (-0.17%) ⬇️
python/cudf/cudf/__init__.py             0.00% <0.00%> (ø)
python/cudf/cudf/api/types.py            0.00% <0.00%> (ø)
python/cudf/cudf/core/frame.py           0.00% <0.00%> (ø)
python/cudf/cudf/core/index.py           0.00% <0.00%> (ø)
python/cudf/cudf/io/parquet.py           0.00% <0.00%> (ø)
python/cudf/cudf/core/dtypes.py          0.00% <0.00%> (ø)
python/cudf/cudf/core/scalar.py          0.00% <0.00%> (ø)
... and 31 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 4e4c3dd...28506bf.

@vuule (Contributor Author) commented Jan 19, 2022

@gpucibot merge

@rapids-bot bot merged commit 512e161 into rapidsai:branch-22.02 Jan 19, 2022
@vuule deleted the bug-orc-stripe-idx-check branch January 19, 2022 02:51
Labels
bug (Something isn't working), cuIO (cuIO issue), libcudf (Affects libcudf (C++/CUDA) code.), non-breaking (Non-breaking change)