Add check for negative stripe index in ORC reader #10074

Merged
merged 2 commits into from
Jan 19, 2022
7 changes: 4 additions & 3 deletions cpp/src/io/orc/aggregate_orc_metadata.cpp
@@ -172,9 +172,10 @@ std::vector<metadata::stripe_source_mapping> aggregate_orc_metadata::select_stri
// Coalesce stripe info at the source file later since that makes downstream processing much
// easier in impl::read
for (const auto& stripe_idx : user_specified_stripes[src_file_idx]) {
- CUDF_EXPECTS(stripe_idx < static_cast<decltype(stripe_idx)>(
-                per_file_metadata[src_file_idx].ff.stripes.size()),
-              "Invalid stripe index");
+ CUDF_EXPECTS(
+   stripe_idx >= 0 and stripe_idx < static_cast<decltype(stripe_idx)>(
+                         per_file_metadata[src_file_idx].ff.stripes.size()),
+   "Invalid stripe index");
Comment on lines +175 to +178
Contributor

@bdice bdice Jan 19, 2022

I suggest casting stripe_idx (auto = cudf::size_type) to the size_t that size() returns, rather than playing type gymnastics by casting the result of size() (which requires a decltype()) and also checking >= 0. The end result appears to be shorter and safer. (Excuse the lack of clang-format.)

Suggested change
- CUDF_EXPECTS(
-   stripe_idx >= 0 and stripe_idx < static_cast<decltype(stripe_idx)>(
-                         per_file_metadata[src_file_idx].ff.stripes.size()),
-   "Invalid stripe index");
+ CUDF_EXPECTS(static_cast<std::size_t>(stripe_idx) < per_file_metadata[src_file_idx].ff.stripes.size(), "Invalid stripe index");

This doesn't touch the loop, so I don't think it will break the necessary change for GCC 11 in #10045.
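
For readers following the trade-off, here is a minimal standalone sketch (not the cuDF code) of why the single-cast form rejects negative indices. `size_type` is a stand-in alias for `cudf::size_type`, which the thread treats as `int`, and both function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: stand-in for cudf::size_type, described in the thread as int.
using size_type = std::int32_t;

// Single-comparison form (hypothetical helper). Converting a negative value
// to std::size_t is well defined and wraps modulo 2^N, so -1 becomes the
// largest std::size_t and fails the `<` test for any realistic count.
constexpr bool in_range_by_cast(size_type idx, std::size_t count)
{
  return static_cast<std::size_t>(idx) < count;
}

// Explicit form (hypothetical helper): the lower bound is stated directly,
// so the intent does not depend on wrap-around behaviour.
constexpr bool in_range_explicit(size_type idx, std::size_t count)
{
  return idx >= 0 && static_cast<std::size_t>(idx) < count;
}

// Compile-time checks: both forms agree on these inputs.
static_assert(in_range_by_cast(2, 3), "index 2 is valid for 3 stripes");
static_assert(!in_range_by_cast(3, 3), "index 3 is one past the end");
static_assert(!in_range_by_cast(-1, 3), "-1 wraps to a huge value and is rejected");
static_assert(in_range_explicit(2, 3) && !in_range_explicit(-1, 3),
              "explicit form gives the same answers");
```

The two forms return the same result here; the difference is whether the rejection of negative values is spelled out or left to the conversion rules.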

Contributor Author

But then we would be static-casting a value that can be negative to size_t. IMO this option works "by accident"; the explicit check here is preferable to me.

Contributor

@bdice bdice Jan 19, 2022

Can a stripe index be negative? A size_type==int can be negative but that doesn't mean that stripe_idx should ever take on negative values (unless I'm unaware of how it's used). In the same way, we would need to know in the current snippet that the size() call returning a size_t won't overflow a decltype(stripe_idx)==size_type when being cast. I would argue that the safety/correctness of either option is conditional on prior knowledge of the values, and is not an intrinsic guarantee of the choice of type/casting.
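
The opposite direction raised above, narrowing `size()` to the index type, can be guarded in a similar way. The sketch below is illustrative only: `size_type` is an assumed alias for `cudf::size_type`, and `fits_in_size_type` and `in_range_narrowing` are hypothetical helpers that do not exist in cuDF.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Assumption: stand-in for cudf::size_type, described in the thread as int.
using size_type = std::int32_t;

// True when `count` can be narrowed to size_type without changing its value.
constexpr bool fits_in_size_type(std::size_t count)
{
  return count <= static_cast<std::size_t>(std::numeric_limits<size_type>::max());
}

// Same shape as the check in the diff (cast the count, compare as signed),
// but the cast is only reached when the count is known to fit. In this
// sketch a count that does not fit is simply treated as invalid input.
constexpr bool in_range_narrowing(size_type idx, std::size_t count)
{
  return fits_in_size_type(count) && idx >= 0 && idx < static_cast<size_type>(count);
}

static_assert(in_range_narrowing(1, 2), "index 1 is valid for 2 stripes");
static_assert(!in_range_narrowing(-1, 2), "negative index is rejected");
```

Whether such a guard is needed depends on what is known about the count, which is the point being debated in this thread.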

Contributor Author

Stripe index is a value passed by the user, so we need to check that it's in the valid range. It is impossible for the number of stripes to exceed the maximum size_type value, since it's always (way) smaller than the number of rows.

Contributor

> Stripe index is a value passed by the user

Okay, that's what I was missing. I assumed it was a loop index or something where we had stronger guarantees about its potential values.

Contributor Author

Yeah, this check is just input validation, which is what the failing test is validating.
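
As a rough illustration of validating user-supplied stripe indices, the sketch below rejects out-of-range values before they are used for indexing. It is not the cuDF implementation: `validate_stripes` is a hypothetical function, `size_type` is an assumed alias for `cudf::size_type`, and `std::invalid_argument` stands in for whatever `CUDF_EXPECTS` raises.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Assumption: stand-in for cudf::size_type, described in the thread as int.
using size_type = std::int32_t;

// Hypothetical helper: checks every requested index against the number of
// stripes in a file and returns the indices as std::size_t once validated.
std::vector<std::size_t> validate_stripes(const std::vector<size_type>& requested,
                                          std::size_t num_stripes)
{
  std::vector<std::size_t> checked;
  checked.reserve(requested.size());
  for (auto const idx : requested) {
    // Explicit lower and upper bound, mirroring the merged check.
    if (idx < 0 || static_cast<std::size_t>(idx) >= num_stripes) {
      throw std::invalid_argument("Invalid stripe index");
    }
    checked.push_back(static_cast<std::size_t>(idx));
  }
  return checked;
}
```

Calling `validate_stripes({0, 2}, 3)` returns both indices, while `validate_stripes({-1}, 3)` or `validate_stripes({3}, 3)` throws.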

stripe_infos.push_back(
std::make_pair(&per_file_metadata[src_file_idx].ff.stripes[stripe_idx], nullptr));
row_count += per_file_metadata[src_file_idx].ff.stripes[stripe_idx].numberOfRows;