Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use exceptions instead of return values to handle errors in CompactProtocolReader #14582

Merged
merged 20 commits into from
Dec 14, 2023

Conversation

vuule
Copy link
Contributor

@vuule vuule commented Dec 6, 2023

Description

Align the coding style with the rest of libcudf and use exceptions to report errors instead of returning a bool.
This simplifies the control flow in many functions. In addition, the return value used top be ignored at many call sites.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@vuule vuule added cuIO cuIO issue tech debt improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Dec 6, 2023
@vuule vuule self-assigned this Dec 6, 2023
@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Dec 6, 2023
Copy link
Contributor

@etseidl etseidl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love this. Next up snake_case names! 😄

Here are a few unasked for comments.

cpp/src/io/parquet/compact_protocol_reader.cpp Outdated Show resolved Hide resolved
cpp/src/io/parquet/compact_protocol_reader.cpp Outdated Show resolved Hide resolved
cpp/src/io/parquet/compact_protocol_reader.cpp Outdated Show resolved Hide resolved
cpp/src/io/parquet/compact_protocol_reader.cpp Outdated Show resolved Hide resolved
Comment on lines -521 to -522
bool const exit_function = FunctionSwitchImpl<index>::run(cpr, field_type, field, op);
if (exit_function) { return false; }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where the logic inverts and false signals failure. This logic is(was) used from here to the top-level read functions.

@vuule vuule marked this pull request as ready for review December 8, 2023 22:56
@vuule vuule requested a review from a team as a code owner December 8, 2023 22:56
@vuule vuule requested a review from robertmaynard December 13, 2023 00:08

void assert_field_type(int type, FieldType expected)
{
CUDF_EXPECTS(type == expected,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh no, this is comparing int vs enum.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We always compared ints and FieldType in this code, e.g. if (field_type != ST_FLD_LIST) { return true; }. I think it's fine, given that we throw if the values are not equal - no int values that are not valid FieldType values will get past this check. We have to sanitize the input somewhere 🤷‍♂️

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We definitely need to refactor it, changing into enum class for FieldType.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not following. We have to compare the int input with a FieldType value somewhere. Using an enum class helps with the scope of the value names, but we still need the int <-> enum comparison, just with a static_cast.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then we will do an explicit static_cast before comparison. I think it is better than this implicit comparison.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just checked, FieldType is not exactly part of the spec, we have some creative freedom around it.
If you don't mind, let's make it an enum class in a separate PR. It's a wider change than the error handling this PR is about.

Comment on lines +87 to +89
inline void operator()(CompactProtocolReader* cpr, int field_type)
{
if (field_type != ST_FLD_LIST) { return true; }
assert_field_type(field_type, ST_FLD_LIST);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably we are wrong from the beginning. Why not field_type an enum value, instead of int?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition, should the caller handle the input type instead of checking it here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

field_type is obtained by the caller by parsing the encoded input, so it could be any value if the input file is garbage. We cannot assume it will be a valid FieldType.

As for the validation on the caller side - the caller only sees these parquet_field_xyz objects as functors and does not know the actual field type that is being read. We could move validation to the caller if parquet_field classes somehow expose the expected field type, but I don't think this is a better option.

@vuule
Copy link
Contributor Author

vuule commented Dec 14, 2023

/merge

@rapids-bot rapids-bot bot merged commit 899e94b into rapidsai:branch-24.02 Dec 14, 2023
67 checks passed
@vuule vuule deleted the impr-no-bool-return-error branch December 14, 2023 22:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cuIO cuIO issue improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants