Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Implement ArrowArrayViewValidateFull() #174

Merged
merged 11 commits into from
Mar 29, 2023

Conversation

paleolimbot
Copy link
Member

@paleolimbot paleolimbot commented Mar 28, 2023

This is required for IPC reading because corrupted offset and/or union type ID buffers could result in consumers accessing out-of-bounds elements. ArrowArrayViewSetArray() already checked the last element of offset buffers against lengths but didn't check the first element and didn't check for negative sequential offsets.

@codecov-commenter
Copy link

codecov-commenter commented Mar 28, 2023

Codecov Report

❗ No coverage uploaded for pull request base (main@bebb790). Click here to learn what that means.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main     #174   +/-   ##
=======================================
  Coverage        ?   93.42%           
=======================================
  Files           ?        7           
  Lines           ?     1750           
  Branches        ?       54           
=======================================
  Hits            ?     1635           
  Misses          ?       84           
  Partials        ?       31           
Impacted Files Coverage Δ
src/nanoarrow/nanoarrow_types.h 92.30% <ø> (ø)
src/nanoarrow/array.c 93.08% <100.00%> (ø)
src/nanoarrow/array_inline.h 80.76% <100.00%> (ø)

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@paleolimbot paleolimbot marked this pull request as ready for review March 28, 2023 19:54
@paleolimbot paleolimbot requested a review from lidavidm March 28, 2023 19:54
Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just a couple questions

Comment on lines 818 to 819
"Expected element size >0 but found element size %ld at position %ld",
(long)diff, (long)i);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the message is kind of a non-sequitur for the error case here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took another pass at making them all a bit better!

Copy link
Member

@wjones127 wjones127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few small nits and questions.

Comment on lines +506 to +507
/// type_id == union_type_id_map[128 + child_index]. This value may be
/// NULL in the case where child_id == type_id.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that part of the Arrow format? I couldn't find such a detail.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an implementation detail of the ArrowArrayView...in the spec the type ids are always there. Arguably they should always be here, too, but there's at least one test that exploits that behaviour that I discovered so I figured I should probably document it. I think the reason it ended up that way was because you can ArrowArrayViewInitFromType() and ArrowArrayViewAllocateChildren() and letting the type id map stay NULL saves some special casing of unions there.

static int ArrowAssertInt8In(struct ArrowBufferView view, const int8_t* values,
int64_t n_values, struct ArrowError* error) {
for (int64_t i = 0; i < view.size_bytes; i++) {
int item_found = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: why int over bool?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's not really bool in C (there is _Bool, I suppose with C99 we can use that and I guess we're assuming C99 here?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😮

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's #include <stdbool.h> too, which defines macros for bool, true, and false. I didn't use it initially because there are some functions that return true/false in public headers and I wanted as few includes there as possible. I should probably go through and make it consistent (I may have used char in some places).

Copy link
Member

@wjones127 wjones127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@paleolimbot paleolimbot merged commit be3b2b1 into apache:main Mar 29, 2023
@paleolimbot paleolimbot deleted the revisit-validation branch March 29, 2023 12:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants