feat: Implement `ArrowArrayViewValidateFull()` #174

paleolimbot · 2023-03-28T17:33:38Z

This is required for IPC reading because corrupted offset and/or union type ID buffers could cause consumers to access out-of-bounds elements. ArrowArrayViewSetArray() already checked the last element of offset buffers against lengths, but it checked neither the first element nor that consecutive offsets never decrease.
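
For illustration, the three offset checks mentioned above (first offset non-negative, consecutive offsets non-decreasing, last offset within the data buffer) could be sketched as follows. This assumes 32-bit offsets; `CheckOffsets32` and its parameters are hypothetical names, not the code added in this PR.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

// Hypothetical helper, not the nanoarrow implementation.
// Validates n_offsets 32-bit offsets that index into a data buffer of
// data_size bytes. Returns 0 if valid, -1 otherwise (with a message in msg).
static int CheckOffsets32(const int32_t* offsets, int64_t n_offsets,
                          int64_t data_size, char* msg, size_t msg_size) {
  if (n_offsets == 0) {
    return 0;  // nothing to check
  }

  // The first offset must not point before the start of the data buffer.
  if (offsets[0] < 0) {
    snprintf(msg, msg_size, "Expected first offset >= 0 but found %ld",
             (long)offsets[0]);
    return -1;
  }

  // Each element's size is the difference between neighbouring offsets,
  // so a negative difference means the offsets decrease somewhere.
  for (int64_t i = 1; i < n_offsets; i++) {
    int64_t diff = (int64_t)offsets[i] - (int64_t)offsets[i - 1];
    if (diff < 0) {
      snprintf(msg, msg_size,
               "Expected non-decreasing offsets but found size %ld at position %ld",
               (long)diff, (long)i);
      return -1;
    }
  }

  // The last offset must not point past the end of the data buffer.
  if ((int64_t)offsets[n_offsets - 1] > data_size) {
    snprintf(msg, msg_size, "Expected last offset <= %ld but found %ld",
             (long)data_size, (long)offsets[n_offsets - 1]);
    return -1;
  }

  return 0;
}
```

Because the loop visits every offset, a check like this costs O(n) per buffer, which is presumably why it sits in a separate "full" validation step rather than in the cheaper last-element check.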

codecov-commenter · 2023-03-28T17:44:07Z

Codecov Report

❗ No coverage uploaded for pull request base (main@bebb790).
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main     #174   +/-   ##
=======================================
  Coverage        ?   93.42%           
=======================================
  Files           ?        7           
  Lines           ?     1750           
  Branches        ?       54           
=======================================
  Hits            ?     1635           
  Misses          ?       84           
  Partials        ?       31

| Impacted Files | Coverage Δ |
|---|---|
| src/nanoarrow/nanoarrow_types.h | `92.30% <ø> (ø)` |
| src/nanoarrow/array.c | `93.08% <100.00%> (ø)` |
| src/nanoarrow/array_inline.h | `80.76% <100.00%> (ø)` |

lidavidm

LGTM, just a couple questions

lidavidm · 2023-03-28T20:18:39Z

src/nanoarrow/array.c

+                    "Expected element size >0 but found element size %ld at position %ld",
+                    (long)diff, (long)i);


nit: the message is kind of a non-sequitur for the error case here

I took another pass at making them all a bit better!

wjones127

Just a few small nits and questions.

wjones127 · 2023-03-28T22:02:55Z

src/nanoarrow/nanoarrow_types.h

+  /// type_id == union_type_id_map[128 + child_index]. This value may be
+  /// NULL in the case where child_id == type_id.


Is that part of the Arrow format? I couldn't find such a detail.

This is an implementation detail of the ArrowArrayView; in the spec the type ids are always there. Arguably they should always be here, too, but I discovered at least one test that exploits that behaviour, so I figured I should probably document it. I think it ended up that way because you can call ArrowArrayViewInitFromType() and ArrowArrayViewAllocateChildren(), and letting the type id map stay NULL saves some special casing of unions there.
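
To make the lookup concrete, here is a rough sketch of a 256-entry map in which `map[128 + child_index]` holds the type id, as the doc comment above states. Using the first 128 entries for the reverse direction (type id to child index) is an assumption made for this sketch, and `BuildUnionTypeIdMap` and `ChildIndexToTypeId` are hypothetical names, not nanoarrow functions.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper, not part of nanoarrow.
// Fills a 256-entry map for a union whose i-th child has type id type_ids[i].
// Assumed layout: map[type_id] = child_index for the first 128 entries and
// map[128 + child_index] = type_id for the last 128 entries.
// Returns 0 on success, -1 if a type id is negative or there are too many
// children.
static int BuildUnionTypeIdMap(const int8_t* type_ids, int n_children,
                               int8_t map[256]) {
  if (n_children < 0 || n_children > 127) {
    return -1;
  }

  // Mark every slot as unused before filling in the known entries.
  for (int i = 0; i < 256; i++) {
    map[i] = -1;
  }

  for (int child = 0; child < n_children; child++) {
    int8_t type_id = type_ids[child];
    if (type_id < 0) {
      return -1;
    }
    map[type_id] = (int8_t)child;
    map[128 + child] = type_id;
  }

  return 0;
}

// A NULL map is treated as the identity mapping (type_id == child_index),
// mirroring the behaviour described in the comment above.
static int8_t ChildIndexToTypeId(const int8_t* map, int8_t child_index) {
  return map == NULL ? child_index : map[128 + child_index];
}
```

Treating NULL as the identity keeps callers free of union-specific branches when the view was only initialized from a type, which matches the motivation given in the reply.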

wjones127 · 2023-03-28T22:16:33Z

src/nanoarrow/array.c

+static int ArrowAssertInt8In(struct ArrowBufferView view, const int8_t* values,
+                             int64_t n_values, struct ArrowError* error) {
+  for (int64_t i = 0; i < view.size_bytes; i++) {
+    int item_found = 0;


Question: why int over bool?

There's not really a bool in C (there is _Bool; I suppose with C99 we can use that, and I guess we're assuming C99 here?)

There's #include <stdbool.h> too, which defines macros for bool, true, and false. I didn't use it initially because there are some functions that return true/false in public headers and I wanted as few includes there as possible. I should probably go through and make it consistent (I may have used char in some places).
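
As an illustration of the `<stdbool.h>` option discussed here, a membership check over int8 values (the kind of check used for union type ids) might look like the sketch below. The names `Int8In` and `FindFirstInt8NotIn` are hypothetical, and unlike the function in the diff this version takes a plain pointer and length rather than a `struct ArrowBufferView`.

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical helper: true if value occurs in allowed[0..n_allowed).
static bool Int8In(int8_t value, const int8_t* allowed, int64_t n_allowed) {
  for (int64_t j = 0; j < n_allowed; j++) {
    if (allowed[j] == value) {
      return true;
    }
  }
  return false;
}

// Hypothetical helper: returns the index of the first element of data that
// is not in the allowed set, or -1 if every element is allowed.
static int64_t FindFirstInt8NotIn(const int8_t* data, int64_t n_data,
                                  const int8_t* allowed, int64_t n_allowed) {
  for (int64_t i = 0; i < n_data; i++) {
    if (!Int8In(data[i], allowed, n_allowed)) {
      return i;
    }
  }
  return -1;
}
```

Keeping `bool` inside a `.c` file like this avoids adding `<stdbool.h>` to public headers, which was the concern raised about includes.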

Co-authored-by: Will Jones <[email protected]>

wjones127

Looks good!

paleolimbot added 3 commits March 28, 2023 13:35

prototype validate full

4d8bd06

symbol

d9ec2d0

check first value of offset buffer

cbc21fb

paleolimbot added 6 commits March 28, 2023 15:59

test some errors

94c1f7e

a few more tests

983ee5b

run validatefull everywhere we run setarray

4b56a05

tweak some union behaviour

b3dd2e2

test bad union type ids

7e8f42e

check union offset values

1ceac1e

paleolimbot marked this pull request as ready for review March 28, 2023 19:54

paleolimbot requested a review from lidavidm March 28, 2023 19:54

lidavidm reviewed Mar 28, 2023

wjones127 requested changes Mar 28, 2023

paleolimbot and others added 2 commits March 28, 2023 22:17

Apply suggestions from code review

a634190

Co-authored-by: Will Jones <[email protected]>

better error messages

1a8b97f

wjones127 approved these changes Mar 29, 2023

paleolimbot merged commit be3b2b1 into apache:main Mar 29, 2023

paleolimbot deleted the revisit-validation branch March 29, 2023 12:06
