-
Notifications
You must be signed in to change notification settings - Fork 841
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Revisit List Row Encoding (#5807) #5811
Conversation
I have also tested this against #5792 which integrates lists and structs into the row format and the tests pass 🎉 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall the change looks good to me but I am not sure I would catch any subtle bugs as I am not an expert in this code
I suggest you also extend the fuzz test to include ListArrays to make sure we cover all corner cases.
@@ -2271,4 +2263,16 @@ mod tests { | |||
|
|||
dictionary_eq(&back[0], &array); | |||
} | |||
|
|||
#[test] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I verified that this test covers the issue by running this test on main and it failed like this:
thread 'tests::test_list_prefix' panicked at arrow-row/src/lib.rs:2284:9:
assertion `left == right` failed
left: Greater
right: Less
stack backtrace:
0: rust_begin_unwind
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:645:5
1: core::panicking::panic_fmt
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/panicking.rs:72:14
2: core::panicking::assert_failed_inner
3: core::panicking::assert_failed
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/panicking.rs:298:5
4: arrow_row::tests::test_list_prefix
at ./src/lib.rs:2284:9
5: arrow_row::tests::test_list_prefix::{{closure}}
at ./src/lib.rs:2276:26
6: core::ops::function::FnOnce::call_once
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ops/function.rs:250:5
7: core::ops::function::FnOnce::call_once
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
error: test failed, to rerun pass `--lib`
error: 1 target failed:
`--lib`
I have this ready as part of #5792 |
Which issue does this PR close?
Closes #5807
Rationale for this change
Historically the variable length encoding used a fixed block size of 32 bytes, this made it very expensive for encoding small byte arrays. This meant we couldn't use it directly for encoding list elements, and instead came up with an alternative encoding. Unfortunately this encoding is incorrect #5807
Fortunately following #4818 the space overheads for small values using the variable length encoding is significantly reduced, and so we can just use it directly. The result is not only correct, but also typically smaller.
What changes are included in this PR?
Are there any user-facing changes?