-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for FixedSizeList type in arrow_cast
, hashing
#8344
Conversation
stall until the next arrow-rs (50.0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder why this PR is waiting for the next arrow release (arrow 50.0.0
?)
Because it just updated the support cast from List/LargeList to FixedList @alamb |
I see -- so while this code doesn't directly depend on the arrow release, it won't be very helpful until that support released. Makes sense to me. Thank you @Weijun-H |
arrow_cast
37638c8
to
b8b12a1
Compare
❤️ |
arrow_cast
arrow_cast
, hashing
b5ccbe2
to
e6c9b34
Compare
e6c9b34
to
78e7d2f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @Weijun-H -- as always, your contributions are most appreciated
NULL | ||
|
||
#TODO: arrow-rs doesn't support it yet |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we be casting [1]
(not 1
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed that the List supports casting from UTF8 to List with a single size. Therefore, I think FixedSizeList should also support it.
select arrow_cast('1', 'LargeList(Int64)');
----
[1]
@@ -267,6 +268,38 @@ where | |||
Ok(()) | |||
} | |||
|
|||
fn hash_fixed_list_array( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't see any test coverage for this new code -- e.g. either unit tests for hashing or a higher level test like GROUP BY <FixedListArray>
Can you either ensure this code is tested somehow, or else perhaps move the hash support to a different PR so we can merge the arrow_cast
support ?
#[test] | ||
// Tests actual values of hashes, which are different if forcing collisions | ||
#[cfg(not(feature = "force_hash_collisions"))] | ||
fn create_hashes_for_fixed_size_list_arrays() { | ||
let data = vec![ | ||
Some(vec![Some(0), Some(1), Some(2)]), | ||
None, | ||
Some(vec![Some(3), None, Some(5)]), | ||
Some(vec![Some(3), None, Some(5)]), | ||
None, | ||
Some(vec![Some(0), Some(1), Some(2)]), | ||
]; | ||
let list_array = | ||
Arc::new(FixedSizeListArray::from_iter_primitive::<Int32Type, _, _>( | ||
data, 3, | ||
)) as ArrayRef; | ||
let random_state = RandomState::with_seeds(0, 0, 0, 0); | ||
let mut hashes = vec![0; list_array.len()]; | ||
create_hashes(&[list_array], &random_state, &mut hashes).unwrap(); | ||
assert_eq!(hashes[0], hashes[5]); | ||
assert_eq!(hashes[1], hashes[4]); | ||
assert_eq!(hashes[2], hashes[3]); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a unit test for the hash function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me -- thank you @Weijun-H
Which issue does this PR close?
Closes #8343
Rationale for this change
FixedSizeList
andList
/LargeList
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?