Add support for FixedSizeList type in arrow_cast, hashing #8344

Merged
merged 7 commits into from
Jan 19, 2024

Weijun-H
Member

@Weijun-H Weijun-H commented Nov 28, 2023

Which issue does this PR close?

Closes #8343

Rationale for this change

  • add hash function for 'FixedSizeList'
  • support cast between FixedSizeList and List / LargeList

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the sql (SQL Planner) and sqllogictest (SQL Logic Tests (.slt)) labels Nov 28, 2023
@Weijun-H
Member Author

Stalled until the next arrow-rs release (50.0).

Contributor

@alamb alamb left a comment

I wonder why this PR is waiting for the next arrow release (arrow 50.0.0?)

@Weijun-H
Member Author

Weijun-H commented Dec 2, 2023

I wonder why this PR is waiting for the next arrow release (arrow 50.0.0?)

Because arrow-rs only just added support for casting from List/LargeList to FixedSizeList, @alamb:
https://github.com/apache/arrow-rs/blob/df69ef57d055453c399fa925ad315d19211d7ab2/arrow-cast/src/cast.rs#L808-L815

@alamb
Contributor

alamb commented Dec 3, 2023

I see -- so while this code doesn't directly depend on the arrow release, it won't be very helpful until that support is released. Makes sense to me. Thank you @Weijun-H

@alamb alamb changed the title from "Add support for parsing FixedSizeList type" to "Add support for parsing FixedSizeList type in arrow_cast" Dec 3, 2023
@Weijun-H Weijun-H force-pushed the cast-fixedsizelist-list branch from 37638c8 to b8b12a1 Compare January 15, 2024 11:27
@alamb
Contributor

alamb commented Jan 15, 2024

❤️

@Weijun-H Weijun-H marked this pull request as ready for review January 15, 2024 13:24
@alamb alamb changed the title from "Add support for parsing FixedSizeList type in arrow_cast" to "Add support for FixedSizeList type in arrow_cast, hashing" Jan 16, 2024
@Weijun-H Weijun-H force-pushed the cast-fixedsizelist-list branch from b5ccbe2 to e6c9b34 Compare January 17, 2024 04:43
Contributor

@alamb alamb left a comment

Thank you @Weijun-H -- as always, your contributions are most appreciated

NULL

#TODO: arrow-rs doesn't support it yet
Contributor

Shouldn't we be casting [1] (not 1)?

Member Author

I noticed that List supports casting from Utf8 to a single-element List. Therefore, I think FixedSizeList should support it as well.

select arrow_cast('1', 'LargeList(Int64)');
----
[1]
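
To make the behaviour in this comment concrete, here is a minimal sketch in plain Rust of the two steps being discussed: wrapping a parsed string scalar in a single-element list, and then treating that list as a fixed-size list. The function names `scalar_to_list` and `list_to_fixed_size` are hypothetical and use `Vec<i64>` as a stand-in for Arrow arrays; this is not the arrow-rs or DataFusion API. The one fact it relies on is that a fixed-size list requires every row to hold exactly the declared number of elements. How arrow-rs handles a length mismatch is not shown in this thread, so the sketch simply returns an error.

```rust
/// Hypothetical helper: parse a string scalar as Int64 and wrap it in a
/// single-element list, mirroring the shape of
/// `arrow_cast('1', 'LargeList(Int64)')` returning `[1]`.
fn scalar_to_list(s: &str) -> Result<Vec<i64>, String> {
    let value: i64 = s
        .trim()
        .parse()
        .map_err(|e| format!("cannot parse {:?} as Int64: {}", s, e))?;
    Ok(vec![value])
}

/// Hypothetical helper: accept a variable-length list as a fixed-size list
/// only if it has exactly `size` elements (an assumption of this sketch;
/// the real cast kernel may treat mismatches differently).
fn list_to_fixed_size(list: Vec<i64>, size: usize) -> Result<Vec<i64>, String> {
    if list.len() == size {
        Ok(list)
    } else {
        Err(format!(
            "expected exactly {} element(s), found {}",
            size,
            list.len()
        ))
    }
}
```

Under these assumptions, `scalar_to_list("1")` followed by `list_to_fixed_size(list, 1)` succeeds, while asking for any other size fails, which is why a scalar can only map onto a fixed-size list of size 1.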

@@ -267,6 +268,38 @@ where
Ok(())
}

fn hash_fixed_list_array(
Contributor

I didn't see any test coverage for this new code -- e.g. either unit tests for hashing or a higher level test like GROUP BY <FixedListArray>

Can you either ensure this code is tested somehow, or else perhaps move the hash support to a different PR so we can merge the arrow_cast support?

@github-actions github-actions bot added the logical-expr (Logical plan and expressions) label Jan 19, 2024
Comment on lines +586 to +609
    #[test]
    // Tests actual values of hashes, which are different if forcing collisions
    #[cfg(not(feature = "force_hash_collisions"))]
    fn create_hashes_for_fixed_size_list_arrays() {
        let data = vec![
            Some(vec![Some(0), Some(1), Some(2)]),
            None,
            Some(vec![Some(3), None, Some(5)]),
            Some(vec![Some(3), None, Some(5)]),
            None,
            Some(vec![Some(0), Some(1), Some(2)]),
        ];
        let list_array =
            Arc::new(FixedSizeListArray::from_iter_primitive::<Int32Type, _, _>(
                data, 3,
            )) as ArrayRef;
        let random_state = RandomState::with_seeds(0, 0, 0, 0);
        let mut hashes = vec![0; list_array.len()];
        create_hashes(&[list_array], &random_state, &mut hashes).unwrap();
        assert_eq!(hashes[0], hashes[5]);
        assert_eq!(hashes[1], hashes[4]);
        assert_eq!(hashes[2], hashes[3]);
    }
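
The property this test checks (equal rows hash equally, including null rows and rows containing null elements) can be illustrated with a standard-library-only sketch. This is not the `hash_fixed_list_array` implementation from the PR: the function name `hash_fixed_size_rows`, the flat `values` buffer plus `row_valid` mask, and the use of `DefaultHasher` are all assumptions made for illustration. The idea shown is that row `i` of a fixed-size list of width `size` occupies `values[i * size..(i + 1) * size]`, so each row can be hashed from its slice without an offsets buffer.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical sketch: hash every row of a fixed-size list stored as a flat
/// buffer of optional values. `row_valid[i]` is false when row `i` is null.
fn hash_fixed_size_rows(
    values: &[Option<i32>],
    row_valid: &[bool],
    size: usize,
) -> Vec<u64> {
    // Each row contributes exactly `size` slots, even when the row is null.
    assert_eq!(values.len(), row_valid.len() * size);

    row_valid
        .iter()
        .enumerate()
        .map(|(row, &valid)| {
            // DefaultHasher::new() uses fixed keys, so results are repeatable.
            let mut hasher = DefaultHasher::new();
            if valid {
                // Hash the row's slice; `None` and `Some(x)` hash differently,
                // so null elements still influence the row hash.
                values[row * size..(row + 1) * size].hash(&mut hasher);
            }
            // Null rows feed nothing into the hasher, so they all share one hash.
            hasher.finish()
        })
        .collect()
}
```

With the same six rows as the test above, laid out as eighteen flat values (null rows padded with `None`), rows 0 and 5, rows 1 and 4, and rows 2 and 3 would produce matching hashes in this sketch.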

Member Author

Added a unit test for the hash function.

@Weijun-H Weijun-H requested a review from alamb January 19, 2024 02:49
Contributor

@alamb alamb left a comment

Looks good to me -- thank you @Weijun-H

@alamb alamb merged commit ae0f401 into apache:main Jan 19, 2024
22 checks passed
@Weijun-H Weijun-H deleted the cast-fixedsizelist-list branch January 29, 2024 03:00
Labels
logical-expr Logical plan and expressions sql SQL Planner sqllogictest SQL Logic Tests (.slt)
Development

Successfully merging this pull request may close these issues.

Support FixedSizeList for arrow_cast
2 participants