-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change ScalarValue::{List, LargeList, FixedSizedList}
to take specific types rather than ArrayRef
#8562
Conversation
Also ScalarValue::LargeList and ScalarValue::FixedSizeList
datafusion/common/src/scalar.rs
Outdated
@@ -3300,33 +3409,6 @@ mod tests { | |||
assert_eq!(result, &expected); | |||
} | |||
|
|||
#[test] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is ScalarValue::eq_array test for List elsewhere? Otherwise, we can add the test for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed it because it was comparing a ListArray
with a LargeListArray
, and thus was no longer compatible with ScalarValue::List
. But I can add this test back and just do two ListArray
s?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure!
ScalarValue::{List, LargeList, FixedSizedList}
to take specific types rather than ArrayRef
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks @rspears74 👍
datafusion/common/src/scalar.rs
Outdated
write!(f, "{value_formatter}")? | ||
} | ||
ScalarValue::LargeList(arr) => { | ||
// ScalarValue List should always have a single element |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// ScalarValue List should always have a single element | |
// ScalarValue LargeList should always have a single element |
datafusion/common/src/scalar.rs
Outdated
let value_formatter = formatter.value(0); | ||
write!(f, "{value_formatter}")? | ||
} | ||
ScalarValue::FixedSizeList(arr) => { | ||
// ScalarValue List should always have a single element |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// ScalarValue List should always have a single element | |
// ScalarValue FixedSizeList should always have a single element |
arr.to_owned() as ArrayRef, | ||
)]) | ||
.map_err(|e| { | ||
Error::General( format!("Error creating temporary batch while encoding ScalarValue::List: {e}")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Error::General( format!("Error creating temporary batch while encoding ScalarValue::List: {e}")) | |
Error::General( format!("Error creating temporary batch while encoding ScalarValue::LargeList: {e}")) |
arr.to_owned() as ArrayRef, | ||
)]) | ||
.map_err(|e| { | ||
Error::General( format!("Error creating temporary batch while encoding ScalarValue::List: {e}")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Error::General( format!("Error creating temporary batch while encoding ScalarValue::List: {e}")) | |
Error::General( format!("Error creating temporary batch while encoding ScalarValue::FixedSizeList: {e}")) |
I will try and find time to review this PR later this week . Sorry for the delay |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this PR @rspears74 - I think it is looking quite close. There are a few places where code gets repeated 3 times that I think we can avoid by refactoring code into specific functions. I left some more specific suggestions below.
Also thank you @jayzhan211 and @Weijun-H for the reviews
)), | ||
}) | ||
} | ||
ScalarValue::LargeList(arr) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to avoid the copy/paste in this method by making a function? Something like the following?
fn encode_scalar_list_value(array: &Array) -> Result<protobuf::ScalarListValue> {
...
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one was easy enough to implement.
datafusion/common/src/scalar.rs
Outdated
arr.value(0) | ||
} else { | ||
unreachable!("Since only List / LargeList / FixedSizeList are supported, this should never happen") | ||
fn first_array_for_list(arr: &Arc<ListArray>) -> ArrayRef { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be really nice here to avoid the redundant copy/paste code, perhaps by making a function with signature like this
fn cmp_list_array(arr1: &dyn Array, arr2: &dyn Array) -> Option<Ordering> {
..
}
Which then you can call with Arc<ListArray>
and Arc<LargeListArray>
types
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is a little tougher, because if the args to the function are "generic" (i.e. ArrayRef instead of a specific Arc<ListArray>
type), the arr1.value(0)
method is not available. The first_array_for_list
function needs to be able to convert the ArrayRef
into the specific type to then get the value out. So I'm struggling a bit coming up with a way here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is a proposal of how to do it: rspears74#1 (that targets this PR)
Welp, I had a merge conflict, and upon fixing it, there's some new code that's incompatible with these changes (not compiling), so I'm gonna have to do a little more involved work to fix those. Will try to make that happen ASAP. |
Alright that wasn't so bad. Fixed and passing tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @rspears74 -- I think this is looking good.
cc @wjones127 since I think you are interested in FixedSizedList in general
I think there is still some repetition that could be avoided (I left my suggestions on how to do so) but we could potentially do that as a follow on PR as well
datafusion/common/src/scalar.rs
Outdated
arr.value(0) | ||
} else { | ||
unreachable!("Since only List / LargeList / FixedSizeList are supported, this should never happen") | ||
fn first_array_for_list(arr: &Arc<ListArray>) -> ArrayRef { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is a proposal of how to do it: rspears74#1 (that targets this PR)
datafusion/common/src/scalar.rs
Outdated
// ScalarValue List should always have a single element | ||
assert_eq!(arr.len(), 1); | ||
let options = FormatOptions::default().with_display_error(true); | ||
let formatter = ArrayFormatter::try_new(arr, &options).unwrap(); | ||
let formatter = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same pattern could be applied here as in rspears74#1 (use first_array_for_list
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll take care of this one too. I went through my changes looking for any larger duplication sections but missed this one.
Remove redundant partial_cmp implementation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems acceptable. Like Andrew said I think we could avoid some repetition.
I do think it's worth quickly considering whether wrapping in Scalar might be useful, as right now there isn't any strong enforcement that the array values in these scalars is length 1.
@@ -142,13 +143,13 @@ pub enum ScalarValue { | |||
/// Fixed size list scalar. | |||
/// | |||
/// The array must be a FixedSizeListArray with length 1. | |||
FixedSizeList(ArrayRef), | |||
FixedSizeList(Arc<FixedSizeListArray>), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know I'm coming in a little late, but have we considered instead making the type Scalar<Arc<FixedSizeListArray>>
? (Scalar being the one in arrow-rs) This would enforce that the length is 1, which might be nice. I could also understand if that makes it too awkward to work with.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I like the enforcement of length 1 (I mentioned this problem initially in the issue I opened), but it does seem to be a bit cumbersome to deal with.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's file a follow on ticket for continuing improving the representation of list constants
Not sure why the MSRV check is failing? |
You may need to merge up from main -- I think this was fixed on main recently |
I merged up from main to resolve a merge conflict. Thank you everyone |
Thanks @alamb @jayzhan211 @wjones127 for working through this with me! |
Which issue does this PR close?
Closes #8472
Rationale for this change
It's much preferable to constrain
ScalarValue::List
to accept only anArc<ListArray>
rather than anyArrayRef
and fail randomly. Similar forScalarValue::LargeList
andScalarValue::FixedSizeList
.What changes are included in this PR?
Changed the type signature of
ScalarValue::List
,ScalarValue::LargeList
, andScalarValue::FixedSizedList
and any cascading changes that follow from this. One test compared aListArray
passed toScalarValue::List
to aLargeListArray
passed toScalarValue::List
- becauseScalarValue::List
only acceptsListArray
now, this test was changed to compare twoListArray
s instead.Are these changes tested?
Tests have been updated where needed.
Are there any user-facing changes?
Users are now only allowed to pass a
ListArray
toScalarValue::List
, aLargeListArray
toScalarValue::LargeList
, and aFixedSizeListArray
toScalarValue::FixedSizeList
.