chore: remove panics in datafusion-common::scalar by making more operations return `Result` #7901

junjunjd · 2023-10-22T08:39:22Z

Which issue does this PR close?

It removes the majority of panics in datafusion-common::scalar #3313.

Rationale for this change

Important move towards closing #3313
Closes #3313

What changes are included in this PR?

Replace most of the panics in datafusion-common::scalar by internal_err, not_impl_err or other DataFusionError variants

Are these changes tested?

Yes

Are there any user-facing changes?

No

alamb · 2023-10-23T21:22:01Z

The CI appears to be failing. Marking as draft until it is passing. If there are specific questions about this PR, please let us know.

alamb · 2023-10-23T21:22:21Z

Thank you for the work @junjunjd

junjunjd · 2023-10-25T07:57:18Z

@alamb CI is fixed. This MR is ready for final review.

Weijun-H

Thanks @junjunjd 👍

comphead

Epic work @junjunjd thanks
I'm thinking if we can get rid of .expect ? Or at least provide more details in expect message?

junjunjd · 2023-10-26T05:12:38Z

> Epic work @junjunjd thanks I'm thinking if we can get rid of .expect ? Or at least provide more details in expect message?

I removed the .expect in get_min_max_values and get_null_count_values macros.
The rest of the .expect calls exist in tests or examples. It makes sense to use .expect and panic there. The backtrace from a panic as well as the message provided by .expect provide more information on the failure than Result. The Rust book suggests calling unwrap or expect in tests/examples https://doc.rust-lang.org/book/ch09-03-to-panic-or-not-to-panic.html

junjunjd · 2023-10-26T06:14:12Z

@alamb @Weijun-H @comphead I addressed the comments. This is ready for another review.

Weijun-H

~~Could we get rid of .expect in tests 🤔?~~

houqp · 2023-10-26T16:42:50Z

@Weijun-H @comphead using unwrap and expect in tests is actually the preferred practice, see https://github.com/influxdata/influxdb/blob/main/docs/style_guide.md#dont-return-result-from-test-functions. It makes the test failure easier to parse for a human and the test framework will already provide all the necessary context on failure.
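
The test style described above can be sketched as follows. This is a minimal illustration, not code from this PR: `parse_precision` is a hypothetical helper invented for the example. The tests return `()` and unwrap with `expect`, so a failure panics at the offending line with a message instead of propagating an error out of the test function.

```rust
/// Hypothetical helper (not a DataFusion API): parses a decimal precision.
pub fn parse_precision(s: &str) -> Result<u8, String> {
    let p: u8 = s
        .trim()
        .parse()
        .map_err(|e| format!("invalid precision {s:?}: {e}"))?;
    if p == 0 {
        return Err("precision must be greater than 0".to_string());
    }
    Ok(p)
}

#[cfg(test)]
mod tests {
    use super::*;

    // The test returns `()` and unwraps: on failure the panic reports the
    // location of the `expect` call together with its message.
    #[test]
    fn parses_valid_precision() {
        let p = parse_precision("10").expect("\"10\" is a valid precision");
        assert_eq!(p, 10);
    }

    // Error paths are asserted explicitly instead of being propagated with `?`.
    #[test]
    fn rejects_zero_precision() {
        assert!(parse_precision("0").is_err());
    }
}
```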

Weijun-H

LGTM! Thanks @junjunjd

alamb

Thank you for this PR @junjunjd. We very much appreciate the effort -- I am sorry that the ticket description may have been misleading

Buried in #3313, it says

> The goal is not to remove all panics but review and make sure we are using them appropriately. Bonus points for adding documentation for invariants.

Can you explain why you removed the panics that you did? I think most of them are "unreachable" so forcing client code to check for errors that will never happen makes it harder to work with (and is why this PR adds around 500 new lines of code)

junjunjd · 2023-11-01T04:11:34Z

datafusion/common/src/scalar.rs

@@ -330,9 +330,9 @@ impl PartialOrd for ScalarValue {
                        let arr2 = list_arr2.value(i);

                        let lt_res =
-                            arrow::compute::kernels::cmp::lt(&arr1, &arr2).unwrap();
+                            arrow::compute::kernels::cmp::lt(&arr1, &arr2).ok()?;


This panic is reachable: for example, if arr1 and arr2 have different data types, arrow::compute::kernels::cmp::lt returns an error and the unwrap panics.
I think it makes sense to return None here instead of panicking and exiting since user just performs a partial order comparison.
This does not require any code change in client side.
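
The `.ok()?` pattern in the hunk above can be sketched in isolation. Everything here is an assumption made for illustration: `Value` is a hypothetical stand-in for a typed scalar and `try_lt` stands in for a fallible comparison kernel; neither is the DataFusion or Arrow API.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for a typed scalar; not the DataFusion `ScalarValue`.
#[derive(Debug, PartialEq)]
pub enum Value {
    Int(i64),
    Text(String),
}

/// Fallible comparison, standing in for a kernel that returns an error
/// when the two inputs have different types.
fn try_lt(a: &Value, b: &Value) -> Result<bool, String> {
    match (a, b) {
        (Value::Int(x), Value::Int(y)) => Ok(x < y),
        (Value::Text(x), Value::Text(y)) => Ok(x < y),
        _ => Err(format!("cannot compare {a:?} with {b:?}")),
    }
}

impl PartialOrd for Value {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        // `.ok()?` turns `Err(_)` into an early `None` instead of panicking.
        if try_lt(self, other).ok()? {
            return Some(Ordering::Less);
        }
        if try_lt(other, self).ok()? {
            return Some(Ordering::Greater);
        }
        Some(Ordering::Equal)
    }
}
```

With this shape, `a < b` on mismatched variants evaluates to `false` rather than aborting, because the comparison operators treat a `None` ordering as "not less than".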

junjunjd · 2023-11-03T07:28:13Z

datafusion/common/src/scalar.rs

@@ -1970,13 +2020,14 @@ impl ScalarValue {
                ),
            },
            ScalarValue::Fixedsizelist(..) => {
-                unimplemented!("FixedSizeList is not supported yet")


@alamb What would be the preferred way to handle unimplemented errors in datafusion? There are many places where a NotImplemented error is returned instead of using unimplemented! and panicking. IMO returning an error makes more sense as user can choose to ignore unimplemented errors instead of panicking and exiting.
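
The difference between `unimplemented!` and a returned error can be sketched as below. The `Error` enum and `Value` type are hypothetical and only loosely modeled on an error type with a "not implemented" variant; they are not the real DataFusion types.

```rust
use std::fmt;

/// Hypothetical error type; it is not the real `DataFusionError`.
#[derive(Debug, PartialEq)]
pub enum Error {
    NotImplemented(String),
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::NotImplemented(msg) => write!(f, "not implemented: {msg}"),
        }
    }
}

impl std::error::Error for Error {}

/// Hypothetical scalar type with one unsupported variant.
#[derive(Debug)]
pub enum Value {
    Int(i64),
    FixedSizeList(Vec<i64>),
}

/// Instead of `unimplemented!(...)`, which would abort the caller, the
/// unsupported case is reported as an ordinary error value.
pub fn to_display_string(v: &Value) -> Result<String, Error> {
    match v {
        Value::Int(i) => Ok(i.to_string()),
        Value::FixedSizeList(_) => Err(Error::NotImplemented(
            "FixedSizeList is not supported yet".to_string(),
        )),
    }
}

/// A caller that chooses to skip unsupported values instead of failing.
pub fn render_supported(values: &[Value]) -> Vec<String> {
    values
        .iter()
        .filter_map(|v| to_display_string(v).ok())
        .collect()
}
```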

junjunjd · 2023-11-03T08:37:55Z

@alamb Thanks for the review. Most of the added lines should come from line-wrap reformatting in tests. I have added comments on the panics I removed in scalar.rs. To summarize, these panics fall into five general categories:

1. Panics generated in iter_to_array when the ScalarValues in the iterator are not all the same type. I believe these errors are reachable. Most of the build_* macros defined in the function return an internal error instead of panicking. IMO it makes sense to remove these panics to align with the other internal errors returned.
2. typed_cast_* macros called in try_from_array. Since try_from_array is a public function, downcasting the array to certain types can fail depending on what array the user passes to the function. I think it makes sense to return an internal error, as this error is reachable and recoverable. This aligns with how downcast errors are handled in the downcast_value macro. try_from_array already returns a Result, so this does not require any change in client code.
3. Panics generated when with_precision_and_scale is called on decimal arrays. I think these errors are reachable because ScalarValue::try_new_decimal128 allows decimals with precision 0 while arrow-array does not support that. We can update try_new_decimal128 to disallow decimals with precision 0 and establish some other invariants for new_list, eq_array and to_array_of_size so that these errors become unreachable and these functions can panic. This should remove the impact on client code.
4. The unimplemented errors. In many other places in DataFusion, a NotImplemented error is returned instead of using unimplemented!. IMHO returning an error makes more sense, as the user can choose whether to ignore the unimplemented errors instead of panicking and exiting. Would appreciate your thoughts on this.
5. All the ArrowErrors and the "Invalid dictionary keys type" errors should be unreachable. I will change these back to panics.
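
The first category above (mixed types in an iterator of scalars) can be sketched as follows. This is a simplified illustration under stated assumptions: `Scalar` and `ints_from_scalars` are hypothetical names, and the real iter_to_array builds Arrow arrays rather than a `Vec<i64>`.

```rust
/// Hypothetical scalar type (not DataFusion's `ScalarValue`).
#[derive(Debug, Clone, PartialEq)]
pub enum Scalar {
    Int(i64),
    Text(String),
}

/// Collects an iterator of scalars into an integer column. Any element
/// that is not `Scalar::Int` produces an error instead of a panic.
pub fn ints_from_scalars(
    scalars: impl IntoIterator<Item = Scalar>,
) -> Result<Vec<i64>, String> {
    scalars
        .into_iter()
        .enumerate()
        .map(|(i, s)| match s {
            Scalar::Int(v) => Ok(v),
            other => Err(format!(
                "Internal error: expected Int at index {i}, got {other:?}"
            )),
        })
        // Collecting into `Result<Vec<_>, _>` stops at the first error.
        .collect()
}
```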

alamb

First of all, thank you to @junjunjd for investing so much time, not just in the code but also evaluating the implications of the changes.

In general I think there are tradeoffs between panic and returning Errs -- specifically:

- Panics are not as user friendly, but they stop computation immediately when something "unexpected" happens and are thus often easier to debug and locate the problem
- Errs are more user friendly, and can return messages that may help users work around or fix whatever is wrong.

I realize there is a judgement call required to decide whether something is "expected" or not, and how much information users can get from error messages vs panics, weighing a better user experience against code that is more efficient to debug

I also realize the existing DataFusion codebase is not consistent in its handling of panics and errors.

On the balance I think this PR is an improvement to the code, and therefore I think it could be merged. Users of the affected APIs can simply unwrap the Result and get the same panic behavior as before.
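
The point about unwrapping can be sketched like this. The `Decimal` struct and all three functions are hypothetical and only illustrate the two caller styles; the 1..=38 precision range is an assumption made for the example, not a statement about the DataFusion API.

```rust
/// Hypothetical decimal value (not a DataFusion type).
#[derive(Debug, PartialEq)]
pub struct Decimal {
    pub value: i128,
    pub precision: u8,
    pub scale: i8,
}

/// Fallible constructor: rejects an out-of-range precision up front.
pub fn try_new_decimal(value: i128, precision: u8, scale: i8) -> Result<Decimal, String> {
    if precision == 0 || precision > 38 {
        return Err(format!("invalid precision {precision}: expected 1..=38"));
    }
    Ok(Decimal { value, precision, scale })
}

/// Caller that treats a failure as a bug: `expect` panics, which gives the
/// same behavior as a function that panicked internally.
pub fn decimal_or_panic(value: i128) -> Decimal {
    try_new_decimal(value, 10, 2).expect("precision 10 is always valid")
}

/// Caller that treats a failure as recoverable and falls back to the
/// widest precision allowed in this sketch.
pub fn decimal_or_default(value: i128, precision: u8) -> Decimal {
    try_new_decimal(value, precision, 0).unwrap_or_else(|_| Decimal {
        value,
        precision: 38,
        scale: 0,
    })
}
```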

I think it would be ok not to merge it too if other reviewers feel strongly in the other direction.

alamb · 2023-11-03T14:29:42Z

datafusion/common/src/scalar.rs

@@ -330,9 +330,9 @@ impl PartialOrd for ScalarValue {
                        let arr2 = list_arr2.value(i);

                        let lt_res =
-                            arrow::compute::kernels::cmp::lt(&arr1, &arr2).unwrap();
+                            arrow::compute::kernels::cmp::lt(&arr1, &arr2).ok()?;


The potential downside of returning None rather than panic'ing is that it may mask a real bug and make it harder to track down -- comparing scalars of different types likely means they should have been coerced before

alamb · 2023-11-03T15:06:33Z

datafusion/common/src/scalar.rs

@@ -1970,13 +2020,14 @@ impl ScalarValue {
                ),
            },
            ScalarValue::Fixedsizelist(..) => {
-                unimplemented!("FixedSizeList is not supported yet")


I agree returning NotYetImplemented is a better choice

alamb · 2023-11-07T14:15:36Z

@junjunjd can you please merge and resolve the conflicts in this PR? Then we can merge it in

alamb · 2023-11-09T17:46:49Z

Marking as draft as it needs conflicts resolved prior to being mergeable.

junjunjd · 2023-11-11T08:15:50Z

@alamb Thank you for the review! I rebased the MR. It is ready for final review/merge.

alamb

Thank you @junjunjd

This reverts commit e642cc2.

github-actions bot added the sql, logical-expr, physical-expr, optimizer, and core labels Oct 22, 2023

alamb marked this pull request as draft October 23, 2023 21:22

junjunjd force-pushed the chore/remove-panics-in-datafusion-common branch 2 times, most recently from 863ed9a to 4b61807 October 25, 2023 07:05

junjunjd marked this pull request as ready for review October 25, 2023 07:55

Weijun-H reviewed Oct 25, 2023

comphead reviewed Oct 25, 2023

junjunjd force-pushed the chore/remove-panics-in-datafusion-common branch from 4b61807 to 60d1543 October 26, 2023 05:21

Weijun-H requested changes Oct 26, 2023

Weijun-H approved these changes Oct 27, 2023

alamb reviewed Oct 30, 2023

junjunjd commented Nov 1, 2023
