Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add additional testing for unwrap_cast_in_comparison #4147

Merged
merged 1 commit into from
Nov 9, 2022

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Nov 8, 2022

Which issue does this PR close?

re #3938 and #3702

(I am breaking up the work in several PRs to make it easier to review)

Rationale for this change

  1. I want to be able to better test adding new data type support to unwrap cast rule
  2. I want to make sure the coerce rule has the same behavior as the arrow cast kernels

What changes are included in this PR?

  1. add new tests for unwrap_cast_in_comparison

Are these changes tested?

All tests!

Are there any user-facing changes?

No

@github-actions github-actions bot added the optimizer Optimizer rules label Nov 8, 2022
Comment on lines +810 to +838
// Verify that calling the arrow
// cast kernel yields the same results
// input array
let literal_array = literal.to_array_of_size(1);
let expected_array = expected_value.to_array_of_size(1);
let cast_array = cast_with_options(
&literal_array,
&target_type,
&CastOptions { safe: true },
)
.expect("Expected to be cast array with arrow cast kernel");

assert_eq!(
&expected_array, &cast_array,
"Result of casing {:?} with arrow was\n {:#?}\nbut expected\n{:#?}",
literal, cast_array, expected_array
);

// Verify that for timestamp types the timezones are the same
// (ScalarValue::cmp doesn't account for timezones);
if let (
DataType::Timestamp(left_unit, left_tz),
DataType::Timestamp(right_unit, right_tz),
) = (actual_value.get_datatype(), expected_value.get_datatype())
{
assert_eq!(left_unit, right_unit);
assert_eq!(left_tz, right_tz);
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want this coverage in particular to ensure that the casting we implement in unwrap cast is consistent with the arrow cast kernel -- this is especially important for timestamps (see #3938)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is super cool, double verification 💯

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want this coverage in particular to ensure that the casting we implement in unwrap cast is consistent with the arrow cast kernel -- this is especially important for timestamps (see #3938)

thanks @alamb
I know the timestamp is most important type in the time series database.
Optimization about the type of time will benefit the InfluxIO

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know the timestamp is most important type in the time series database.
Optimization about the type of time will benefit the InfluxIO

Yes, indeed this is why I care so much about timestamps :)


#[test]
fn test_try_cast_to_type_nulls() {
// test values that can be cast to/from all integer types
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a bit confusing that this description is the same as the one below, so might make sense to actually emphasize this is only testing null values for integer types.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a bit confusing that this description is the same as the one below, so might make sense to actually emphasize this is only testing null values for integer types.

agree, from the test case i got your mean.
You want to test that the from value is Null.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a copy/pasted comment 🤦 -- I will fix it in the next PR. Sorry for the confusion and nice spot

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in #4148


#[test]
fn test_try_cast_to_type_int_out_of_range() {
let max_i64 = ScalarValue::Int64(Some(i64::MAX));
Copy link
Contributor

@isidentical isidentical Nov 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alamb if I didn't miss any, I guess it makes sense to also include some signed -> unsigned conversions here as well. Seems like unsigned types are not supported yet. Can be ignored.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If want to cover this case maybe adding the test then notate it with #[should_panic] is a choice. Then we won't miss this coverage when "signed <-> unsigned" conversions get implemented in the future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we should support unsigned data type in the rule

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah- as @liukun4515 mentions I have a PR queued up to support the unsigned case :)

Copy link
Member

@waynexia waynexia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like it! This is very concrete 👍

}

#[test]
fn test_try_decimal_cast_out_of_range() {
Copy link
Contributor

@liukun4515 liukun4515 Nov 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe I have tested unsupported cases in test_not_unwrap_cast_with_decimal_comparison, but it's good to simplify the test and add more test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree you have covered most (if not all ) of these cases already in test_not_unwrap_cast_with_decimal_comparison

The reason I want to add redundant coverage was while write tests for unwrapping timestamp casts I also wanted to verify they were consistent with the cast kernel which I couldn't figure out how to do easily.

Also, the tests as written tested both IN list cast removal as well as comparison cast removal, but the code that handles those structures is the same. Thus adding tests for unwrapping timestamps for both of those forms seemed like it didn't add extra coverage and was overly verbose and got even worse as I contemplated testing all the other types.

So since the only code that is different for different data types was try_cast_to_type I figured I would write a test for that function to get enough coverage

// input array
let literal_array = literal.to_array_of_size(1);
let expected_array = expected_value.to_array_of_size(1);
let cast_array = cast_with_options(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great test for the cast

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am glad I had it too 😅 as I rediscovered that the arrow cast kernel doesn't support decimal <--> unsigned and so my initial implementation to support unsigned would have been inconsistent.

@alamb
Copy link
Contributor Author

alamb commented Nov 9, 2022

Thank you all for the comments

@alamb alamb merged commit 8f67d05 into apache:master Nov 9, 2022
@alamb alamb deleted the alamb/unwrap_casts_tests branch November 9, 2022 18:29
@ursabot
Copy link

ursabot commented Nov 9, 2022

Benchmark runs are scheduled for baseline = 9692fb0 and contender = 8f67d05. 8f67d05 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
optimizer Optimizer rules
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants