test: add file/SQL level test for pruning parquet row group with decimal data type. #2977

Merged: 4 commits on Jul 30, 2022

liukun4515
Contributor

@liukun4515 liukun4515 commented Jul 27, 2022

Which issue does this PR close?

This PR only adds more test cases for pruning row groups with the decimal data type in Parquet files.

part of #2962
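
For context, row-group pruning relies on the per-row-group min/max statistics stored in the Parquet footer: a row group can be skipped when the statistics prove that no row can match the predicate. The following is a minimal, self-contained sketch of that idea for decimal values held as unscaled `i128` integers with a shared scale. `RowGroupStats`, `can_prune_lt`, `can_prune_eq`, and `count_pruned_lt` are hypothetical names for illustration only, not DataFusion APIs, and the real pruning logic handles many more cases (nulls, missing statistics, other operators).

```rust
/// Hypothetical min/max statistics for one row group of a decimal column.
/// Values are unscaled integers; with scale 2, the decimal 1.00 is 100.
#[derive(Debug, Clone, Copy)]
struct RowGroupStats {
    min: i128,
    max: i128,
}

/// Returns true when `col < literal` cannot match any row in the row group.
fn can_prune_lt(stats: &RowGroupStats, literal: i128) -> bool {
    // Every value is >= min, so if min >= literal nothing is < literal.
    stats.min >= literal
}

/// Returns true when `col = literal` cannot match any row in the row group.
fn can_prune_eq(stats: &RowGroupStats, literal: i128) -> bool {
    // The literal lies outside the [min, max] range of the row group.
    literal < stats.min || literal > stats.max
}

/// Counts how many row groups would be skipped for `col < literal`.
fn count_pruned_lt(groups: &[RowGroupStats], literal: i128) -> usize {
    groups.iter().filter(|g| can_prune_lt(g, literal)).count()
}
```

A test at the SQL level then only needs to assert on the number of pruned row groups and the number of returned rows for a given predicate.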

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the core Core DataFusion crate label Jul 27, 2022
@liukun4515 liukun4515 marked this pull request as ready for review July 28, 2022 02:55
@liukun4515 liukun4515 changed the title from "[WIP] test: add file/SQL level test for pruning parquet row group with decimal data type." to "test: add file/SQL level test for pruning parquet row group with decimal data type." Jul 28, 2022
@liukun4515 liukun4515 force-pushed the prune_decimal_parquet_file branch from 8eef9f3 to 2f88811 Compare July 28, 2022 05:14
@codecov-commenter

codecov-commenter commented Jul 28, 2022

Codecov Report

Merging #2977 (095fb62) into master (176f432) will increase coverage by 0.02%.
The diff coverage is 98.46%.

@@            Coverage Diff             @@
##           master    #2977      +/-   ##
==========================================
+ Coverage   85.75%   85.78%   +0.02%     
==========================================
  Files         281      281              
  Lines       51494    51559      +65     
==========================================
+ Hits        44161    44228      +67     
+ Misses       7333     7331       -2     
| Impacted Files | Coverage Δ |
|---|---|
| datafusion/core/tests/parquet_pruning.rs | 98.29% <98.46%> (+0.03%) ⬆️ |
| datafusion/expr/src/logical_plan/plan.rs | 77.95% <0.00%> (+0.52%) ⬆️ |

@mingmwang
Contributor

Can you please also add test cases to cover decimal types with large precision? For example, decimal(18, 0) and decimal(38, 0). The unit tests may fail for now, but once the getStats method is fixed for decimals encoded as binary arrays, they should pass.

@liukun4515
Contributor Author

> Can you please also add test cases to cover decimal types with large precision? For example, decimal(18, 0) and decimal(38, 0). The unit tests may fail for now, but once the getStats method is fixed for decimals encoded as binary arrays, they should pass.

Thanks for the suggestion. I will add tests for decimals with larger precision, which are stored as fixed_len_byte_array in Parquet.
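
As background for the discussion above: the Parquet format stores a decimal in a fixed-length byte array as the unscaled value in big-endian two's-complement form, so min/max statistics for such columns have to be decoded before they can be compared with a literal. The sketch below shows one way to do that decoding and to compute the smallest byte width that can hold a given precision. Both function names are hypothetical and this is not the code from arrow-rs or DataFusion.

```rust
/// Decodes a big-endian two's-complement byte slice (1 to 16 bytes)
/// into the unscaled i128 value of a decimal. Returns None for an
/// empty slice or one that is too long to fit in an i128.
fn decode_decimal_be(bytes: &[u8]) -> Option<i128> {
    if bytes.is_empty() || bytes.len() > 16 {
        return None;
    }
    // Sign-extend: pad with 0xFF when the top bit is set, else with 0x00.
    let fill: u8 = if bytes[0] & 0x80 != 0 { 0xFF } else { 0x00 };
    let mut buf = [fill; 16];
    buf[16 - bytes.len()..].copy_from_slice(bytes);
    Some(i128::from_be_bytes(buf))
}

/// Smallest number of bytes whose signed range can hold every unscaled
/// value with the given decimal precision (number of decimal digits).
/// Returns None when 10^precision does not fit in an i128.
fn min_bytes_for_precision(precision: u32) -> Option<usize> {
    // Largest unscaled magnitude: 10^precision - 1.
    let max_unscaled = 10i128.checked_pow(precision)? - 1;
    // An n-byte signed integer holds values up to 2^(8n - 1) - 1,
    // which equals i128::MAX shifted right by 8 * (16 - n) bits.
    (1..=16usize).find(|&n| max_unscaled <= (i128::MAX >> (8 * (16 - n))))
}
```

With these definitions, precision 18 needs 8 bytes and precision 38 needs the full 16 bytes, which is why the two precisions mentioned in the review exercise different byte widths.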

@liukun4515
Contributor Author

Added more test cases for other precisions.

@liukun4515 liukun4515 force-pushed the prune_decimal_parquet_file branch from 095fb62 to 0f8f7e2 Compare July 28, 2022 11:16
@@ -449,6 +450,154 @@ async fn prune_int32_eq_in_list_negated() {
assert_eq!(output.result_rows, 19, "{}", output.description());
}

async fn test_prune_decimal(
Contributor Author

This inspired me: I will consolidate the other test cases to make them cleaner.

Contributor

Can you please clarify what you mean by this comment?

I didn't quite understand if you planned to do this combination as part of this PR or if you planned to do it as a follow on PR.

In other words, is this PR ready for review, or do you plan more work?

Contributor Author

Sorry for the confusing comment.
While working on the decimal tests, I found that all the pruning tests share common logic.
I can consolidate them in the next PR.

async fn test_prune(
    case_type: Scenario,
    sql: &str,
    expected_errors: Option<usize>,
    expected_row_group_pruned: Option<usize>,
    expected_results: usize,
) {}

All the test cases, such as prune_f64_lt and prune_i64_lt, can call test_prune instead of repeating the same logic.
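
To illustrate how the three expectations in the proposed signature could collapse the repeated assertions, here is a rough, synchronous sketch. `PruneOutput` and `check_prune` are hypothetical stand-ins invented for this example; the real helper would run the SQL against the test context and read the metrics from its output, which is not shown here.

```rust
/// Hypothetical stand-in for the metrics a pruning test reads back
/// after running a query (not the real test harness type).
#[derive(Debug, Clone, PartialEq)]
struct PruneOutput {
    predicate_evaluation_errors: Option<usize>,
    row_groups_pruned: Option<usize>,
    result_rows: usize,
}

/// Checks one query's output against the three expectations that each
/// pruning test currently asserts by hand. Returns a description of
/// the first mismatch, if any.
fn check_prune(
    output: &PruneOutput,
    expected_errors: Option<usize>,
    expected_row_group_pruned: Option<usize>,
    expected_results: usize,
) -> Result<(), String> {
    if output.predicate_evaluation_errors != expected_errors {
        return Err(format!(
            "errors: expected {:?}, got {:?}",
            expected_errors, output.predicate_evaluation_errors
        ));
    }
    if output.row_groups_pruned != expected_row_group_pruned {
        return Err(format!(
            "row groups pruned: expected {:?}, got {:?}",
            expected_row_group_pruned, output.row_groups_pruned
        ));
    }
    if output.result_rows != expected_results {
        return Err(format!(
            "result rows: expected {}, got {}",
            expected_results, output.result_rows
        ));
    }
    Ok(())
}
```

Each existing test would then reduce to a single call that passes the scenario, the SQL string, and the three expected values.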

Contributor Author

Follow-up PR: #2986

@liukun4515
Contributor Author

I will merge this PR first, and then consolidate the code of all the pruning tests.

@liukun4515 liukun4515 merged commit 3d1de15 into apache:master Jul 30, 2022
@ursabot

ursabot commented Jul 30, 2022

Benchmark runs are scheduled for baseline = 2d23860 and contender = 3d1de15. 3d1de15 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Labels
core Core DataFusion crate
5 participants