test: add file/SQL level test for pruning parquet row group with decimal data type. #2977

liukun4515 · 2022-07-27T14:48:20Z

Which issue does this PR close?

This PR just adds more test cases for pruning row groups with the decimal data type in Parquet files.

Part of #2962

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

codecov-commenter · 2022-07-28T05:50:16Z

Codecov Report

Merging #2977 (095fb62) into master (176f432) will increase coverage by 0.02%.
The diff coverage is 98.46%.

@@            Coverage Diff             @@
##           master    #2977      +/-   ##
==========================================
+ Coverage   85.75%   85.78%   +0.02%     
==========================================
  Files         281      281              
  Lines       51494    51559      +65     
==========================================
+ Hits        44161    44228      +67     
+ Misses       7333     7331       -2

Impacted Files	Coverage Δ
datafusion/core/tests/parquet_pruning.rs	`98.29% <98.46%> (+0.03%)`	⬆️
datafusion/expr/src/logical_plan/plan.rs	`77.95% <0.00%> (+0.52%)`	⬆️

mingmwang · 2022-07-28T07:17:16Z

Can you please also add test cases to cover decimal types with large precision, for example decimal(18, 0) and decimal(38, 0)? The unit tests may fail for now, but once the getStats method is fixed for binary-array-encoded decimals, they should pass.

liukun4515 · 2022-07-28T07:23:44Z

> Can you please also add test cases to cover decimal types with large precision, for example decimal(18, 0) and decimal(38, 0)? The unit tests may fail for now, but once the getStats method is fixed for binary-array-encoded decimals, they should pass.

Thanks for the suggestion. I will add tests for decimals with larger precision, which are stored as fixed_length_byte_array in Parquet.
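For context on why large precisions need separate coverage: the Parquet format stores a FIXED_LEN_BYTE_ARRAY decimal as the big-endian two's-complement bytes of its unscaled value, so min/max statistics have to be decoded before they can be compared with a predicate literal. The following is a minimal, self-contained sketch of that decoding plus an equality-pruning check. `be_bytes_to_i128` and `can_prune_eq` are hypothetical names chosen for illustration; they are not DataFusion or arrow-rs APIs, and the real pruning path may differ.

```rust
/// Decode a big-endian two's-complement byte slice (at most 16 bytes)
/// into an i128. Hypothetical helper, for illustration only.
fn be_bytes_to_i128(bytes: &[u8]) -> i128 {
    assert!(bytes.len() <= 16, "an i128 holds at most 16 bytes");
    // Sign-extend: pad with 0xFF when the top bit of the first byte is set.
    let fill: u8 = if bytes.first().map_or(false, |b| b & 0x80 != 0) {
        0xFF
    } else {
        0x00
    };
    let mut buf = [fill; 16];
    buf[16 - bytes.len()..].copy_from_slice(bytes);
    i128::from_be_bytes(buf)
}

/// Returns true when a row group whose statistics are `min`/`max`
/// cannot contain `target`, so a `col = target` filter may skip it.
/// All three values are unscaled and assumed to share the same scale.
fn can_prune_eq(min: &[u8], max: &[u8], target: i128) -> bool {
    let (min, max) = (be_bytes_to_i128(min), be_bytes_to_i128(max));
    target < min || target > max
}
```

With statistics bytes `[0x00, 0x64]` (100) and `[0x01, 0xF4]` (500), a target of 1000 lets the row group be skipped, while a target of 300 does not.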

liukun4515 · 2022-07-28T10:57:24Z

Added more test cases for other precisions.

liukun4515 · 2022-07-28T11:22:45Z

datafusion/core/tests/parquet_pruning.rs

@@ -449,6 +450,154 @@ async fn prune_int32_eq_in_list_negated() {
    assert_eq!(output.result_rows, 19, "{}", output.description());
 }

+async fn test_prune_decimal(


I'm inspired by this; I will consolidate the other test cases and make them cleaner.

alamb · 2022-07-29T12:07:36Z

Can you please clarify what you mean by this comment?

I didn't quite understand if you planned to do this combination as part of this PR or if you planned to do it as a follow on PR.

In other words, is this PR ready for review, or do you plan more work?

liukun4515 · 2022-07-30

Sorry for the confusing comment.
While working on the decimal tests, I found that all of the pruning tests share common logic.
I can consolidate them in the next PR:

async fn test_prune(case_type: Scenario, sql: &str, expected_errors: Option<usize>, expected_row_group_pruned: Option<usize>, expected_results: usize) {}

All of the test cases, such as prune_f64_lt and prune_i64_lt, can use test_prune to replace the repeated logic.

liukun4515 · 2022-07-30

Follow-up PR: #2986
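As a rough illustration of the `test_prune` consolidation discussed in this thread, the sketch below shows one way a shared helper could collapse the three repeated assertions. It is deliberately synchronous and self-contained: `Scenario` and `TestOutput` here are simplified stand-ins, and the `run_query` closure parameter is an assumption added so the example does not depend on the real test context in `parquet_pruning.rs`. The helper that actually landed in the follow-up PR may look different.

```rust
/// Simplified stand-in for the scenario enum used by the pruning tests.
#[derive(Debug, Clone, Copy)]
enum Scenario {
    Int32,
    Decimal,
}

/// Simplified stand-in for what a pruning query reports back.
#[derive(Debug)]
struct TestOutput {
    predicate_evaluation_errors: Option<usize>,
    row_groups_pruned: Option<usize>,
    result_rows: usize,
}

/// Run one pruning query and check the three values every test compares.
/// `run_query` is a hypothetical hook standing in for the real query path.
fn test_prune(
    run_query: impl Fn(Scenario, &str) -> TestOutput,
    case_type: Scenario,
    sql: &str,
    expected_errors: Option<usize>,
    expected_row_group_pruned: Option<usize>,
    expected_results: usize,
) {
    let output = run_query(case_type, sql);
    assert_eq!(
        output.predicate_evaluation_errors, expected_errors,
        "predicate evaluation errors for {}", sql
    );
    assert_eq!(
        output.row_groups_pruned, expected_row_group_pruned,
        "row groups pruned for {}", sql
    );
    assert_eq!(output.result_rows, expected_results, "result rows for {}", sql);
}
```

Each existing test body would then reduce to a single `test_prune(...)` call carrying its scenario, SQL text, and expected counts.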

liukun4515 · 2022-07-30T02:17:55Z

I will merge this PR first, and then consolidate all of the pruning test code.

ursabot · 2022-07-30T02:22:12Z

Benchmark runs are scheduled for baseline = 2d23860 and contender = 3d1de15. 3d1de15 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

test: add file/SQL level test for pruning parquet row group with decimal data type.

3a529ba

github-actions bot added the core Core DataFusion crate label Jul 27, 2022

Merge remote-tracking branch 'upstream/master' into prune_decimal_parquet_file

4ac70a2

liukun4515 marked this pull request as ready for review July 28, 2022 02:55

liukun4515 changed the title ~~[WIP] test: add file/SQL level test for pruning parquet row group with decimal data type.~~ test: add file/SQL level test for pruning parquet row group with decimal data type. Jul 28, 2022

liukun4515 requested review from alamb, andygrove and tustvold July 28, 2022 03:16

address issue caused by upgrade the arrow-rs

2f88811

liukun4515 force-pushed the prune_decimal_parquet_file branch from 8eef9f3 to 2f88811 Compare July 28, 2022 05:14

add test case: decimal with large precision which will be stored as fixed_length_byte_array in parquet

0f8f7e2

liukun4515 force-pushed the prune_decimal_parquet_file branch from 095fb62 to 0f8f7e2 Compare July 28, 2022 11:16

liukun4515 commented Jul 28, 2022

alamb approved these changes Jul 29, 2022

liukun4515 merged commit 3d1de15 into apache:master Jul 30, 2022

liukun4515 mentioned this pull request Jul 30, 2022

minor: collation the prune test #2986

Merged
