Minor: refactor ParquetExec roundtrip tests #5260

Merged: alamb merged 2 commits into apache:master from alamb/parquet_tests_cleanup on Feb 13, 2023

@alamb alamb commented Feb 12, 2023

Which issue does this PR close?

Part of #5104

Rationale for this change

I am trying to write a test for #5104 and keep getting lost in the maze of parameters passed to `round_trip` and `round_trip_to_parquet`.

It is also hard for me to evaluate which combinations of options the existing tests cover.

What changes are included in this PR?

Move the parameters to run the roundtrip test into a structure, with better documentation
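
As a rough illustration of the idea (a minimal sketch, not the code in this PR), the positional arguments can be gathered into a struct with builder-style setters. Everything below is hypothetical: the name `RoundTripOptions`, its fields, the use of a plain `String` in place of a real filter expression, and the helper `positional_options` are assumptions made for the example and are not DataFusion's API.

```rust
/// Hypothetical stand-in for the options a round-trip test can vary.
/// Field names and types are illustrative only, not DataFusion's.
#[derive(Debug, Default, Clone, PartialEq)]
struct RoundTripOptions {
    /// Column indices to read; `None` means read every column.
    projection: Option<Vec<usize>>,
    /// Filter to apply, kept as a plain string in this sketch.
    predicate: Option<String>,
    /// Whether the predicate should be evaluated during the scan.
    pushdown_predicate: bool,
    /// Whether page-level pruning should be exercised.
    page_index_predicate: bool,
}

impl RoundTripOptions {
    /// Start from defaults: no projection, no predicate, all flags off.
    fn new() -> Self {
        Self::default()
    }

    fn with_projection(mut self, projection: Vec<usize>) -> Self {
        self.projection = Some(projection);
        self
    }

    fn with_predicate(mut self, predicate: impl Into<String>) -> Self {
        self.predicate = Some(predicate.into());
        self
    }

    fn with_pushdown_predicate(mut self) -> Self {
        self.pushdown_predicate = true;
        self
    }

    fn with_page_index_predicate(mut self) -> Self {
        self.page_index_predicate = true;
        self
    }
}

/// The positional style, for comparison: at the call site a reader has to
/// remember what each `None`, `true`, or `false` stands for.
fn positional_options(
    projection: Option<Vec<usize>>,
    predicate: Option<String>,
    pushdown_predicate: bool,
    page_index_predicate: bool,
) -> RoundTripOptions {
    RoundTripOptions {
        projection,
        predicate,
        pushdown_predicate,
        page_index_predicate,
    }
}
```

With a struct like this, a call such as `RoundTripOptions::new().with_predicate("c2 = 2 OR c2 = 1").with_pushdown_predicate()` names each option it turns on, whereas `positional_options(None, Some(...), true, false)` leaves the reader to look up the parameter order.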

Are these changes tested?

Yes

Are there any user-facing changes?

No

@alamb alamb added the development-process Related to development process of DataFusion label Feb 12, 2023
@github-actions github-actions bot added core Core DataFusion crate and removed development-process Related to development process of DataFusion labels Feb 12, 2023
@alamb alamb marked this pull request as ready for review February 12, 2023 11:51
@jackwener jackwener self-requested a review February 12, 2023 13:01
}

/// run the test, returning the `RoundTripResult`
async fn round_trip(self, batches: Vec<RecordBatch>) -> RoundTripResult {

I think the whitespace-blind diff shows the changes more clearly: https://github.com/apache/arrow-datafusion/pull/5260/files?w=1


// If testing with page_index_predicate, write parquet
// files with multiple pages
let multi_page = page_index_predicate;

I found this logic slightly confusing so I added some comments and tried to make it clearer -- no functional change is intended

@@ -1121,8 +1158,12 @@ mod tests {
let filter = col("c2").eq(lit(2_i64)).or(col("c2").eq(lit(1_i64)));

// read/write them files:
- let rt =
- round_trip(vec![batch1, batch2], None, None, Some(filter), true, false).await;
+ let rt = RoundTrip::new()

I think this construction makes it clearer which combination of options is being tested

@alamb alamb merged commit 9710d7c into apache:master Feb 13, 2023
@alamb alamb deleted the alamb/parquet_tests_cleanup branch February 13, 2023 11:38
alamb commented Feb 13, 2023

Oops, I hit merge with a fmt issue. Will fix.

ursabot commented Feb 13, 2023

Benchmark runs are scheduled for baseline = 00e60f1 and contender = 9710d7c. 9710d7c is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@alamb alamb mentioned this pull request Feb 13, 2023
alamb commented Feb 13, 2023

fmt fix in #5263

jiangzhx pushed a commit to jiangzhx/arrow-datafusion that referenced this pull request Feb 24, 2023
* Minor: refactor ParquetExec roundtrip tests

* Apply suggestions from code review

Co-authored-by: Liang-Chi Hsieh <[email protected]>

---------

Co-authored-by: Liang-Chi Hsieh <[email protected]>
Labels
core Core DataFusion crate

5 participants