Reorder the physical plan optimizer rules, extract GlobalSortSelection, make Repartition optional #4714

Merged
merged 8 commits into apache:master on Dec 27, 2022

Conversation

@yahoNanJing (Contributor) commented Dec 23, 2022

Which issue does this PR close?

Closes #4678.

Rationale for this change

Following #4678, this change is made in the following steps:

  1. Extract the global sort algorithm selection from BasicEnforcement into a separate rule, GlobalSortSelection.
  2. Make the Repartition rule optional.
  3. Reorder the rules as follows:
    • AggregateStatistics
    • Repartition (optional)
    • GlobalSortSelection
    • JoinSelection
    • BasicEnforcement
    • CoalesceBatches (optional)

The reasoning behind this ordering is as follows (a sketch of the resulting rule list follows this list):

  • Repartition changes the output partitioning of some operators in the plan tree in order to increase parallelism, which influences other rules, so it should run as early as possible. It is optional because it is not used by the distributed engine, Ballista, and because it conflicts with parts of BasicEnforcement: Repartition introduces additional repartitioning, while BasicEnforcement aims to reduce unnecessary repartitioning.
  • GlobalSortSelection currently depends on the partition count to decide whether to turn a single-node sort into a parallel local sort plus merge, so it should run after Repartition. Since it changes the output ordering of some operators, it should run before JoinSelection and BasicEnforcement, which may depend on that ordering.
  • JoinSelection uses statistics to replace the Auto join mode with a concrete join implementation, such as collect-left, hash join, or a future sort-merge join. That choice affects whether BasicEnforcement adds additional repartitioning and local sorts to satisfy distribution and ordering requirements, so JoinSelection should run before BasicEnforcement.
  • BasicEnforcement should run only once the rest of the plan tree has been determined.
  • CoalesceBatches does not influence the distribution or ordering of the plan tree, so, to avoid affecting the other rules, it should run last.
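To make the ordering and the gating of the optional rules concrete, here is a minimal, self-contained sketch. The `OptimizerRule` trait and the unit structs below are placeholders standing in for DataFusion's `PhysicalOptimizerRule` and the real rule types (they are not the actual API), and the two boolean parameters stand in for the session options that gate the optional rules:

```rust
use std::sync::Arc;

// Placeholder trait standing in for DataFusion's `PhysicalOptimizerRule`;
// the real trait also carries an `optimize` method that rewrites the plan.
trait OptimizerRule {
    fn name(&self) -> &'static str;
}

// Placeholder unit structs named after the real rules.
struct AggregateStatistics;
struct Repartition;
struct GlobalSortSelection;
struct JoinSelection;
struct BasicEnforcement;
struct CoalesceBatches;

impl OptimizerRule for AggregateStatistics { fn name(&self) -> &'static str { "aggregate_statistics" } }
impl OptimizerRule for Repartition { fn name(&self) -> &'static str { "repartition" } }
impl OptimizerRule for GlobalSortSelection { fn name(&self) -> &'static str { "global_sort_selection" } }
impl OptimizerRule for JoinSelection { fn name(&self) -> &'static str { "join_selection" } }
impl OptimizerRule for BasicEnforcement { fn name(&self) -> &'static str { "basic_enforcement" } }
impl OptimizerRule for CoalesceBatches { fn name(&self) -> &'static str { "coalesce_batches" } }

/// Assemble the rule list in the order described above. The two booleans
/// stand in for the session options that gate the optional rules.
fn rule_order(enable_repartition: bool, enable_coalesce_batches: bool) -> Vec<Arc<dyn OptimizerRule>> {
    let mut rules: Vec<Arc<dyn OptimizerRule>> = vec![Arc::new(AggregateStatistics)];
    // Runs early because it changes output partitioning; skipped for
    // distributed engines such as Ballista.
    if enable_repartition {
        rules.push(Arc::new(Repartition));
    }
    // Needs the partition count chosen above, but must run before the rules
    // that depend on output ordering.
    rules.push(Arc::new(GlobalSortSelection));
    // Fixes the concrete join implementation before enforcement.
    rules.push(Arc::new(JoinSelection));
    // Adds only the repartitioning and local sorts still required.
    rules.push(Arc::new(BasicEnforcement));
    // Does not affect distribution or ordering, so it runs last.
    if enable_coalesce_batches {
        rules.push(Arc::new(CoalesceBatches));
    }
    rules
}

fn main() {
    for rule in rule_order(true, true) {
        println!("{}", rule.name());
    }
}
```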

What changes are included in this PR?

  1. The global sort algorithm selection is extracted from BasicEnforcement into a separate rule, GlobalSortSelection.
  2. CoalesceBatches is no longer inserted after a round-robin Repartition.
  3. The data produced by EmptyExec is aligned with the number of fields in its schema.
  4. Repartition now only considers the ordering requirement of the final output; it no longer considers intermediate data ordering, in particular local ordering, which should be handled by BasicEnforcement.

Are these changes tested?

Are there any user-facing changes?

@github-actions bot added the core (Core DataFusion crate) and sqllogictest (SQL Logic Tests (.slt)) labels on Dec 23, 2022
@yahoNanJing (Contributor, Author)

Hi @alamb, @Dandandan, @mingmwang, could you help review this PR?

@alamb (Contributor) left a comment

Thank you @yahoNanJing -- the basic idea in this PR looks great to me. 👍

I am concerned about two things:

  1. Adding repartition before sort preserving merge -- I think we can handle this by updating the test to use all the optimizer rules
  2. Potential (logical) conflicts with #4691 (Unnecessary SortExec removal rule from Physical Plan) from @mustafasrepo, as they both potentially change Sorts in the physical plan.

The only thing I think is required prior to merge, however, is item 1.

@@ -133,6 +133,10 @@ pub const OPT_PREFER_HASH_JOIN: &str = "datafusion.optimizer.prefer_hash_join";
pub const OPT_HASH_JOIN_SINGLE_PARTITION_THRESHOLD: &str =
"datafusion.optimizer.hash_join_single_partition_threshold";

/// Configuration option "datafusion.execution.round_robin_repartition"
👍

@@ -57,7 +57,16 @@ impl PhysicalOptimizerRule for CoalesceBatches {
// See https://github.com/apache/arrow-datafusion/issues/139
let wrap_in_coalesce = plan_any.downcast_ref::<FilterExec>().is_some()
|| plan_any.downcast_ref::<HashJoinExec>().is_some()
|| plan_any.downcast_ref::<RepartitionExec>().is_some();
// Don't need to add CoalesceBatchesExec after a round robin RepartitionExec
👍 makes sense to me -- to confirm the rationale for this change is that RoundRobinRepartition simply shuffles batches around to different partitions, but doesn't actually change (grow or shrink) the actual RecordBatches
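A hypothetical helper (not the code merged in this PR) showing the shape of that check; it assumes the existing `RepartitionExec::partitioning` accessor and the `Partitioning` enum, and the function name is illustrative:

```rust
use datafusion::physical_plan::repartition::RepartitionExec;
use datafusion::physical_plan::Partitioning;

// Sketch: coalescing after a round-robin repartition buys nothing, because it
// only routes existing batches to output partitions without changing their
// sizes; hash repartitioning can produce many small batches, so wrapping it
// in a CoalesceBatchesExec still helps there.
fn repartition_needs_coalesce(repartition: &RepartitionExec) -> bool {
    !matches!(repartition.partitioning(), Partitioning::RoundRobinBatch(_))
}
```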

/// - and there's some limit which can be pushed down to each of its input partitions
/// then [SortPreservingMergeExec] with local sort with a limit pushed down will be preferred;
/// Otherwise, the normal global sort [SortExec] will be used.
/// Later more intelligent statistics-based decision can also be introduced.
See also #4691
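The decision the new rule makes can be pictured with a small, purely illustrative helper (the names `SortStrategy` and `choose_sort_strategy` are hypothetical, not the rule's actual code): the parallel local-sort-plus-merge form is preferred only when the sort's input already has several partitions and a limit (fetch) can be pushed into each local sort.

```rust
/// Which physical sort shape to use (illustrative only).
enum SortStrategy {
    /// Coalesce partitions and run one global SortExec.
    GlobalSort,
    /// Sort each input partition locally, pushing the limit into each local
    /// sort, then combine the results with a SortPreservingMergeExec.
    LocalSortThenMerge,
}

/// Hypothetical decision helper mirroring the doc comment quoted above.
fn choose_sort_strategy(input_partitions: usize, fetch: Option<usize>) -> SortStrategy {
    if input_partitions > 1 && fetch.is_some() {
        SortStrategy::LocalSortThenMerge
    } else {
        SortStrategy::GlobalSort
    }
}
```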

@@ -161,34 +161,14 @@ fn optimize_partitions(
// leaf node - don't replace children
plan
} else {
let can_reorder_children =
match (plan.relies_on_input_order(), plan.maintains_input_order()) {
I believe this is the last use of maintains_input_order -- so after this PR is merged, I will try and create a PR that removes this from the trait -- it is redundant with the plan.output_order I think

@@ -2201,8 +2201,8 @@ async fn right_semi_join() -> Result<()> {
"SortExec: [t1_id@0 ASC NULLS LAST]",
" CoalescePartitionsExec",
" ProjectionExec: expr=[t1_id@0 as t1_id, t1_name@1 as t1_name, t1_int@2 as t1_int]",
" CoalesceBatchesExec: target_batch_size=4096",
" RepartitionExec: partitioning=RoundRobinBatch(2)",
" RepartitionExec: partitioning=RoundRobinBatch(2)",
these changes seem ok to me (no better or worse)

let ctx = SessionContext::new();
// We need to specify the target partition number.
// Otherwise, the default value used may vary on different environment
// with different cpu core number, which may cause the UT failure.
👍
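The pattern referred to in the comment above looks roughly like this (a sketch assuming the `SessionConfig::with_target_partitions` builder; `test_context` is an illustrative name, not the test's exact code):

```rust
use datafusion::prelude::{SessionConfig, SessionContext};

// Pin the partition count so the expected plan stays stable across machines
// with different CPU core counts.
fn test_context() -> SessionContext {
    SessionContext::with_config(SessionConfig::new().with_target_partitions(2))
}
```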

@@ -1649,9 +1652,10 @@ async fn test_window_agg_sort() -> Result<()> {
let expected = {
vec![
"ProjectionExec: expr=[c9@3 as c9, SUM(aggregate_test_100.c9) ORDER BY [aggregate_test_100.c9 ASC NULLS LAST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW@0 as sum1, SUM(aggregate_test_100.c9) ORDER BY [aggregate_test_100.c9 ASC NULLS LAST, aggregate_test_100.c8 ASC NULLS LAST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW@1 as sum2]",
" WindowAggExec: wdw=[SUM(aggregate_test_100.c9): Ok(Field { name: \"SUM(aggregate_test_100.c9)\", data_type: UInt64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), frame: WindowFrame { units: Range, start_bound: Preceding(UInt32(NULL)), end_bound: CurrentRow }]",
" RepartitionExec: partitioning=RoundRobinBatch(2)",
These Repartitions don't seem particularly useful to me (they repartition right before a projection), but I also don't think they hurt either

@alamb changed the title from "Reorder the physical plan optimizer rules" to "Reorder the physical plan optimizer rules, extract GlobalSortSelection, make Repartition optional" on Dec 26, 2022
@alamb (Contributor) commented Dec 26, 2022

The CI check https://github.com/apache/arrow-datafusion/actions/runs/3764447841/jobs/6399000363 appears to be failing:

   Compiling datafusion-examples v15.0.0 (/__w/arrow-datafusion/arrow-datafusion/datafusion-examples)
    Finished dev [unoptimized + debuginfo] target(s) in 22.48s
     Running `target/debug/examples/custom_datasource`
thread 'main' panicked at 'not yet implemented', datafusion-examples/examples/custom_datasource.rs:267:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

@ozankabak (Contributor)

@alamb, I don't think there is any logical conflict with #4691. I remember @mustafasrepo and I discussing this PR from that angle before and nothing came up. So unless something changed in the meantime, everybody should be OK.

@yahoNanJing (Contributor, Author)

Thanks @alamb, @Dandandan and @ozankabak. The UT failure and the conflicts have been fixed. Could you take another look?

@@ -120,6 +120,7 @@ datafusion.execution.target_partitions 7
datafusion.execution.time_zone +00:00
datafusion.explain.logical_plan_only false
datafusion.explain.physical_plan_only false
datafusion.optimizer.enable_round_robin_repartition true
👍
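For completeness, a sketch of how a user might turn the new option off (for example when the plan will be distributed by Ballista). This assumes the builder-style `SessionConfig::set_bool` setter; the option key is taken from the config table above:

```rust
use datafusion::error::Result;
use datafusion::prelude::{SessionConfig, SessionContext};

#[tokio::main]
async fn main() -> Result<()> {
    // Disable the optional round-robin Repartition rule via its config key.
    let config = SessionConfig::new()
        .set_bool("datafusion.optimizer.enable_round_robin_repartition", false);
    let ctx = SessionContext::with_config(config);

    // The resulting physical plans should contain no RoundRobinBatch repartitioning.
    let df = ctx.sql("SELECT 1 AS x").await?;
    df.show().await?;
    Ok(())
}
```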

@alamb merged commit 899c86a into apache:master on Dec 27, 2022
@ursabot commented Dec 27, 2022

Benchmark runs are scheduled for baseline = 981a9bb and contender = 899c86a. 899c86a is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
