[SPARK-32822][SQL] Change the number of partitions to zero when a range is empty with WholeStageCodegen disabled or falled back #29681

Closed
wants to merge 3 commits into from

sarutak
Member

@sarutak sarutak commented Sep 8, 2020

What changes were proposed in this pull request?

This PR changes RangeExec so that, when WholeStageCodegen is disabled or has fallen back, the number of partitions becomes zero for an empty range.

In the current master, if WholeStageCodegen takes effect, the number of partitions of an empty range is changed to zero.

spark.range(1, 1, 1, 1000).rdd.getNumPartitions
res0: Int = 0

But it isn't if WholeStageCodegen is disabled or has fallen back.

spark.conf.set("spark.sql.codegen.wholeStage", false)
spark.range(1, 1, 1, 1000).rdd.getNumPartitions
res2: Int = 1000 

Why are the changes needed?

To achieve better performance even when WholeStageCodegen is disabled or has fallen back.

Does this PR introduce any user-facing change?

Yes. The number of partitions returned by getNumPartitions for an empty range will change when WholeStageCodegen is disabled.

How was this patch tested?

New test.

overflow = true
if (isEmptyRange) {
new EmptyRDD[InternalRow](sqlContext.sparkContext)
} else {
Member Author

For reviewers: There are no changes within this else block.
The actual change is the if block above.
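To make the `if (isEmptyRange)` branch above easier to follow, here is a minimal sketch in plain Scala with no Spark classes. `EmptyRangeSketch` and `plannedPartitions` are hypothetical names used only for illustration, and the emptiness condition is an assumption about how such a check could be written, not a quote from the patch. It assumes a non-empty range keeps the requested number of slices.

```scala
// Sketch only: models the branch with plain Scala, not with Spark's RangeExec.
object EmptyRangeSketch {
  // A range yields no rows when start == end, or when step points away from end.
  def isEmptyRange(start: Long, end: Long, step: Long): Boolean = {
    require(step != 0, "step must be non-zero")
    start == end || ((start < end) ^ (step > 0))
  }

  // Hypothetical stand-in for the partition count reported after this change:
  // zero for an empty range, otherwise the requested number of slices.
  def plannedPartitions(start: Long, end: Long, step: Long, numSlices: Int): Int =
    if (isEmptyRange(start, end, step)) 0 else numSlices

  def main(args: Array[String]): Unit = {
    println(plannedPartitions(1, 1, 1, 1000))  // empty range: prints 0
    println(plannedPartitions(0, 10, 1, 1000)) // non-empty range: prints 1000
  }
}
```

With `spark.range(1, 1, 1, 1000)` from the description, start equals end, so the sketch takes the empty branch and reports zero partitions.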

@@ -994,6 +994,14 @@ class PlannerSuite extends SharedSparkSession with AdaptiveSparkPlanHelper {
}
}
}

test("Change the number of partitions to zero when a range is empty") {
withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "false") {
Member

Nice catch. nit: Just in case, could you test both cases with/without whole-stage codegen?

Member Author

All right.

@@ -395,8 +396,10 @@ case class RangeExec(range: org.apache.spark.sql.catalyst.plans.logical.Range)
RangeExec(range.canonicalized.asInstanceOf[org.apache.spark.sql.catalyst.plans.logical.Range])
}



Member

Please revert the unnecessary changes.

Member

@maropu maropu left a comment

LGTM except for the minor comments.

@SparkQA

SparkQA commented Sep 8, 2020

Test build #128403 has finished for PR 29681 at commit fdc858e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 8, 2020

Test build #128406 has finished for PR 29681 at commit c506803.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

Looks good to me.

@@ -994,6 +994,20 @@ class PlannerSuite extends SharedSparkSession with AdaptiveSparkPlanHelper {
}
}
}

test("Change the number of partitions to zero when a range is empty") {
withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "true") {
Member

Could you use testWithWholeStageCodegenOnAndOff instead?

Member Author

Oh, I didn't notice I can use it. Thanks.
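For readers unfamiliar with the on-and-off test pattern discussed here, the following is a rough sketch of the idea in plain Scala. It does not use Spark's test utilities: `withCodegenOnAndOff` and `numPartitions` are hypothetical stand-ins, and the partition model simply assumes the post-patch behavior described in this PR, where an empty range has zero partitions regardless of the codegen setting.

```scala
// Sketch only: imitates running one test body under both codegen settings.
object CodegenOnAndOffSketch {
  // Hypothetical helper: runs the same body once per setting, so a single
  // test definition covers both the codegen and the non-codegen path.
  def withCodegenOnAndOff(name: String)(body: Boolean => Unit): Unit =
    Seq(false, true).foreach { enabled =>
      println(s"$name (wholeStage=$enabled)")
      body(enabled)
    }

  // Hypothetical model of the behavior after this PR: the codegen flag is
  // accepted but deliberately does not influence the result.
  def numPartitions(start: Long, end: Long, step: Long,
                    numSlices: Int, codegenEnabled: Boolean): Int = {
    val isEmpty = start == end || ((start < end) ^ (step > 0))
    if (isEmpty) 0 else numSlices
  }

  def main(args: Array[String]): Unit =
    withCodegenOnAndOff("empty range has zero partitions") { enabled =>
      assert(numPartitions(1, 1, 1, 1000, enabled) == 0)
    }
}
```

Sharing one body across both settings avoids the duplicated `withSQLConf` blocks shown in the earlier revision of the test.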

@SparkQA

SparkQA commented Sep 10, 2020

Test build #128510 has finished for PR 29681 at commit c3f2f67.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu maropu closed this in 5f468cc Sep 11, 2020
@maropu
Member

maropu commented Sep 11, 2020

Thanks! Merged to master.
