Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-38318][SQL] Skip view cyclic reference check if view is stored as logical plan #35653

Closed
wants to merge 7 commits into from

Conversation

linhongliu-db
Copy link
Contributor

@linhongliu-db linhongliu-db commented Feb 24, 2022

What changes were proposed in this pull request?

In 3.2, we unified the representation of dataset view and SQL view, i.e., we wrap both
of them with View. This causes a regression that below case works in 3.1 but failed in 3.2

sql("select 1").createOrReplaceTempView("v")
sql("select * from v").createOrReplaceTempView("v")
-- in 3.1 it works well, and select will output 1
-- in 3.2 it failed with error: "AnalysisException: Recursive view v detected (cycle: v -> v)"

The root cause is in 3.1 we actually never did view cyclic check for dataset view. Because they
are wrapped by SubqueryAlias instead of View
In this PR, we want to skip the cyclic check if the view is stored as a logical plan.
i.e., storeAnalyzedPlanForView = true or view is created by Dataset API.

Why are the changes needed?

fix regression

Does this PR introduce any user-facing change?

No

How was this patch tested?

newly added ut

@github-actions github-actions bot added the SQL label Feb 24, 2022
@linhongliu-db
Copy link
Contributor Author

cc @cloud-fan

@@ -614,7 +614,11 @@ object ViewHelper extends SQLConfHelper with Logging {
}.getOrElse(false)
if (replace && uncache) {
logDebug(s"Try to uncache ${name.quotedString} before replacing.")
checkCyclicViewReference(analyzedPlan, Seq(name), name)
if (!conf.storeAnalyzedPlanForView) {
Copy link
Contributor

@cloud-fan cloud-fan Feb 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about !conf.storeAnalyzedPlanForView && originalText.nonEmpty? It also matches the if condition below where we decide to store analyzed plan.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I just want to be conservative in the first version. Because !conf.storeAnalyzedPlanForView && originalText.nonEmpty means dataset view will always skip cyclic check. It's consistent with 3.1 which is good. But I thought we should always block recursive views unless the user explicitly changes the flag (even if the recursive views work well).

@linhongliu-db linhongliu-db changed the title [SPARK-38318][SQL] Skip view cyclic reference check when storeAnalyzedPlan = true [SPARK-38318][SQL] Skip view cyclic reference check if view is stored as logical plan Feb 28, 2022
@@ -443,6 +468,9 @@ class LocalTempViewTestSuite extends TempViewTestSuite with SharedSparkSession {
override protected def tableIdentifier(viewName: String): TableIdentifier = {
TableIdentifier(viewName)
}
override def createDatasetView(df: DataFrame, viewName: String): Unit = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: use createOrReplaceDatasetView instead of createDatasetView.

@@ -614,7 +614,11 @@ object ViewHelper extends SQLConfHelper with Logging {
}.getOrElse(false)
if (replace && uncache) {
logDebug(s"Try to uncache ${name.quotedString} before replacing.")
checkCyclicViewReference(analyzedPlan, Seq(name), name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe removing this check and removing the SubqueryAlias and View from analyzedPlan is a better idea for this condition. Same with here.
However, it depends on whether we support user to repalce view in this way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want to support it officially. It's for backward compatibility only.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't get this. If we remove the cyclic check, how to detect a recursive view created by SQL.

Copy link
Contributor

@jackylee-ch jackylee-ch Feb 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't get this. If we remove the cyclic check, how to detect a recursive view created by SQL.

Oh, I got it, thanks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want to support it officially. It's for backward compatibility only.

If we won't suport this, then why we change it for backward compatibility? It is not a valid behavior. What about leading people to use different name?

Copy link
Contributor Author

@linhongliu-db linhongliu-db Feb 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stczwd, the community did a lot of work in view implementation in 3.1 and 3.2. The view's behavior makes more sense now. But it's also a little bit more confusing now because we have several kinds of view implementations now.

  1. Permanent view created by SQL DDL
  2. Temp view created by SQL DDL (from 3.1, this view now is represented with SQL text)
  3. Temp view created by SQL DDL with storeAnalyzedPlanForView = true (behavior before 3.1, same with view 4)
  4. Temp view created by Dataset API (until now, this view is still represented with analyzed logical plan)

To answer your question

If we won't suport this, then why we change it for backward compatibility?

We don't support recursive view for view created by SQL DDL, but we DO support this for view created by dataset API.
That's why we add this condition here: !conf.storeAnalyzedPlanForView && originalText.nonEmpty.
It means only check the cyclic reference for view 1 and 2

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't support recursive view for view created by SQL DDL, but we DO support this for view created by dataset API.

Ah, I see, thanks for the explanation.

@@ -614,7 +614,11 @@ object ViewHelper extends SQLConfHelper with Logging {
}.getOrElse(false)
if (replace && uncache) {
logDebug(s"Try to uncache ${name.quotedString} before replacing.")
checkCyclicViewReference(analyzedPlan, Seq(name), name)
if (!conf.storeAnalyzedPlanForView && originalText.nonEmpty) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe it's better to have a val storeAnalyzedPlanForView = conf.storeAnalyzedPlanForView || originalText.nonEmpty at the beginning, then we can just write if (!storeAnalyzedPlanForView) here and below.

@cloud-fan
Copy link
Contributor

thanks, merging to master/3.2!

@cloud-fan cloud-fan closed this in 1d068ce Mar 1, 2022
cloud-fan pushed a commit that referenced this pull request Mar 1, 2022
… as logical plan

In 3.2, we unified the representation of dataset view and SQL view, i.e., we wrap both
of them with `View`. This causes a regression that below case works in 3.1 but failed in 3.2
```sql
sql("select 1").createOrReplaceTempView("v")
sql("select * from v").createOrReplaceTempView("v")
-- in 3.1 it works well, and select will output 1
-- in 3.2 it failed with error: "AnalysisException: Recursive view v detected (cycle: v -> v)"
```
The root cause is in 3.1 we actually never did view cyclic check for dataset view. Because they
are wrapped by `SubqueryAlias` instead of `View`
In this PR, we want to skip the cyclic check if the view is stored as a logical plan.
i.e., `storeAnalyzedPlanForView = true` or view is created by Dataset API.

fix regression

No

newly added ut

Closes #35653 from linhongliu-db/SPARK-38318.

Authored-by: Linhong Liu <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 1d068ce)
Signed-off-by: Wenchen Fan <[email protected]>
@ydh1000
Copy link

ydh1000 commented Apr 14, 2022

thanks for fixing it. do you have a new build?

kazuyukitanimura pushed a commit to kazuyukitanimura/spark that referenced this pull request Aug 10, 2022
… as logical plan

In 3.2, we unified the representation of dataset view and SQL view, i.e., we wrap both
of them with `View`. This causes a regression that below case works in 3.1 but failed in 3.2
```sql
sql("select 1").createOrReplaceTempView("v")
sql("select * from v").createOrReplaceTempView("v")
-- in 3.1 it works well, and select will output 1
-- in 3.2 it failed with error: "AnalysisException: Recursive view v detected (cycle: v -> v)"
```
The root cause is in 3.1 we actually never did view cyclic check for dataset view. Because they
are wrapped by `SubqueryAlias` instead of `View`
In this PR, we want to skip the cyclic check if the view is stored as a logical plan.
i.e., `storeAnalyzedPlanForView = true` or view is created by Dataset API.

fix regression

No

newly added ut

Closes apache#35653 from linhongliu-db/SPARK-38318.

Authored-by: Linhong Liu <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 1d068ce)
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit fb1218d)
Signed-off-by: Dongjoon Hyun <[email protected]>
xza-m pushed a commit to xza-m/spark_ishumei that referenced this pull request Feb 26, 2023
… as logical plan

In 3.2, we unified the representation of dataset view and SQL view, i.e., we wrap both
of them with `View`. This causes a regression that below case works in 3.1 but failed in 3.2
```sql
sql("select 1").createOrReplaceTempView("v")
sql("select * from v").createOrReplaceTempView("v")
-- in 3.1 it works well, and select will output 1
-- in 3.2 it failed with error: "AnalysisException: Recursive view v detected (cycle: v -> v)"
```
The root cause is in 3.1 we actually never did view cyclic check for dataset view. Because they
are wrapped by `SubqueryAlias` instead of `View`
In this PR, we want to skip the cyclic check if the view is stored as a logical plan.
i.e., `storeAnalyzedPlanForView = true` or view is created by Dataset API.

fix regression

No

newly added ut

Closes apache#35653 from linhongliu-db/SPARK-38318.

Authored-by: Linhong Liu <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 1d068ce)
Signed-off-by: Wenchen Fan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants