-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-38318][SQL] Skip view cyclic reference check if view is stored as logical plan #35653
Conversation
cc @cloud-fan |
@@ -614,7 +614,11 @@ object ViewHelper extends SQLConfHelper with Logging { | |||
}.getOrElse(false) | |||
if (replace && uncache) { | |||
logDebug(s"Try to uncache ${name.quotedString} before replacing.") | |||
checkCyclicViewReference(analyzedPlan, Seq(name), name) | |||
if (!conf.storeAnalyzedPlanForView) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about !conf.storeAnalyzedPlanForView && originalText.nonEmpty
? It also matches the if condition below where we decide to store analyzed plan.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I just want to be conservative in the first version. Because !conf.storeAnalyzedPlanForView && originalText.nonEmpty
means dataset view will always skip cyclic check. It's consistent with 3.1 which is good. But I thought we should always block recursive views unless the user explicitly changes the flag (even if the recursive views work well).
@@ -443,6 +468,9 @@ class LocalTempViewTestSuite extends TempViewTestSuite with SharedSparkSession { | |||
override protected def tableIdentifier(viewName: String): TableIdentifier = { | |||
TableIdentifier(viewName) | |||
} | |||
override def createDatasetView(df: DataFrame, viewName: String): Unit = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: use createOrReplaceDatasetView
instead of createDatasetView
.
@@ -614,7 +614,11 @@ object ViewHelper extends SQLConfHelper with Logging { | |||
}.getOrElse(false) | |||
if (replace && uncache) { | |||
logDebug(s"Try to uncache ${name.quotedString} before replacing.") | |||
checkCyclicViewReference(analyzedPlan, Seq(name), name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe removing this check and removing the SubqueryAlias and View from analyzedPlan is a better idea for this condition. Same with here.
However, it depends on whether we support user to repalce view in this way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we want to support it officially. It's for backward compatibility only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't get this. If we remove the cyclic check, how to detect a recursive view created by SQL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't get this. If we remove the cyclic check, how to detect a recursive view created by SQL.
Oh, I got it, thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we want to support it officially. It's for backward compatibility only.
If we won't suport this, then why we change it for backward compatibility? It is not a valid behavior. What about leading people to use different name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stczwd, the community did a lot of work in view implementation in 3.1 and 3.2. The view's behavior makes more sense now. But it's also a little bit more confusing now because we have several kinds of view implementations now.
- Permanent view created by SQL DDL
- Temp view created by SQL DDL (from 3.1, this view now is represented with SQL text)
- Temp view created by SQL DDL with storeAnalyzedPlanForView = true (behavior before 3.1, same with view 4)
- Temp view created by Dataset API (until now, this view is still represented with analyzed logical plan)
To answer your question
If we won't suport this, then why we change it for backward compatibility?
We don't support recursive view for view created by SQL DDL, but we DO support this for view created by dataset API.
That's why we add this condition here: !conf.storeAnalyzedPlanForView && originalText.nonEmpty
.
It means only check the cyclic reference for view 1 and 2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't support recursive view for view created by SQL DDL, but we DO support this for view created by dataset API.
Ah, I see, thanks for the explanation.
@@ -614,7 +614,11 @@ object ViewHelper extends SQLConfHelper with Logging { | |||
}.getOrElse(false) | |||
if (replace && uncache) { | |||
logDebug(s"Try to uncache ${name.quotedString} before replacing.") | |||
checkCyclicViewReference(analyzedPlan, Seq(name), name) | |||
if (!conf.storeAnalyzedPlanForView && originalText.nonEmpty) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe it's better to have a val storeAnalyzedPlanForView = conf.storeAnalyzedPlanForView || originalText.nonEmpty
at the beginning, then we can just write if (!storeAnalyzedPlanForView)
here and below.
thanks, merging to master/3.2! |
… as logical plan In 3.2, we unified the representation of dataset view and SQL view, i.e., we wrap both of them with `View`. This causes a regression that below case works in 3.1 but failed in 3.2 ```sql sql("select 1").createOrReplaceTempView("v") sql("select * from v").createOrReplaceTempView("v") -- in 3.1 it works well, and select will output 1 -- in 3.2 it failed with error: "AnalysisException: Recursive view v detected (cycle: v -> v)" ``` The root cause is in 3.1 we actually never did view cyclic check for dataset view. Because they are wrapped by `SubqueryAlias` instead of `View` In this PR, we want to skip the cyclic check if the view is stored as a logical plan. i.e., `storeAnalyzedPlanForView = true` or view is created by Dataset API. fix regression No newly added ut Closes #35653 from linhongliu-db/SPARK-38318. Authored-by: Linhong Liu <[email protected]> Signed-off-by: Wenchen Fan <[email protected]> (cherry picked from commit 1d068ce) Signed-off-by: Wenchen Fan <[email protected]>
thanks for fixing it. do you have a new build? |
… as logical plan In 3.2, we unified the representation of dataset view and SQL view, i.e., we wrap both of them with `View`. This causes a regression that below case works in 3.1 but failed in 3.2 ```sql sql("select 1").createOrReplaceTempView("v") sql("select * from v").createOrReplaceTempView("v") -- in 3.1 it works well, and select will output 1 -- in 3.2 it failed with error: "AnalysisException: Recursive view v detected (cycle: v -> v)" ``` The root cause is in 3.1 we actually never did view cyclic check for dataset view. Because they are wrapped by `SubqueryAlias` instead of `View` In this PR, we want to skip the cyclic check if the view is stored as a logical plan. i.e., `storeAnalyzedPlanForView = true` or view is created by Dataset API. fix regression No newly added ut Closes apache#35653 from linhongliu-db/SPARK-38318. Authored-by: Linhong Liu <[email protected]> Signed-off-by: Wenchen Fan <[email protected]> (cherry picked from commit 1d068ce) Signed-off-by: Wenchen Fan <[email protected]> (cherry picked from commit fb1218d) Signed-off-by: Dongjoon Hyun <[email protected]>
… as logical plan In 3.2, we unified the representation of dataset view and SQL view, i.e., we wrap both of them with `View`. This causes a regression that below case works in 3.1 but failed in 3.2 ```sql sql("select 1").createOrReplaceTempView("v") sql("select * from v").createOrReplaceTempView("v") -- in 3.1 it works well, and select will output 1 -- in 3.2 it failed with error: "AnalysisException: Recursive view v detected (cycle: v -> v)" ``` The root cause is in 3.1 we actually never did view cyclic check for dataset view. Because they are wrapped by `SubqueryAlias` instead of `View` In this PR, we want to skip the cyclic check if the view is stored as a logical plan. i.e., `storeAnalyzedPlanForView = true` or view is created by Dataset API. fix regression No newly added ut Closes apache#35653 from linhongliu-db/SPARK-38318. Authored-by: Linhong Liu <[email protected]> Signed-off-by: Wenchen Fan <[email protected]> (cherry picked from commit 1d068ce) Signed-off-by: Wenchen Fan <[email protected]>
What changes were proposed in this pull request?
In 3.2, we unified the representation of dataset view and SQL view, i.e., we wrap both
of them with
View
. This causes a regression that below case works in 3.1 but failed in 3.2The root cause is in 3.1 we actually never did view cyclic check for dataset view. Because they
are wrapped by
SubqueryAlias
instead ofView
In this PR, we want to skip the cyclic check if the view is stored as a logical plan.
i.e.,
storeAnalyzedPlanForView = true
or view is created by Dataset API.Why are the changes needed?
fix regression
Does this PR introduce any user-facing change?
No
How was this patch tested?
newly added ut