-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rfcs: simple common table expressions #20374
Conversation
Review status: 0 of 1 files reviewed at latest revision, 3 unresolved discussions, all commit checks successful. docs/RFCS/00000000_simple_common_table_expressions.md, line 15 at r1 (raw file):
with partial compatibility docs/RFCS/00000000_simple_common_table_expressions.md, line 17 at r1 (raw file):
The summary must outline what is in-scope and out-of-scope. Out-of-scope:
Also for each of them outline why they are out of scope:
docs/RFCS/00000000_simple_common_table_expressions.md, line 26 at r1 (raw file):
I recommend you remind the reader somewhere where "at most once" is coming from. The true definition is "CTEs should be referentially transparent" no matter how you implement them. For some SQL expressions, referential transparency can be obtained even with multiple evaluations, however with a naive execution engine, in the general case, we can only guarantee referential transparency by ensuring the evaluation occurs at most once, because SQL allows non-deterministic queries and unless we can guarantee a query is deterministic each evaluation can produce different results. Comments from Reviewable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great job on putting this together. I love the spontaneity of this proposal! Very well done!
|
||
Because of this proposal's restrictions, temporary table infrastructure is not | ||
necessary as each CTE clause will stream its output to the plan that references | ||
it just like an ordinary CockroachDB plan tree. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious whether if we used external storage here if the streamed data gets too big whether this proposal will not have restrictions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"use external storage here" == implement temp tables
This risk can be mitigated by setting expectations carefully in the docs and | ||
feature release notes. As long as we don't claim to have full CTE support, | ||
people won't be unduly surprised when they can't use some of the more complex | ||
functionality that CTEs offer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose we can call it Simple CTE
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 on "Single-use CTE"
Review status: 0 of 1 files reviewed at latest revision, 5 unresolved discussions. docs/RFCS/00000000_simple_common_table_expressions.md, line 15 at r1 (raw file): Previously, knz (kena) wrote…
Done. docs/RFCS/00000000_simple_common_table_expressions.md, line 17 at r1 (raw file): Previously, knz (kena) wrote…
Done. docs/RFCS/00000000_simple_common_table_expressions.md, line 26 at r1 (raw file): Previously, knz (kena) wrote…
Done. docs/RFCS/00000000_simple_common_table_expressions.md, line 246 at r1 (raw file): Previously, knz (kena) wrote…
Yeah, what @knz said. You're right, but that effort is not small and is correspondingly out of scope for now. docs/RFCS/00000000_simple_common_table_expressions.md, line 262 at r1 (raw file): Previously, knz (kena) wrote…
Done. Comments from Reviewable |
Nice RFC! Review status: 0 of 1 files reviewed at latest revision, 6 unresolved discussions, some commit checks failed. docs/RFCS/00000000_simple_common_table_expressions.md, line 53 at r1 (raw file):
While it is true a temporary table could be used for the intermediate results, we don't need the full complexity of @knz, @RaduBerinde Am I misunderstanding the complexity here? That said, I think it is appropriate to leave this to future work, but let's not overstate the difficulty involved. Comments from Reviewable |
Reviewed 1 of 1 files at r2. docs/RFCS/00000000_simple_common_table_expressions.md, line 53 at r1 (raw file): Previously, petermattis (Peter Mattis) wrote…
You're partially right but I would be wary to form an opinion until/unless someone goes through the motion of spelling out (in a separate RFC) how it would work. Also as long as we have two execution engines and the one that supports disk-backed temp storage cannot execute all queries, it would be unwise to try and support CTEs in this way, because it would yield poor UX: some CTEs would "work" and others not for reasons unscrutable. Jordan maybe you can capture this discussion in a "future work" section. docs/RFCS/00000000_simple_common_table_expressions.md, line 1 at r2 (raw file):
Update the title. docs/RFCS/00000000_simple_common_table_expressions.md, line 37 at r2 (raw file):
"or ..." missing after "either ... " Comments from Reviewable |
Review status: all files reviewed at latest revision, 5 unresolved discussions, some commit checks failed. docs/RFCS/00000000_simple_common_table_expressions.md, line 53 at r1 (raw file): Previously, knz (kena) wrote…
Btw, in SQL Server and Orca, this temporary storage operator is called the spool operator. See http://blogs.lobsterpot.com.au/2013/06/11/spooling-in-sql-execution-plans/ for one description. Comments from Reviewable |
Review status: 0 of 1 files reviewed at latest revision, 5 unresolved discussions. docs/RFCS/00000000_simple_common_table_expressions.md, line 53 at r1 (raw file): Previously, petermattis (Peter Mattis) wrote…
Added a section about this to future work. docs/RFCS/00000000_simple_common_table_expressions.md, line 1 at r2 (raw file): Previously, knz (kena) wrote…
Done. docs/RFCS/00000000_simple_common_table_expressions.md, line 37 at r2 (raw file): Previously, knz (kena) wrote…
Done. Comments from Reviewable |
Review status: 0 of 1 files reviewed at latest revision, 6 unresolved discussions. docs/RFCS/20171206_single_use_common_table_expressions.md, line 304 at r3 (raw file):
Is the use of temporary storage automatic? I thought processors had to utilize it explicitly. Comments from Reviewable |
Review status: 0 of 1 files reviewed at latest revision, 9 unresolved discussions, some commit checks failed. docs/RFCS/20171206_single_use_common_table_expressions.md, line 75 at r3 (raw file):
might be worth referencing the ORMs that need this docs/RFCS/20171206_single_use_common_table_expressions.md, line 279 at r3 (raw file):
Could this be flagged or gated behind beta support? I too am concerned with the user education component (but would love a solution to unlock this for others who request it) docs/RFCS/20171206_single_use_common_table_expressions.md, line 293 at r3 (raw file):
Useful to discuss this in the context of other planned SQL projects like the optimizer Comments from Reviewable |
Review status: 0 of 1 files reviewed at latest revision, 9 unresolved discussions, some commit checks failed. docs/RFCS/20171206_single_use_common_table_expressions.md, line 75 at r3 (raw file): Previously, awoods187 (Andy Woods) wrote…
Done. docs/RFCS/20171206_single_use_common_table_expressions.md, line 279 at r3 (raw file): Previously, awoods187 (Andy Woods) wrote…
I don't think it's practical (or necessary) to gate this as a beta feature. The behavior of the queries we support will not differ from their correct behaviors. Therefore, adding single-use CTEs only should be no more surprising to our users than our lack of CTEs entirely today. docs/RFCS/20171206_single_use_common_table_expressions.md, line 293 at r3 (raw file): Previously, awoods187 (Andy Woods) wrote…
This feature is pretty much orthogonal to the optimizer, as far as I can tell, so I'm going to leave that out. docs/RFCS/20171206_single_use_common_table_expressions.md, line 304 at r3 (raw file): Previously, petermattis (Peter Mattis) wrote…
You're right. Updated. Comments from Reviewable |
This RFC is entering the final comment period. Targeting a merge on Wednesday. Review status: 0 of 1 files reviewed at latest revision, 9 unresolved discussions. Comments from Reviewable |
LGTM |
Review status: 0 of 1 files reviewed at latest revision, 7 unresolved discussions, some commit checks pending. docs/RFCS/20171206_single_use_common_table_expressions.md, line 279 at r3 (raw file): Previously, jordanlewis (Jordan Lewis) wrote…
alright i'm convinced. We solve this via user education in docs and/or blog posts docs/RFCS/20171206_single_use_common_table_expressions.md, line 293 at r3 (raw file): Previously, jordanlewis (Jordan Lewis) wrote…
got it--i think i misunderstood that the work done for the optimizer would also make non-recursive CTEs easy Comments from Reviewable |
Review status: 0 of 1 files reviewed at latest revision, 7 unresolved discussions, some commit checks pending. docs/RFCS/20171206_single_use_common_table_expressions.md, line 293 at r3 (raw file): Previously, awoods187 (Andy Woods) wrote…
For multi-use CTEs (which require materialization), the query optimizer will need to add "spool" processors in order to enforce a "rewindability" property for operators which don't natively support "rewindability". This is similar to the way the optimizer will add a "sort" processor to enforce ordering. Rather than implementing CTEs now, we could wait for the new optimizer. It is possible that some of the work down now will need to be reimplemented. But given the limited scope here, I'm not terribly worried about that. Comments from Reviewable |
Review status: 0 of 1 files reviewed at latest revision, 7 unresolved discussions, all commit checks successful. docs/RFCS/20171206_single_use_common_table_expressions.md, line 53 at r1 (raw file): Previously, jordanlewis (Jordan Lewis) wrote…
That sounds right to me. The processor you are describing will be necessary for implementing nested loop joins as well. Comments from Reviewable |
Review status: 0 of 1 files reviewed at latest revision, 7 unresolved discussions, all commit checks successful. docs/RFCS/20171206_single_use_common_table_expressions.md, line 53 at r1 (raw file): Previously, RaduBerinde wrote…
Yep, @knz mentioned that today as well. Note that table scan and index scan operators support "rewindability", so a spool would only be necessary for a nested loop join where the right hand side is not a scan. Comments from Reviewable |
Review status: 0 of 1 files reviewed at latest revision, 7 unresolved discussions, all commit checks successful. docs/RFCS/20171206_single_use_common_table_expressions.md, line 53 at r1 (raw file): Previously, petermattis (Peter Mattis) wrote…
The operators might in theory, but the processors don't :) I don't have a clear image of how a nested loop join will work with distsql anyway; it seems like we need some kind of dynamic planning. For CTEs we know upfront how many outputs we have, they can all be pre-configured. Comments from Reviewable |
Crazy idea: in the local engine, what if we defined some It could be something as simple type withNode struct {
plan planNode
numRefs int
outputBuffer []tree.Datums
curIdx int
recentNext bool
}
func with(existing *withNode, ...) *withNode {
if existing != nil {
existing.numRefs++
return existing
}
}
func (w *withNode) Start() {
w.outputBuffer := make([]tree.Datums, w.numRefs)
w.curIdx = -1
}
func (w *withNode) Next() bool {
w.curIdx = (w.curIdx + 1) % w.numRefs
if w.curIdx == 0 {
w.recentNext = w.planNext()
}
return w.recentNext
}
func (w *withNode) Values() tree.Datums {
if w.curIdx == 0 {
for i := range w.outputBuffer {
copy(w.outputBuffer[i], w.plan.Values())
}
}
return w.outputBuffer[w.curIdx]
}
func (w *withNode) Close() {
w.numRefs--
if w.numRefs == 0 {
w.plan.Close()
}
} There's probably an off by 1 error with |
I don't see how that would work. If multiple nodes call For DistSQL, a mirror router works but the router does not yet spill the buffer to disk (it's a TODO). |
👏 Reviewed 1 of 2 files at r3, 1 of 1 files at r4. docs/RFCS/20171206_single_use_common_table_expressions.md, line 39 at r4 (raw file):
s/increase increase/increase/ docs/RFCS/20171206_single_use_common_table_expressions.md, line 306 at r4 (raw file):
s/processrs/processors Comments from Reviewable |
Review status: 0 of 1 files reviewed at latest revision, 9 unresolved discussions, some commit checks pending. docs/RFCS/20171206_single_use_common_table_expressions.md, line 39 at r4 (raw file): Previously, a-robinson (Alex Robinson) wrote…
Done. docs/RFCS/20171206_single_use_common_table_expressions.md, line 306 at r4 (raw file): Previously, a-robinson (Alex Robinson) wrote…
Done. Comments from Reviewable |
This RFC is a proposal to implement the restricted subset of common table expressions that doesn't require temporary table infrastructure. A CTE will be supported if each of its named clauses isn't referenced more than once as a data source. Release note: None
TFTRs! |
This RFC is a proposal to implement the restricted subset of common
table expressions that doesn't require temporary table infrastructure.
A CTE will be supported if each of its named clauses isn't referenced
more than once as a data source.
References #7029.
Created in light of discussion in #20359.
cc @BramGruneir - thanks for suggesting to write this!