rfcs: simple common table expressions #20374

Merged 1 commit on Dec 6, 2017

jordanlewis
Member

This RFC is a proposal to implement the restricted subset of common
table expressions that doesn't require temporary table infrastructure.
A CTE will be supported if each of its named clauses isn't referenced
more than once as a data source.
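
A minimal sketch of the "referenced at most once" check, assuming a hypothetical, much-simplified `statement` type that only records which data sources each named clause and the main query read from. None of these names come from the CockroachDB code base; the real check would run during planning.

```go
package main

import (
	"fmt"
	"sort"
)

// statement is a hypothetical, simplified view of a WITH statement.
type statement struct {
	clauses map[string][]string // CTE name -> data sources the clause reads
	body    []string            // data sources read by the main query
}

// multiUse returns, in sorted order, the CTE names that are referenced more
// than once as a data source, either by other clauses or by the main query.
func multiUse(s statement) []string {
	counts := map[string]int{}
	count := func(sources []string) {
		for _, src := range sources {
			if _, isCTE := s.clauses[src]; isCTE {
				counts[src]++
			}
		}
	}
	for _, sources := range s.clauses {
		count(sources)
	}
	count(s.body)

	var out []string
	for name, n := range counts {
		if n > 1 {
			out = append(out, name)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// WITH a AS (... FROM t), b AS (... FROM a) SELECT ... FROM b
	ok := statement{
		clauses: map[string][]string{"a": {"t"}, "b": {"a"}},
		body:    []string{"b"},
	}
	// WITH a AS (... FROM t) SELECT ... FROM a JOIN a ...
	bad := statement{
		clauses: map[string][]string{"a": {"t"}},
		body:    []string{"a", "a"},
	}
	fmt.Println(multiUse(ok), multiUse(bad)) // [] [a]
}
```

A statement would be accepted under this proposal only when `multiUse` returns nothing.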

References #7029.
Created in light of discussion in #20359.
cc @BramGruneir - thanks for suggesting to write this!

@jordanlewis jordanlewis requested a review from a team as a code owner December 1, 2017 05:41
@cockroach-teamcity
Member

This change is Reviewable

@knz
Contributor

knz commented Dec 1, 2017

Review status: 0 of 1 files reviewed at latest revision, 3 unresolved discussions, all commit checks successful.


docs/RFCS/00000000_simple_common_table_expressions.md, line 15 at r1 (raw file):

infrastructure, which is currently missing from CockroachDB. This proposal aims
to fast-track the implementation of the subset of CTEs that doesn't require
temporary tables, providing our users with compatibility and query readability

with partial compatibility


docs/RFCS/00000000_simple_common_table_expressions.md, line 17 at r1 (raw file):

temporary tables, providing our users with compatibility and query readability
boons at little cost.

The summary must outline what is in-scope and out-of-scope.

Out-of-scope:

  • recursive CTEs
  • correlated CTEs
  • using a CTE name more than once, either in other CTE definitions or the surrounding query.

Also for each of them outline why they are out of scope:

  • recursive CTEs: because that would require an entirely new execution engine
  • correlated CTEs: because we don't support correlation yet, and CTEs don't make (de)correlation any easier - you can ref the correlated SQ issue here
  • using a CTE name more than once - this needs temporary tables AND/OR an optimization predicate that can exclude non-deterministic CTEs

docs/RFCS/00000000_simple_common_table_expressions.md, line 26 at r1 (raw file):

regardless of how many times they're referenced later in the statement. They
increase the expressivity of SQL by adding some syntactic sugar and new access
patterns:

I recommend you remind the reader somewhere where "at most once" is coming from.
I personally would argue that the specific wording "must be evaluated at most once" is putting unnecessary emphasis on the mechanism.

The true definition is "CTEs should be referentially transparent" no matter how you implement them. For some SQL expressions, referential transparency can be obtained even with multiple evaluations. However, with a naive execution engine we can, in the general case, only guarantee referential transparency by ensuring the evaluation occurs at most once: SQL allows non-deterministic queries, and unless we can guarantee a query is deterministic, each evaluation can produce different results.
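
To make the point concrete, here is a small sketch under stated assumptions: the hypothetical `sequence` type stands in for any impure SQL expression (a sequence, a random-number builtin, a data-modifying clause), and the two helper functions model naive inlining versus evaluating the clause once. It is an illustration only, not how any engine is implemented.

```go
package main

import "fmt"

// sequence stands in for an impure expression: every evaluation of next
// yields a different result.
type sequence struct{ n int }

func (s *sequence) next() int { s.n++; return s.n }

// inlineTwice models naive expansion of a CTE that is referenced twice:
// each reference re-runs the clause.
func inlineTwice(clause func() int) (int, int) {
	return clause(), clause()
}

// evalOnce models "evaluate at most once": the clause runs a single time and
// both references observe the same result.
func evalOnce(clause func() int) (int, int) {
	v := clause()
	return v, v
}

func main() {
	s := &sequence{}
	a, b := inlineTwice(s.next)
	fmt.Println(a == b) // false: the two references disagree
	c, d := evalOnce(s.next)
	fmt.Println(c == d) // true: referentially transparent
}
```

With a deterministic clause both strategies would agree, which is why an optimization predicate that proves determinism could relax the single-evaluation requirement.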


Comments from Reviewable

Contributor

@vivekmenezes vivekmenezes left a comment

Great job on putting this together. I love the spontaneity of this proposal! Very well done!


Because of this proposal's restrictions, temporary table infrastructure is not
necessary as each CTE clause will stream its output to the plan that references
it just like an ordinary CockroachDB plan tree.
Contributor

Just curious: if we spilled the streamed data to external storage whenever it gets too big, would this proposal still need its restrictions?

Contributor

"use external storage here" == implement temp tables

This risk can be mitigated by setting expectations carefully in the docs and
feature release notes. As long as we don't claim to have full CTE support,
people won't be unduly surprised when they can't use some of the more complex
functionality that CTEs offer.
Contributor

I suppose we can call it Simple CTE

Contributor

👍 on "Single-use CTE"

@jordanlewis
Member Author

Review status: 0 of 1 files reviewed at latest revision, 5 unresolved discussions.


docs/RFCS/00000000_simple_common_table_expressions.md, line 15 at r1 (raw file):

Previously, knz (kena) wrote…

with partial compatibility

Done.


docs/RFCS/00000000_simple_common_table_expressions.md, line 17 at r1 (raw file):

Previously, knz (kena) wrote…

The summary must outline what is in-scope and out-of-scope.

Out-of-scope:

  • recursive CTEs
  • correlated CTEs
  • using a CTE name more than once, either in other CTE definitions or the surrounding query.

Also for each of them outline why they are out of scope:

  • recursive CTEs: because that would require an entirely new execution engine
  • correlated CTEs: because we don't support correlation yet, and CTEs don't make (de)correlation any easier - you can ref the correlated SQ issue here
  • using a CTE name more than once - this needs temporary tables AND/OR an optimization predicate that can exclude non-deterministic CTEs

Done.


docs/RFCS/00000000_simple_common_table_expressions.md, line 26 at r1 (raw file):

Previously, knz (kena) wrote…

I recommend you remind the reader somewhere where "at most once" is coming from.
I personally would argue that the specific wording "must be evaluated at most once" is putting unnecessary emphasis on the mechanism.

The true definition is "CTEs should be referentially transparent" no matter how you implement them. For some SQL expressions, referential transparency can be obtained even with multiple evaluations. However, with a naive execution engine we can, in the general case, only guarantee referential transparency by ensuring the evaluation occurs at most once: SQL allows non-deterministic queries, and unless we can guarantee a query is deterministic, each evaluation can produce different results.

Done.


docs/RFCS/00000000_simple_common_table_expressions.md, line 246 at r1 (raw file):

Previously, knz (kena) wrote…

"use external storage here" == implement temp tables

Yeah, what @knz said. You're right, but that effort is not small and is correspondingly out of scope for now.


docs/RFCS/00000000_simple_common_table_expressions.md, line 262 at r1 (raw file):

Previously, knz (kena) wrote…

👍 on "Single-use CTE"

Done.



@petermattis
Collaborator

Nice RFC!


Review status: 0 of 1 files reviewed at latest revision, 6 unresolved discussions, some commit checks failed.


docs/RFCS/00000000_simple_common_table_expressions.md, line 53 at r1 (raw file):

and 2 above. We believe that providing this subset of CTEs will be so
beneficial for our users that we shouldn't delay implementing the feature until
after temporary tables are available at some undetermined time in the future.

While it is true a temporary table could be used for the intermediate results, we don't need the full complexity of CREATE TEMPORARY TABLE. I think what we need is a new distsql processor which buffers the results that pass through it (to disk if the size is large enough) and allows multiple readers to get the full stream. If I squint, it seems most of the infrastructure for this is already present with the support for secondary storage for group-by and sorting. The "table" is write once, read many, which are much easier semantics than the full-transactional semantics needed for normal temporary tables. The temporary storage used by distsql processors is already cleaned up when the query terminates.

@knz, @RaduBerinde Am I misunderstanding the complexity here?

That said, I think it is appropriate to leave this to future work, but let's not overstate the difficulty involved.
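
A rough sketch of the write-once, read-many buffering idea described above, with heavy assumptions: `spool`, `reader`, and `row` are hypothetical names, rows are buffered in memory only, and spilling to disk and distribution across nodes are left out entirely.

```go
package main

import "fmt"

type row []int

// spool drains its source exactly once and keeps every row so that any
// number of readers can later consume the full stream.
type spool struct {
	rows []row
}

// newSpool reads the source until it reports exhaustion (ok == false).
func newSpool(next func() (row, bool)) *spool {
	s := &spool{}
	for {
		r, ok := next()
		if !ok {
			return s
		}
		s.rows = append(s.rows, r)
	}
}

// reader is an independent cursor over the spooled rows.
type reader struct {
	s   *spool
	pos int
}

func (s *spool) newReader() *reader { return &reader{s: s} }

func (r *reader) next() (row, bool) {
	if r.pos >= len(r.s.rows) {
		return nil, false
	}
	out := r.s.rows[r.pos]
	r.pos++
	return out, true
}

func main() {
	i := 0
	src := func() (row, bool) {
		if i >= 3 {
			return nil, false
		}
		i++
		return row{i}, true
	}
	s := newSpool(src)
	r1, r2 := s.newReader(), s.newReader()
	r1.next()
	r1.next()
	first, _ := r2.next()
	fmt.Println(first) // [1]: r2 is unaffected by r1's progress
}
```

Because the buffer is never modified after it is filled, readers need no coordination, which is the sense in which these semantics are simpler than those of a transactional temporary table.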



@knz
Contributor

knz commented Dec 3, 2017

Reviewed 1 of 1 files at r2.
Review status: all files reviewed at latest revision, 5 unresolved discussions, some commit checks failed.


docs/RFCS/00000000_simple_common_table_expressions.md, line 53 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

While it is true a temporary table could be used for the intermediate results, we don't need the full complexity of CREATE TEMPORARY TABLE. I think what we need is a new distsql processor which buffers the results that pass through it (to disk if the size is large enough) and allows multiple readers to get the full stream. If I squint, it seems most of the infrastructure for this is already present with the support for secondary storage for group-by and sorting. The "table" is write once, read many, which are much easier semantics than the full-transactional semantics needed for normal temporary tables. The temporary storage used by distsql processors is already cleaned up when the query terminates.

@knz, @RaduBerinde Am I misunderstanding the complexity here?

That said, I think it is appropriate to leave this to future work, but let's not overstate the difficulty involved.

You're partially right, but I would be wary of forming an opinion until/unless someone goes through the motions of spelling out (in a separate RFC) how it would work.

Also, as long as we have two execution engines and the one that supports disk-backed temp storage cannot execute all queries, it would be unwise to try to support CTEs in this way, because it would yield poor UX: some CTEs would "work" and others would not, for inscrutable reasons.

Jordan maybe you can capture this discussion in a "future work" section.


docs/RFCS/00000000_simple_common_table_expressions.md, line 1 at r2 (raw file):

- Simple Common Table Expressions

Update the title.


docs/RFCS/00000000_simple_common_table_expressions.md, line 37 at r2 (raw file):

In full CTEs, the named statements can be referred to unlimited times
throughout the rest of the statement. To preserve the referential integrity of
the names of the statements, the execution engine must either ensure that each

"or ..." missing after "either ... "



@petermattis
Collaborator

Review status: all files reviewed at latest revision, 5 unresolved discussions, some commit checks failed.


docs/RFCS/00000000_simple_common_table_expressions.md, line 53 at r1 (raw file):

Previously, knz (kena) wrote…

You're partially right, but I would be wary of forming an opinion until/unless someone goes through the motions of spelling out (in a separate RFC) how it would work.

Also, as long as we have two execution engines and the one that supports disk-backed temp storage cannot execute all queries, it would be unwise to try to support CTEs in this way, because it would yield poor UX: some CTEs would "work" and others would not, for inscrutable reasons.

Jordan maybe you can capture this discussion in a "future work" section.

Btw, in SQL Server and Orca, this temporary storage operator is called the spool operator. See http://blogs.lobsterpot.com.au/2013/06/11/spooling-in-sql-execution-plans/ for one description.



@jordanlewis
Member Author

Review status: 0 of 1 files reviewed at latest revision, 5 unresolved discussions.


docs/RFCS/00000000_simple_common_table_expressions.md, line 53 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Btw, in SQL Server and Orca, this temporary storage operator is called the spool operator. See http://blogs.lobsterpot.com.au/2013/06/11/spooling-in-sql-execution-plans/ for one description.

Added a section about this to future work.


docs/RFCS/00000000_simple_common_table_expressions.md, line 1 at r2 (raw file):

Previously, knz (kena) wrote…

Update the title.

Done.


docs/RFCS/00000000_simple_common_table_expressions.md, line 37 at r2 (raw file):

Previously, knz (kena) wrote…

"or ..." missing after "either ... "

Done.



@petermattis
Collaborator

Review status: 0 of 1 files reviewed at latest revision, 6 unresolved discussions.


docs/RFCS/20171206_single_use_common_table_expressions.md, line 304 at r3 (raw file):

Temporary storage may be less involved than fully implementing temporary
tables. DistSQL already automatically uses temporary storage between processors

Is the use of temporary storage automatic? I thought processors had to utilize it explicitly.



@awoods187
Contributor

Review status: 0 of 1 files reviewed at latest revision, 9 unresolved discussions, some commit checks failed.


docs/RFCS/20171206_single_use_common_table_expressions.md, line 75 at r3 (raw file):

of this subset was straightforward and weighed in at less than 300 lines of new
non-test code.

might be worth referencing the ORMs that need this


docs/RFCS/20171206_single_use_common_table_expressions.md, line 279 at r3 (raw file):

people won't be unduly surprised when they can't use some of the more complex
functionality that CTEs offer.

Could this be flagged or gated behind beta support? I too am concerned with the user education component (but would love a solution to unlock this for others who request it)


docs/RFCS/20171206_single_use_common_table_expressions.md, line 293 at r3 (raw file):

The impact of waiting to implement this functionality is that we might turn
away potential users that expect to be able to use CTEs.

Useful to discuss this in the context of other planned SQL projects like the optimizer



@jordanlewis
Member Author

Review status: 0 of 1 files reviewed at latest revision, 9 unresolved discussions, some commit checks failed.


docs/RFCS/20171206_single_use_common_table_expressions.md, line 75 at r3 (raw file):

Previously, awoods187 (Andy Woods) wrote…

might be worth referencing the ORMs that need this

Done.


docs/RFCS/20171206_single_use_common_table_expressions.md, line 279 at r3 (raw file):

Previously, awoods187 (Andy Woods) wrote…

Could this be flagged or gated behind beta support? I too am concerned with the user education component (but would love a solution to unlock this for others who request it)

I don't think it's practical (or necessary) to gate this as a beta feature. The behavior of the queries we support will not differ from their correct behaviors. Therefore, adding only single-use CTEs should be no more surprising to our users than our lack of CTEs entirely today.


docs/RFCS/20171206_single_use_common_table_expressions.md, line 293 at r3 (raw file):

Previously, awoods187 (Andy Woods) wrote…

Useful to discuss this in the context of other planned SQL projects like the optimizer

This feature is pretty much orthogonal to the optimizer, as far as I can tell, so I'm going to leave that out.


docs/RFCS/20171206_single_use_common_table_expressions.md, line 304 at r3 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Is the use of temporary storage automatic? I thought processors had to utilize it explicitly.

You're right. Updated.



@jordanlewis
Member Author

This RFC is entering the final comment period. Targeting a merge on Wednesday.


Review status: 0 of 1 files reviewed at latest revision, 9 unresolved discussions.



@knz
Contributor

knz commented Dec 4, 2017

LGTM

@awoods187
Contributor

Review status: 0 of 1 files reviewed at latest revision, 7 unresolved discussions, some commit checks pending.


docs/RFCS/20171206_single_use_common_table_expressions.md, line 279 at r3 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

I don't think it's practical (or necessary) to gate this as a beta feature. The behavior of the queries we support will not differ from their correct behaviors. Therefore, adding only single-use CTEs should be no more surprising to our users than our lack of CTEs entirely today.

alright i'm convinced. We solve this via user education in docs and/or blog posts


docs/RFCS/20171206_single_use_common_table_expressions.md, line 293 at r3 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

This feature is pretty much orthogonal to the optimizer, as far as I can tell, so I'm going to leave that out.

got it--i think i misunderstood that the work done for the optimizer would also make non-recursive CTEs easy



@petermattis
Collaborator

Review status: 0 of 1 files reviewed at latest revision, 7 unresolved discussions, some commit checks pending.


docs/RFCS/20171206_single_use_common_table_expressions.md, line 293 at r3 (raw file):

Previously, awoods187 (Andy Woods) wrote…

got it--i think i misunderstood that the work done for the optimizer would also make non-recursive CTEs easy

For multi-use CTEs (which require materialization), the query optimizer will need to add "spool" processors in order to enforce a "rewindability" property for operators which don't natively support "rewindability". This is similar to the way the optimizer will add a "sort" processor to enforce ordering.

Rather than implementing CTEs now, we could wait for the new optimizer. It is possible that some of the work done now will need to be reimplemented. But given the limited scope here, I'm not terribly worried about that.
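
As a rough illustration of the enforcer idea, here is a sketch assuming a hypothetical, much-simplified `operator` type with a `rewindable` flag. The operator names and the flag are inventions for this example and do not reflect the actual optimizer design.

```go
package main

import "fmt"

// operator is a hypothetical, much-simplified plan operator.
type operator struct {
	name       string
	rewindable bool
	input      *operator
}

// enforceRewindable returns op unchanged when it already supports rewinding
// and otherwise wraps it in a spool, analogous to adding a sort operator to
// enforce a required ordering.
func enforceRewindable(op *operator) *operator {
	if op.rewindable {
		return op
	}
	return &operator{name: "spool", rewindable: true, input: op}
}

// describe renders the operator tree as nested calls, e.g. spool(scan).
func describe(op *operator) string {
	if op.input == nil {
		return op.name
	}
	return op.name + "(" + describe(op.input) + ")"
}

func main() {
	scan := &operator{name: "scan", rewindable: true}
	agg := &operator{name: "group-by", rewindable: false, input: scan}
	fmt.Println(describe(enforceRewindable(scan))) // scan
	fmt.Println(describe(enforceRewindable(agg)))  // spool(group-by(scan))
}
```

The spool is added only where the property is missing, so plans whose inputs are already rewindable pay nothing extra.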



@RaduBerinde
Member

Review status: 0 of 1 files reviewed at latest revision, 7 unresolved discussions, all commit checks successful.


docs/RFCS/20171206_single_use_common_table_expressions.md, line 53 at r1 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

Added a section about this to future work.

That sounds right to me. The processor you are describing will be necessary for implementing nested loop joins as well.



@petermattis
Collaborator

Review status: 0 of 1 files reviewed at latest revision, 7 unresolved discussions, all commit checks successful.


docs/RFCS/20171206_single_use_common_table_expressions.md, line 53 at r1 (raw file):

Previously, RaduBerinde wrote…

That sounds right to me. The processor you are describing will be necessary for implementing nested loop joins as well.

Yep, @knz mentioned that today as well. Note that table scan and index scan operators support "rewindability", so a spool would only be necessary for a nested loop join where the right hand side is not a scan.



@RaduBerinde
Member

Review status: 0 of 1 files reviewed at latest revision, 7 unresolved discussions, all commit checks successful.


docs/RFCS/20171206_single_use_common_table_expressions.md, line 53 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Yep, @knz mentioned that today as well. Note that table scan and index scan operators support "rewindability", so a spool would only be necessary for a nested loop join where the right hand side is not a scan.

The operators might in theory, but the processors don't :) I don't have a clear image of how a nested loop join will work with distsql anyway; it seems like we need some kind of dynamic planning.

For CTEs we know upfront how many outputs we have, they can all be pre-configured.



@richardwu
Contributor

richardwu commented Dec 5, 2017

Crazy idea: in the local engine, what if we defined some withNode that acts as a "shared pointer" to the same child plan node (which is constructed from the WITH clause)?

It could be something as simple as:

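type withNode struct {
  plan         planNode
  numRefs      int
  outputBuffer []tree.Datums // one copy of the current row per reference

  curIdx     int
  recentNext bool
}

// with returns the shared node for a WITH clause, bumping the reference
// count when the node already exists.
func with(existing *withNode, ...) *withNode {
  if existing != nil {
    existing.numRefs++
    return existing
  }
  // First reference: the child plan is constructed from the WITH clause.
  return &withNode{numRefs: 1 /* plan: built from the WITH clause */}
}

func (w *withNode) Start() {
  w.outputBuffer = make([]tree.Datums, w.numRefs)
  w.curIdx = -1
}

// Next advances the child plan only on the first of every numRefs calls;
// the remaining calls replay the cached result.
func (w *withNode) Next() bool {
  w.curIdx = (w.curIdx + 1) % w.numRefs
  if w.curIdx == 0 {
    w.recentNext = w.plan.Next()
  }
  return w.recentNext
}

func (w *withNode) Values() tree.Datums {
  if w.curIdx == 0 {
    for i := range w.outputBuffer {
      // append allocates as needed; copy into a nil slice would copy nothing.
      w.outputBuffer[i] = append(w.outputBuffer[i][:0], w.plan.Values()...)
    }
  }

  return w.outputBuffer[w.curIdx]
}

func (w *withNode) Close() {
  w.numRefs--

  if w.numRefs == 0 {
    w.plan.Close()
  }
}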

There's probably an off-by-one error with w.curIdx and Closes, but I believe this should work under the Volcano model without needing to use temporary storage. As for DistSQL, would a mirrored router work?

@RaduBerinde
Member

I don't see how that would work. If multiple nodes call Next(), they may be interleaving their calls. We would have to know which client is calling and keep a list of indices.

For DistSQL, a mirror router works but the router does not yet spill the buffer to disk (it's a TODO).
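
A sketch of what "know which client is calling and keep a list of indices" could look like, under stated assumptions: `sharedSource`, `register`, and `nextFor` are hypothetical names, rows are plain integers, and the buffer is never trimmed or spilled.

```go
package main

import "fmt"

// sharedSource lets several consumers read the same stream at their own
// pace. Each consumer registers for an id and the node keeps one cursor per
// id, so interleaved calls cannot be confused with one another.
type sharedSource struct {
	next    func() (int, bool) // underlying plan; each row is pulled once
	buf     []int              // rows pulled so far
	done    bool               // set once the underlying plan is exhausted
	cursors []int              // cursors[id] is the next index for that consumer
}

func (s *sharedSource) register() int {
	s.cursors = append(s.cursors, 0)
	return len(s.cursors) - 1
}

// nextFor returns the next row for consumer id, pulling from the underlying
// plan only when this consumer has caught up with the buffer.
func (s *sharedSource) nextFor(id int) (int, bool) {
	pos := s.cursors[id]
	for pos >= len(s.buf) && !s.done {
		v, ok := s.next()
		if !ok {
			s.done = true
			break
		}
		s.buf = append(s.buf, v)
	}
	if pos >= len(s.buf) {
		return 0, false
	}
	s.cursors[id]++
	return s.buf[pos], true
}

func main() {
	vals := []int{10, 20}
	i := 0
	s := &sharedSource{next: func() (int, bool) {
		if i >= len(vals) {
			return 0, false
		}
		i++
		return vals[i-1], true
	}}
	a, b := s.register(), s.register()
	s.nextFor(a)
	s.nextFor(a) // a runs ahead of b
	v, _ := s.nextFor(b)
	fmt.Println(v) // 10: b still sees the first row
}
```

In this sketch the buffer must hold every row between the slowest and the fastest consumer, so in the worst case it grows to the full result, which is where the need for temporary storage comes back in.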

@a-robinson
Contributor

👏


Reviewed 1 of 2 files at r3, 1 of 1 files at r4.
Review status: all files reviewed at latest revision, 9 unresolved discussions, all commit checks successful.


docs/RFCS/20171206_single_use_common_table_expressions.md, line 39 at r4 (raw file):

the names of the statements, the execution engine must ensure that each impure
clause (either one that modifies data or one that uses an impure builtin
function) is not executed more than once. CTEs increase increase the

s/increase increase/increase/


docs/RFCS/20171206_single_use_common_table_expressions.md, line 306 at r4 (raw file):

Temporary storage may be less involved than fully implementing temporary
tables. DistSQL processrs can be configured to use temporary storage in a

s/processrs/processors



@jordanlewis
Member Author

Review status: 0 of 1 files reviewed at latest revision, 9 unresolved discussions, some commit checks pending.


docs/RFCS/20171206_single_use_common_table_expressions.md, line 39 at r4 (raw file):

Previously, a-robinson (Alex Robinson) wrote…

s/increase increase/increase/

Done.


docs/RFCS/20171206_single_use_common_table_expressions.md, line 306 at r4 (raw file):

Previously, a-robinson (Alex Robinson) wrote…

s/processrs/processors

Done.



This RFC is a proposal to implement the restricted subset of common
table expressions that doesn't require temporary table infrastructure.
A CTE will be supported if each of its named clauses isn't referenced
more than once as a data source.

Release note: None
@jordanlewis
Member Author

TFTRs!
