Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: WITH RECURSIVE (recursive common table expressions) #21085

Closed
jordanlewis opened this issue Dec 28, 2017 · 4 comments · Fixed by #41368
Closed

sql: WITH RECURSIVE (recursive common table expressions) #21085

jordanlewis opened this issue Dec 28, 2017 · 4 comments · Fixed by #41368
Assignees
Labels
A-sql-optimizer SQL logical planning and optimizations. A-sql-pgcompat Semantic compatibility with PostgreSQL C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) X-anchored-telemetry The issue number is anchored by telemetry references.
Milestone

Comments

@jordanlewis
Copy link
Member

Split from #7029 which was closed when we merged support for single-use common table expressions in #20359.

Support WITH RECURSIVE, which allows a query to refer to its own output.

A simple example from Postgres's documentation sums the numbers from 1 to 100:

WITH RECURSIVE t(n) AS (
    VALUES (1)
  UNION ALL
    SELECT n+1 FROM t WHERE n < 100
)
SELECT sum(n) FROM t;

The t clause uses itself as a data source, which requires the RECURSIVE modifier to WITH. This feature likely requires temporary tables to perform well.

@turbo
Copy link

turbo commented Mar 21, 2018

Recursive CTEs are vital to storing graph or tree structures in a database and are even present in SQLite ;). Here's my specific use case: Calculating the Path of a Row in an Adjacency List When Recursive CTE Are Not Supported

@jordanlewis jordanlewis added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-sql-pgcompat Semantic compatibility with PostgreSQL labels Apr 27, 2018
@WestFarmer
Copy link

any updates?

please consider add full support for tree structure, such as 'path, level, isLeaf, sort siblings'

@petermattis petermattis removed this from the Later milestone Oct 5, 2018
@knz knz added X-anchored-telemetry The issue number is anchored by telemetry references. A-sql-optimizer SQL logical planning and optimizations. labels Nov 22, 2018
@gigatexal
Copy link

Just wanted to add my request to have recursive CTEs supported and if accepted a timeline for when to expect it?

@jkassis
Copy link

jkassis commented Mar 5, 2019

thumbs up for this. recursive queries from the client side could produce lots of network chatter on a big table. consider a hierarchical category system for an online store for a business case. defendants need to be queried for each user search and for category operations.

jordanlewis added a commit to jordanlewis/cockroach that referenced this issue Oct 2, 2019
The spreadsheet we discussed is unwieldy - hard to edit and impossible to keep
up to date. If we write down blacklists in code, then we can use an approach
like this to always have an up to date aggregation.

So far it seems like there's just a lot of unknowns to categorize still.

The output today:

```
=== RUN   TestBlacklists
 648: unknown                                                (unknown)
 493: cockroachdb#5807   (sql: Add support for TEMP tables)
 151: cockroachdb#17511  (sql: support stored procedures)
  86: cockroachdb#26097  (sql: make TIMETZ more pg-compatible)
  56: cockroachdb#10735  (sql: support SQL savepoints)
  55: cockroachdb#32552  (multi-dim arrays)
  55: cockroachdb#26508  (sql: restricted DDL / DML inside transactions)
  52: cockroachdb#32565  (sql: support optional TIME precision)
  39: cockroachdb#243    (roadmap: Blob storage)
  33: cockroachdb#26725  (sql: support postgres' API to handle blob storage (incl lo_creat, lo_from_bytea))
  31: cockroachdb#27793  (sql: support custom/user-defined base scalar (primitive) types)
  24: cockroachdb#12123  (sql: Can't drop and replace a table within a transaction)
  24: cockroachdb#26443  (sql: support user-defined schemas between database and table)
  20: cockroachdb#21286  (sql: Add support for geometric types)
  18: cockroachdb#6583   (sql: explicit lock syntax (SELECT FOR {SHARE,UPDATE} {skip locked,nowait}))
  17: cockroachdb#22329  (Support XA distributed transactions in CockroachDB)
  16: cockroachdb#24062  (sql: 32 bit SERIAL type)
  16: cockroachdb#30352  (roadmap:when CockroachDB  will support cursor?)
  12: cockroachdb#27791  (sql: support RANGE types)
   8: cockroachdb#40195  (pgwire: multiple active result sets (portals) not supported)
   8: cockroachdb#6130   (sql: add support for key watches with notifications of changes)
   5: Expected Failure                                       (unknown)
   5: cockroachdb#23468  (sql: support sql arrays of JSONB)
   5: cockroachdb#40854  (sql: set application_name from connection string)
   4: cockroachdb#35879  (sql: `default_transaction_read_only` should also accept 'on' and 'off')
   4: cockroachdb#32610  (sql: can't insert self reference)
   4: cockroachdb#40205  (sql: add non-trivial implementations of FOR UPDATE, FOR NO KEY UPDATE, FOR SHARE, FOR NO KEY SHARE)
   4: cockroachdb#35897  (sql: unknown function: pg_terminate_backend())
   4: cockroachdb#4035   (sql/pgwire: missing support for row count limits in pgwire)
   3: cockroachdb#27796  (sql: support user-defined DOMAIN types)
   3: cockroachdb#3781   (sql: Add Data Type Formatting Functions)
   3: cockroachdb#40476  (sql: support `FOR {UPDATE,SHARE} {SKIP LOCKED,NOWAIT}`)
   3: cockroachdb#35882  (sql: support other character sets)
   2: cockroachdb#10028  (sql: Support view queries with star expansions)
   2: cockroachdb#35807  (sql: INTERVAL output doesn't match PG)
   2: cockroachdb#35902  (sql: large object support)
   2: cockroachdb#40474  (sql: support `SELECT ... FOR UPDATE OF` syntax)
   1: cockroachdb#18846  (sql: Support CIDR column type)
   1: cockroachdb#9682   (sql: implement computed indexes)
   1: cockroachdb#31632  (sql: FK options (deferrable, etc))
   1: cockroachdb#24897  (sql: CREATE OR REPLACE VIEW)
   1: pass?                                                  (unknown)
   1: cockroachdb#36215  (sql: enable setting standard_conforming_strings to off)
   1: cockroachdb#32562  (sql: support SET LOCAL and txn-scoped session variable changes)
   1: cockroachdb#36116  (sql: psychopg: investigate how `'infinity'::timestamp` is presented)
   1: cockroachdb#26732  (sql: support the binary operator: <int> / <float>)
   1: cockroachdb#23299  (sql: support coercing string literals to arrays)
   1: cockroachdb#36115  (sql: psychopg: investigate if datetimetz is being returned instead of datetime)
   1: cockroachdb#26925  (sql: make the CockroachDB integer types more compatible with postgres)
   1: cockroachdb#21085  (sql: WITH RECURSIVE (recursive common table expressions))
   1: cockroachdb#36179  (sql: implicity convert date to timestamp)
   1: cockroachdb#36118  (sql: Cannot parse '24:00' as type time)
   1: cockroachdb#31708  (sql: support current_time)
```

Release justification: non-production change
Release note: None
@RaduBerinde RaduBerinde self-assigned this Oct 7, 2019
RaduBerinde added a commit to RaduBerinde/cockroach that referenced this issue Oct 7, 2019
A recursive CTE is of the form:
```
WITH RECURSIVE cte AS (
    <initial query>
  UNION ALL
    <recursive query>
)
```

Recursive CTE evaluation (paraphrased from postgres docs):
 1. Evaluate the initial query; emit the results and also save them in
    a "working" table.
 2. So long as the working table is not empty:
    - evaluate the recursive query, substituting the current contents of
      the working table for the recursive self-reference. Emit all
      resulting rows, and save them into the next iteration's working
      table.

This change adds optimizer and execution support for recursive CTEs.

Some notes for the various components:

 - optbuilder:  We build the recursive query using a `cteSource` that
   corresponds to the "working table". This code turned out to be
   tricky, in part because we want to allow non-recursive CTEs even if
   RECURSIVE is used (which is necessary when defining multiple CTEs,
   some of which are not recursive).

 - execution: the execution operator is somewhat similar to
   `applyJoinNode` (but simpler). The execbuilder provides a
   callback that can be used to regenerate the right-hand-side plan
   each time; this callback takes a reference to the working table (as
   a `bufferNode`).

 - execbuilder: to implement the callback mentioned above, we create
   an "inner" builder inside the callback which uses the same factory
   but is otherwise independent from the "outer" builder.

Fixes cockroachdb#21085.

Release note (sql change): WITH RECURSIVE is now supported.
RaduBerinde added a commit to RaduBerinde/cockroach that referenced this issue Oct 8, 2019
A recursive CTE is of the form:
```
WITH RECURSIVE cte AS (
    <initial query>
  UNION ALL
    <recursive query>
)
```

Recursive CTE evaluation (paraphrased from postgres docs):
 1. Evaluate the initial query; emit the results and also save them in
    a "working" table.
 2. So long as the working table is not empty:
    - evaluate the recursive query, substituting the current contents of
      the working table for the recursive self-reference. Emit all
      resulting rows, and save them into the next iteration's working
      table.

This change adds optimizer and execution support for recursive CTEs.

Some notes for the various components:

 - optbuilder:  We build the recursive query using a `cteSource` that
   corresponds to the "working table". This code turned out to be
   tricky, in part because we want to allow non-recursive CTEs even if
   RECURSIVE is used (which is necessary when defining multiple CTEs,
   some of which are not recursive).

 - execution: the execution operator is somewhat similar to
   `applyJoinNode` (but simpler). The execbuilder provides a
   callback that can be used to regenerate the right-hand-side plan
   each time; this callback takes a reference to the working table (as
   a `bufferNode`).

 - execbuilder: to implement the callback mentioned above, we create
   an "inner" builder inside the callback which uses the same factory
   but is otherwise independent from the "outer" builder.

Fixes cockroachdb#21085.

Release note (sql change): WITH RECURSIVE is now supported.
RaduBerinde added a commit to RaduBerinde/cockroach that referenced this issue Oct 8, 2019
A recursive CTE is of the form:
```
WITH RECURSIVE cte AS (
    <initial query>
  UNION ALL
    <recursive query>
)
```

Recursive CTE evaluation (paraphrased from postgres docs):
 1. Evaluate the initial query; emit the results and also save them in
    a "working" table.
 2. So long as the working table is not empty:
    - evaluate the recursive query, substituting the current contents of
      the working table for the recursive self-reference. Emit all
      resulting rows, and save them into the next iteration's working
      table.

This change adds optimizer and execution support for recursive CTEs.

Some notes for the various components:

 - optbuilder:  We build the recursive query using a `cteSource` that
   corresponds to the "working table". This code turned out to be
   tricky, in part because we want to allow non-recursive CTEs even if
   RECURSIVE is used (which is necessary when defining multiple CTEs,
   some of which are not recursive).

 - execution: the execution operator is somewhat similar to
   `applyJoinNode` (but simpler). The execbuilder provides a
   callback that can be used to regenerate the right-hand-side plan
   each time; this callback takes a reference to the working table (as
   a `bufferNode`).

 - execbuilder: to implement the callback mentioned above, we create
   an "inner" builder inside the callback which uses the same factory
   but is otherwise independent from the "outer" builder.

Fixes cockroachdb#21085.

Release note (sql change): WITH RECURSIVE is now supported.
craig bot pushed a commit that referenced this issue Oct 10, 2019
41368: opt: support recursive CTEs r=RaduBerinde a=RaduBerinde

#### opt: minor refactoring in union optbuilder

This change pulls up the logic to cross-check column types, to be used
for `WITH RECURSIVE`. The resulting code is more straightforward to
follow.

Release note: None

#### opt: use Presentation for cteSource.cols in optbuilder

The columns in `cteSource` are only used for their IDs and names,
which is better represented by a `Presentation`. This will make things
easier for recursive CTEs.

We also make the "no columns" error more general since our syntax
allows more types of statements inside WITH.

Release note: None

#### opt: support recursive CTEs

A recursive CTE is of the form:
```
WITH RECURSIVE cte AS (
    <initial query>
  UNION ALL
    <recursive query>
)
```

Recursive CTE evaluation (paraphrased from postgres docs):
 1. Evaluate the initial query; emit the results and also save them in
    a "working" table.
 2. So long as the working table is not empty:
    - evaluate the recursive query, substituting the current contents of
      the working table for the recursive self-reference. Emit all
      resulting rows, and save them into the next iteration's working
      table.

This change adds optimizer and execution support for recursive CTEs.

Some notes for the various components:

 - optbuilder:  We build the recursive query using a `cteSource` that
   corresponds to the "working table". This code turned out to be
   tricky, in part because we want to allow non-recursive CTEs even if
   RECURSIVE is used (which is necessary when defining multiple CTEs,
   some of which are not recursive).

 - execution: the execution operator is somewhat similar to
   `applyJoinNode` (but simpler). The execbuilder provides a
   callback that can be used to regenerate the right-hand-side plan
   each time; this callback takes a reference to the working table (as
   a `bufferNode`).

 - execbuilder: to implement the callback mentioned above, we create
   an "inner" builder inside the callback which uses the same factory
   but is otherwise independent from the "outer" builder.

Fixes #21085.

Release note (sql change): WITH RECURSIVE is now supported.

Co-authored-by: Radu Berinde <[email protected]>
@craig craig bot closed this as completed in 7bc04d7 Oct 10, 2019
@awoods187 awoods187 added this to the 20.1 milestone Oct 11, 2019
rafiss added a commit to rafiss/cockroach that referenced this issue Oct 15, 2019
Now that cockroachdb#21085 is fixed, a test in hibernate started passing.

Release note: None
craig bot pushed a commit that referenced this issue Oct 15, 2019
41618: roachtest: update hibernate blacklist after recursive CTE support r=rafiss a=rafiss

Now that #21085 is fixed, a test in hibernate started passing.

Release note: None

Co-authored-by: Rafi Shamim <[email protected]>
jordanlewis added a commit to jordanlewis/cockroach that referenced this issue Oct 24, 2019
The spreadsheet we discussed is unwieldy - hard to edit and impossible to keep
up to date. If we write down blacklists in code, then we can use an approach
like this to always have an up to date aggregation.

So far it seems like there's just a lot of unknowns to categorize still.

The output today:

```
=== RUN   TestBlacklists
 648: unknown                                                (unknown)
 493: cockroachdb#5807   (sql: Add support for TEMP tables)
 151: cockroachdb#17511  (sql: support stored procedures)
  86: cockroachdb#26097  (sql: make TIMETZ more pg-compatible)
  56: cockroachdb#10735  (sql: support SQL savepoints)
  55: cockroachdb#32552  (multi-dim arrays)
  55: cockroachdb#26508  (sql: restricted DDL / DML inside transactions)
  52: cockroachdb#32565  (sql: support optional TIME precision)
  39: cockroachdb#243    (roadmap: Blob storage)
  33: cockroachdb#26725  (sql: support postgres' API to handle blob storage (incl lo_creat, lo_from_bytea))
  31: cockroachdb#27793  (sql: support custom/user-defined base scalar (primitive) types)
  24: cockroachdb#12123  (sql: Can't drop and replace a table within a transaction)
  24: cockroachdb#26443  (sql: support user-defined schemas between database and table)
  20: cockroachdb#21286  (sql: Add support for geometric types)
  18: cockroachdb#6583   (sql: explicit lock syntax (SELECT FOR {SHARE,UPDATE} {skip locked,nowait}))
  17: cockroachdb#22329  (Support XA distributed transactions in CockroachDB)
  16: cockroachdb#24062  (sql: 32 bit SERIAL type)
  16: cockroachdb#30352  (roadmap:when CockroachDB  will support cursor?)
  12: cockroachdb#27791  (sql: support RANGE types)
   8: cockroachdb#40195  (pgwire: multiple active result sets (portals) not supported)
   8: cockroachdb#6130   (sql: add support for key watches with notifications of changes)
   5: Expected Failure                                       (unknown)
   5: cockroachdb#23468  (sql: support sql arrays of JSONB)
   5: cockroachdb#40854  (sql: set application_name from connection string)
   4: cockroachdb#35879  (sql: `default_transaction_read_only` should also accept 'on' and 'off')
   4: cockroachdb#32610  (sql: can't insert self reference)
   4: cockroachdb#40205  (sql: add non-trivial implementations of FOR UPDATE, FOR NO KEY UPDATE, FOR SHARE, FOR NO KEY SHARE)
   4: cockroachdb#35897  (sql: unknown function: pg_terminate_backend())
   4: cockroachdb#4035   (sql/pgwire: missing support for row count limits in pgwire)
   3: cockroachdb#27796  (sql: support user-defined DOMAIN types)
   3: cockroachdb#3781   (sql: Add Data Type Formatting Functions)
   3: cockroachdb#40476  (sql: support `FOR {UPDATE,SHARE} {SKIP LOCKED,NOWAIT}`)
   3: cockroachdb#35882  (sql: support other character sets)
   2: cockroachdb#10028  (sql: Support view queries with star expansions)
   2: cockroachdb#35807  (sql: INTERVAL output doesn't match PG)
   2: cockroachdb#35902  (sql: large object support)
   2: cockroachdb#40474  (sql: support `SELECT ... FOR UPDATE OF` syntax)
   1: cockroachdb#18846  (sql: Support CIDR column type)
   1: cockroachdb#9682   (sql: implement computed indexes)
   1: cockroachdb#31632  (sql: FK options (deferrable, etc))
   1: cockroachdb#24897  (sql: CREATE OR REPLACE VIEW)
   1: pass?                                                  (unknown)
   1: cockroachdb#36215  (sql: enable setting standard_conforming_strings to off)
   1: cockroachdb#32562  (sql: support SET LOCAL and txn-scoped session variable changes)
   1: cockroachdb#36116  (sql: psychopg: investigate how `'infinity'::timestamp` is presented)
   1: cockroachdb#26732  (sql: support the binary operator: <int> / <float>)
   1: cockroachdb#23299  (sql: support coercing string literals to arrays)
   1: cockroachdb#36115  (sql: psychopg: investigate if datetimetz is being returned instead of datetime)
   1: cockroachdb#26925  (sql: make the CockroachDB integer types more compatible with postgres)
   1: cockroachdb#21085  (sql: WITH RECURSIVE (recursive common table expressions))
   1: cockroachdb#36179  (sql: implicity convert date to timestamp)
   1: cockroachdb#36118  (sql: Cannot parse '24:00' as type time)
   1: cockroachdb#31708  (sql: support current_time)
```

Release justification: non-production change
Release note: None
craig bot pushed a commit that referenced this issue Nov 7, 2019
41252: roachtest: add test that aggregates orm blacklist failures r=jordanlewis a=jordanlewis

The spreadsheet we discussed is unwieldy - hard to edit and impossible to keep
up to date. If we write down blacklists in code, then we can use an approach
like this to always have an up to date aggregation.

So far it seems like there's just a lot of unknowns to categorize still.

The output today:

```
=== RUN   TestBlacklists
 648: unknown                                                (unknown)
 493: #5807   (sql: Add support for TEMP tables)
 151: #17511  (sql: support stored procedures)
  86: #26097  (sql: make TIMETZ more pg-compatible)
  56: #10735  (sql: support SQL savepoints)
  55: #32552  (multi-dim arrays)
  55: #26508  (sql: restricted DDL / DML inside transactions)
  52: #32565  (sql: support optional TIME precision)
  39: #243    (roadmap: Blob storage)
  33: #26725  (sql: support postgres' API to handle blob storage (incl lo_creat, lo_from_bytea))
  31: #27793  (sql: support custom/user-defined base scalar (primitive) types)
  24: #12123  (sql: Can't drop and replace a table within a transaction)
  24: #26443  (sql: support user-defined schemas between database and table)
  20: #21286  (sql: Add support for geometric types)
  18: #6583   (sql: explicit lock syntax (SELECT FOR {SHARE,UPDATE} {skip locked,nowait}))
  17: #22329  (Support XA distributed transactions in CockroachDB)
  16: #24062  (sql: 32 bit SERIAL type)
  16: #30352  (roadmap:when CockroachDB  will support cursor?)
  12: #27791  (sql: support RANGE types)
   8: #40195  (pgwire: multiple active result sets (portals) not supported)
   8: #6130   (sql: add support for key watches with notifications of changes)
   5: Expected Failure                                       (unknown)
   5: #23468  (sql: support sql arrays of JSONB)
   5: #40854  (sql: set application_name from connection string)
   4: #35879  (sql: `default_transaction_read_only` should also accept 'on' and 'off')
   4: #32610  (sql: can't insert self reference)
   4: #40205  (sql: add non-trivial implementations of FOR UPDATE, FOR NO KEY UPDATE, FOR SHARE, FOR NO KEY SHARE)
   4: #35897  (sql: unknown function: pg_terminate_backend())
   4: #4035   (sql/pgwire: missing support for row count limits in pgwire)
   3: #27796  (sql: support user-defined DOMAIN types)
   3: #3781   (sql: Add Data Type Formatting Functions)
   3: #40476  (sql: support `FOR {UPDATE,SHARE} {SKIP LOCKED,NOWAIT}`)
   3: #35882  (sql: support other character sets)
   2: #10028  (sql: Support view queries with star expansions)
   2: #35807  (sql: INTERVAL output doesn't match PG)
   2: #35902  (sql: large object support)
   2: #40474  (sql: support `SELECT ... FOR UPDATE OF` syntax)
   1: #18846  (sql: Support CIDR column type)
   1: #9682   (sql: implement computed indexes)
   1: #31632  (sql: FK options (deferrable, etc))
   1: #24897  (sql: CREATE OR REPLACE VIEW)
   1: pass?                                                  (unknown)
   1: #36215  (sql: enable setting standard_conforming_strings to off)
   1: #32562  (sql: support SET LOCAL and txn-scoped session variable changes)
   1: #36116  (sql: psychopg: investigate how `'infinity'::timestamp` is presented)
   1: #26732  (sql: support the binary operator: <int> / <float>)
   1: #23299  (sql: support coercing string literals to arrays)
   1: #36115  (sql: psychopg: investigate if datetimetz is being returned instead of datetime)
   1: #26925  (sql: make the CockroachDB integer types more compatible with postgres)
   1: #21085  (sql: WITH RECURSIVE (recursive common table expressions))
   1: #36179  (sql: implicity convert date to timestamp)
   1: #36118  (sql: Cannot parse '24:00' as type time)
   1: #31708  (sql: support current_time)
```

Release justification: non-production change
Release note: None

Co-authored-by: Jordan Lewis <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-optimizer SQL logical planning and optimizations. A-sql-pgcompat Semantic compatibility with PostgreSQL C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) X-anchored-telemetry The issue number is anchored by telemetry references.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants