Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: support correlated subqueries #3288

Closed
maddyblue opened this issue Dec 2, 2015 · 37 comments
Closed

sql: support correlated subqueries #3288

maddyblue opened this issue Dec 2, 2015 · 37 comments
Assignees
Labels
A-sql-optimizer SQL logical planning and optimizations. A-sql-pgcompat Semantic compatibility with PostgreSQL A-sql-semantics C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
Milestone

Comments

@maddyblue
Copy link
Contributor

https://github.com/cockroachdb/sqllogictest/blob/a88396b84bb1fe62edf2e072e585d68c0b2ecdca/test/select1.test#L109

Fails with:

logic_test.go:420: ../../sqllogictest/test/select1.test:109: expected success, but found pq: qualified name "x.b" not found

I believe this is because the subquery references both x and t1 (where t1 is from the parent FROM clause), but only x is in the visible tables list.


The description above was conflating two bugs into one. This is a real bug, but it isn't caused by the linked query.

@petermattis
Copy link
Collaborator

Yeah, this is what I meant the other day when I stated that only basic subqueries work. Subqueries can utilize variables from their surrounding context, but we don't handle that yet.

@tamird
Copy link
Contributor

tamird commented Dec 2, 2015

For reference, the query:

query IIIII nosort
SELECT a+b*2+c*3+d*4+e*5,
       CASE WHEN a<b-3 THEN 111 WHEN a<=b THEN 222
        WHEN a<b+3 THEN 333 ELSE 444 END,
       abs(b-c),
       (a+b+c+d+e)/5,
       a+b*2+c*3
  FROM t1
 WHERE (e>c OR e<d)
   AND d>e
   AND EXISTS(SELECT 1 FROM t1 AS x WHERE x.b<t1.b)
 ORDER BY 4,2,1,3,5

Subqueries can utilize variables from their surrounding context, but we don't handle that yet.

I don't think that's what's happening here.

@tamird
Copy link
Contributor

tamird commented Dec 2, 2015

Closing in favour of #3291.

@tamird tamird closed this as completed Dec 2, 2015
@petermattis
Copy link
Collaborator

You're correct that this is not a correlated subquery, but we need to support those as well.

@petermattis petermattis reopened this Dec 2, 2015
@petermattis petermattis changed the title sql: table aliasing in subqueries needs to inherit from parent tables sql: support correlated subqueries Dec 2, 2015
@bdarnell
Copy link
Contributor

bdarnell commented Feb 5, 2016

Yikes! Correlated subqueries look scary (I had no idea such a thing was possible) and seem impossible to support efficiently except in limited cases (when they can be transformed into joins, which is what amazon redshift seems to do). I'm entirely comfortable with not supporting them.

@petermattis
Copy link
Collaborator

Apparently correlated subqueries can always be transformed into joins. At least, I recall reading literature that stated that. The point of using a correlated subquery instead of a join is that the query can sometimes be written more naturally that way. I don't think we should support correlated subqueries before we support joins and then we should only support them by transforming them into joins.

@petermattis petermattis added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) SQL labels Feb 12, 2016
@petermattis petermattis added this to the 1.0 milestone Feb 12, 2016
@petermattis petermattis removed the SQL label Feb 13, 2016
@maddyblue
Copy link
Contributor Author

@knz is this done?

@knz
Copy link
Contributor

knz commented Aug 1, 2016

Not yet, wanna take it?

@maddyblue
Copy link
Contributor Author

I was just making sure this wasn't done with the other JOIN work and that the issue should remain open.

@jordanlewis
Copy link
Member

This feature is required by the ActiveRecord query issued in #12783.

In case it's useful, here is a simple query that requires this feature that Postgres supports and we don't. In Postgres:

jordan=# SELECT nspname, (SELECT n.nspname) FROM pg_namespace n LIMIT 1;
 nspname  | nspname
----------+----------
 pg_toast | pg_toast
(1 row)

In CockroachDB:

root@:26257> SELECT nspname, (SELECT n.nspname) FROM pg_namespace n LIMIT 1;
pq: source name "n" not found in FROM clause

@jordanlewis
Copy link
Member

The example I posted seems to be a limited subcase of generalized correlated queries. Since ActiveRecord only needs that limited subcase, I've opened #12993 to track this particular subcase.

@jordanlewis
Copy link
Member

I take it back. The query in #12783 does actually need full correlated subquery support.

knz added a commit to knz/cockroach that referenced this issue Jul 11, 2018
Prior to this patch, a client trying to use an unsupported correlated
query would encounter an obscure error like "column v does not exist"
or "no data source matches prefix".

Given that the new optimizer code can determine whether a query is
correlated, we can use this information to enhance the error message.

Before:

```
> select * from pg_class a where exists (select * from pg_class b where a.oid = b.oid);
pq: no data source matches prefix: a

> select * from pg_class a where exists (select * from kv where v::oid = oid);
pq: column "oid" does not exist
```

After:

```
> select * from pg_class a where exists (select * from pg_class b where a.oid = b.oid);
pq: no data source matches prefix: a
HINT: some correlated subqueries are not supported yet - see
      cockroachdb#3288

> select * from pg_class a where exists (select * from kv where v::oid = oid);
pq: column "oid" does not exist
HINT: some correlated subqueries are not supported yet - see
      cockroachdb#3288
```

Note: some correlated queries do not benefit from this improvement,
specifically those for which the optimizer code aborts early before it
has detected correlation (e.g. because of some other unrelated
feature).

Release note (sql change): CockroachDB will now report a hint in the
error message if it encounters a correlated query that it does not
support yet.
craig bot pushed a commit that referenced this issue Jul 11, 2018
27390: sql: properly reject nested generators in ROWS FROM r=knz a=knz

Fixes #27389.

Prior to this patch, invalid uses of SRFs as arguments to other
functions in ROWS FROM were not properly rejected, and were only
caught at evaluation time (i.e. much too late).

This patch fixes it by rejecting these uses early.

Release note (bug fix): invalid uses of set-generating functions in
FROM clauses are now reported with the same error code as PostgreSQL.

27396: sql: help the user understand unsupported correlation r=knz a=knz

Fixes #24684.

Prior to this patch, a client trying to use an unsupported correlated
query would encounter an obscure error like "column v does not exist"
or "no data source matches prefix".

Given that the new optimizer code can determine whether a query is
correlated, we can use this information to enhance the error message.

Before:

```
> select * from pg_class a where exists (select * from pg_class b where a.oid = b.oid);
pq: no data source matches prefix: a

> select * from pg_class a where exists (select * from kv where v::oid = oid);
pq: column "oid" does not exist
```

After:

```
> select * from pg_class a where exists (select * from pg_class b where a.oid = b.oid);
pq: no data source matches prefix: a
HINT: some correlated subqueries are not supported yet - see
      #3288

> select * from pg_class a where exists (select * from kv where v::oid = oid);
pq: column "oid" does not exist
HINT: some correlated subqueries are not supported yet - see
      #3288
```

Note: some correlated queries do not benefit from this improvement,
specifically those for which the optimizer code aborts early before it
has detected correlation (e.g. because of some other unrelated
feature).

Release note (sql change): CockroachDB will now report a hint in the
error message if it encounters a correlated query that it does not
support yet.


Co-authored-by: Raphael 'kena' Poss <[email protected]>
@tim-o
Copy link
Contributor

tim-o commented Aug 12, 2018

Zendesk ticket #2713 has been linked to this issue.

@andy-kimball
Copy link
Contributor

Correlated subquery support is now part of 2.1, so closing this issue down.

@Freeaqingme
Copy link

I'm running v2.1.0-beta.20180904, hoping to get Symfony+Doctrine ORM working with Cockroachdb. The query below, however, returns this error message:

SQLSTATE[42P01]: Undefined table: 7 ERROR: no data source matches prefix: a HINT: some correlated subqueries are not supported yet - see https://github.com/cockroachdb/cockroachissues/3288

Can you provide guidance? I'd assume that either the correlated subquery stuff somehow didn't make it into v2.1.0-beta.20180904, or that the error message is also used for a different issue.

  An exception occurred while executing 'SELECT                                              
                      a.attnum,                                                              
                      quote_ident(a.attname) AS field,                                       
                      t.typname AS type,                                                     
                      format_type(a.atttypid, a.atttypmod) AS complete_type, (SELECT tc.col  
  lcollate FROM pg_catalog.pg_collation tc WHERE tc.oid = a.attcollation) AS collation,      
                      (SELECT t1.typname FROM pg_catalog.pg_type t1 WHERE t1.oid = t.typbas  
  etype) AS domain_type,                                                                     
                      (SELECT format_type(t2.typbasetype, t2.typtypmod) FROM                 
                        pg_catalog.pg_type t2 WHERE t2.typtype = 'd' AND t2.oid = a.atttypi  
  d) AS domain_complete_type,                                                                
                      a.attnotnull AS isnotnull,                                             
                      (SELECT 't'                                                            
                       FROM pg_index                                                         
                       WHERE c.oid = pg_index.indrelid                                       
                          AND pg_index.indkey[0] = a.attnum                                  
                          AND pg_index.indisprimary = 't'                                    
                      ) AS pri,                                                              
                      (SELECT pg_get_expr(adbin, adrelid)                                    
                       FROM pg_attrdef                                                       
                       WHERE c.oid = pg_attrdef.adrelid                                      
                          AND pg_attrdef.adnum=a.attnum                                      
                      ) AS default,                                                          
                      (SELECT pg_description.description                                     
                          FROM pg_description WHERE pg_description.objoid = c.oid AND a.att  
  num = pg_description.objsubid                                                              
                      ) AS comment                                                           
                      FROM pg_attribute a, pg_class c, pg_type t, pg_namespace n             
                      WHERE n.nspname NOT IN ('pg_catalog', 'information_schema', 'pg_toast  
  ') AND c.relname = 'backward_dependencies' AND n.nspname = 'crdb_internal'                 
                          AND a.attnum > 0                                                   
                          AND a.attrelid = c.oid                                             
                          AND a.atttypid = t.oid                                             
                          AND n.oid = c.relnamespace                                         
                      ORDER BY a.attnum': 

/cc @andy-kimball @knz

@maddyblue
Copy link
Contributor Author

sqlfmt formatted for the interested

@knz knz reopened this Sep 10, 2018
@knz
Copy link
Contributor

knz commented Sep 10, 2018

I think the issue should stay open until all correlated subqueries are supported, including the apply operator for those that cannot be decorrelated.

@andy-kimball Lauren and I are planning to make a writeup in the docs of the limits of decorrelation currently. Perhaps we could link to that from here.

@andy-kimball
Copy link
Contributor

The posted query has two issues that prevent decorrelation:

  1. Use of a subscript expression [0]; this is not currently supported by the cost-based optimizer and so triggers fallback to the heuristic planner.
  2. Absence of unique index assertions on the pg_catalog tables, so the optimizer cannot statically prove that correlated subqueries return <= 1 result.

@Freeaqingme
Copy link

Freeaqingme commented Sep 10, 2018

Thanks for clarifying.

Typically, I'd be more than happy to simply rewrite a query or two. In this case though, this query is generated by Doctrine when it tries to see what the current schema looks like when creating migration scripts (./doctrine migrations:diff)

As such, there's not an easy work-around where I can just rewrite one or two of my own queries. For the time being I'll just use mysql instead, but figured this bit of context might help in determining the appropriate priority for support of the posted query..

@awoods187
Copy link
Contributor

@jordanlewis can we close this now?

@Fornax96
Copy link

Fornax96 commented Apr 7, 2019

It seems this doesn't work for DELETE queries yet. When I run this, I get the expected result:

SELECT * FROM list WHERE NOT EXISTS(SELECT 1 FROM list_file WHERE list_id = list.id);
  id | public_id | title | views |           date_created           | user_id |     user_ip      
+----+-----------+-------+-------+----------------------------------+---------+-----------------+
  44 | feV9vhnW  | bob   |     0 | 2019-04-07 09:34:53.223956+00:00 |       1 | 127.0.0.1:53948  
(1 row)

But when I replace SELECT with DELETE it breaks:

DELETE FROM list WHERE NOT EXISTS(SELECT 1 FROM list_file WHERE list_id = list.id);  
pq: no data source matches prefix: list

I'm on cockroach 2.1.6

@awoods187
Copy link
Contributor

@Fornax96 I expect this to work on our upcoming 19.1 release. You can try a beta here (https://www.cockroachlabs.com/docs/releases/v19.1.0-rc.2.html).

@Fornax96
Copy link

Awesome, I'll try it out.

@Fornax96
Copy link

Fornax96 commented Apr 30, 2019

Upgraded to 19.1, it works now!

@andy-kimball
Copy link
Contributor

I'm closing this issue out, since we've now released 19.1 with support for correlated subqueries both in read-only and mutation statements.

@insaner
Copy link

insaner commented May 26, 2020

Hi I'm not sure if this is related to this issue, but this is where I ended up on a search of my problem. I'm trying stuff out in cockroachdb, learning the ropes still, and I'm trying to use the RETURNING feature of DELETE to then modify a row on another table, as follows:

UPDATE u SET c = true WHERE id = (DELETE FROM uu WHERE code = 'some code' RETURNING id);

But I am getting an error:

invalid syntax: statement ignored: at or near "from": syntax error
DETAIL: source SQL:
UPDATE u SET c = true WHERE id = (DELETE FROM uu WHERE code = 'some code'  RETURNING id)
                                         ^
HINT: try \h UPDATE

If I run these separately they work fine (supplying the data as needed).

I also tried:

WITH cu AS (DELETE FROM uu WHERE code = 'some code' RETURNING id) UPDATE u SET c = true WHERE id = cu.id

but I get:

pq: no data source matches prefix: cu

I know there are workarounds, but these introduce (worse) race conditions that I would like to avoid if possible. Am I doing something wrong, or is what I am trying to do not yet implemented? Thanks.

build: CCL v19.2.6 @ 2020/04/06 18:05:31 (go1.12.12)


Edit:
Argh, I see this might be more closely related to: #43963 (which I only saw after posting the above msg)

@knz
Copy link
Contributor

knz commented May 26, 2020

@insaner this is a legitimate need but it's a new issue. Please file a new issue and copy-paste the text of your submission.

@insaner
Copy link

insaner commented May 27, 2020

Perfect, thanks for looking into this. I've filed a new issue.

@andy-kimball
Copy link
Contributor

In case someone else comes upon this issue, this does work in CRDB and PG, it's just that the syntax wasn't quite right. It should be like this instead:

WITH cu AS (DELETE FROM uu WHERE code = 'some code' RETURNING id)
UPDATE u SET c = true FROM cu WHERE id = cu.id

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-optimizer SQL logical planning and optimizations. A-sql-pgcompat Semantic compatibility with PostgreSQL A-sql-semantics C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
Projects
None yet
Development

No branches or pull requests