Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: make GROUP BY recognize projection aliases #28059

Closed
knz opened this issue Jul 30, 2018 · 14 comments · Fixed by #42447
Closed

sql: make GROUP BY recognize projection aliases #28059

knz opened this issue Jul 30, 2018 · 14 comments · Fixed by #42447
Assignees
Labels
A-sql-optimizer SQL logical planning and optimizations. A-sql-pgcompat Semantic compatibility with PostgreSQL A-sql-semantics C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. docs-done docs-known-limitation S-3-erroneous-edge-case Database produces or stores erroneous data without visible error/warning, in rare edge cases.
Milestone

Comments

@knz
Copy link
Contributor

knz commented Jul 30, 2018

Reported by @clanstyles on Gitter.

This is a subtle extension to the SQL standard recognized by PostgreSQL which currently causes CockroachDB to silently return different results from pg in common cases. So the severity is high.

To understand the problem, let's start with the "easy" and innocuous form of the bug:

> create table t(x int);
> select x+1 as z from t group by z;

This is recognized by PostgreSQL (groups by the projected expression x+1), but not by CockroachDB currently (reports an error z must be grouped by).

The difference between pg and crdb here is that GROUP BY will recognize simple/naked projection aliases before resolving the names using the regular algorithm (like ORDER BY already does.) This is a behavior specific to "simple identifiers" (like for ORDER BY).

Why this matters is that CockroachDB fails to report an error in pretty common cases, and instead returns, silently, very different results from PostgreSQL:

>  select (x % 10) as x
     from t
 group by x;

This currently groups by t.x in CockroachDB, whereas it should group by (x%10) like in PostgreSQL!

Both the heuristic planner and the opt code must be extended to support this!

@knz knz added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-sql-semantics A-sql-pgcompat Semantic compatibility with PostgreSQL A-sql-optimizer SQL logical planning and optimizations. S-0-visible-logical-error Database stores inconsistent data in some cases, or queries return invalid results silently. labels Jul 30, 2018
@knz knz assigned knz and rytaft Jul 30, 2018
@knz
Copy link
Contributor Author

knz commented Jul 30, 2018

cc @RaduBerinde

@knz
Copy link
Contributor Author

knz commented Aug 9, 2018

Started to investigate.

The problem is worse than that said above, actually the resolution may currently be incorrect also for ORDER BY.

@knz
Copy link
Contributor Author

knz commented Aug 9, 2018

See the code in pg's src/parser/parse_clause.c, findTargetlistEntrySQL92

The following is also valid and needs to be checked:

select x+1 from foo group by "?column?"

@knz knz added S-3-erroneous-edge-case Database produces or stores erroneous data without visible error/warning, in rare edge cases. docs-todo docs-known-limitation and removed S-0-visible-logical-error Database stores inconsistent data in some cases, or queries return invalid results silently. labels Aug 9, 2018
@knz knz added this to the 2.2 milestone Aug 9, 2018
@sploiselle
Copy link
Contributor

@knz Can I get a quick blurb describing this known limitation w/r/t the impact to user experience? Ideally, we need it by Friday 10/26 for the 2.1 Known Limitations page. Posting it on this issue and/or pinging me would be great.

@knz
Copy link
Contributor Author

knz commented Oct 24, 2018

"""
title: GROUP BY referring to SELECT aliases

Applications developed for PostgreSQL that use GROUP BY to refer to column aliases produced in the same SELECT clause must be changed to use the full underlying expression instead. For example, SELECT x+y AS z ... GROUP BY z must be changed to SELECT x+y AS z ... GROUP BY x+y. Otherwise, CockroachDB will produce either a planning error or in some cases invalid results.
"""

@RaduBerinde
Copy link
Member

We are running tight on 19.1, I am pushing this out given that the fix is not trivial and the "surface area" is not very high.

@knz knz modified the milestones: Later, 19.1.x May 6, 2019
@knz knz removed their assignment May 6, 2019
@rafiss
Copy link
Collaborator

rafiss commented Nov 11, 2019

I think aliases and GROUP BY do not play well in most circumstances:

This came up when testing with Django: cockroachdb/django-cockroachdb#82

> create table t (a int primary key);
CREATE TABLE

> insert into t values(1);
INSERT 1

> select exists (select a from t) as e1, count(*) from t group by e1;
pq: column "e1" does not exist

> select (select a from t) as s1, count(*) from t group by s1;
pq: column "s1" does not exist

> select 1 as s1 group by s1;
pq: column "s1" does not exist

@RaduBerinde RaduBerinde modified the milestones: 19.1.x, 19.2 Nov 11, 2019
@RaduBerinde
Copy link
Member

We should prioritize fixing this and backport to 19.2.x. I can take a look this week.

@RaduBerinde RaduBerinde self-assigned this Nov 11, 2019
@RaduBerinde
Copy link
Member

Trying to understand the postgres semantics.. Given the above examples, I would have expected that the aliases take precedence over the input columns, but that doesn't seem to be the case:

radu=> select a as x from ab group by x;
ok
radu=> select a as b from ab group by b;
ERROR:  column "ab.a" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: select a as b from ab group by b;

They do seem to take precedence over columns from higher scopes:

radu=> select c, d, (select a as c from ab group by c) from cd;
ERROR:  more than one row returned by a subquery used as an expression

Here c refers to the alias and not the outer column (otherwise it couldn't have returned more than one row).

@knz
Copy link
Contributor Author

knz commented Nov 13, 2019 via email

@RaduBerinde
Copy link
Member

Thanks @knz, that definitely helped. I will try to harvest some test cases from postgres as well.

@RaduBerinde
Copy link
Member

RaduBerinde commented Nov 13, 2019

I would like to correct something in the original description. The aliases are used only if they don't conflict with the FROM column names, which have precedence. So in the (x%10) as x example, we do the right thing. This is postgres:

radu=> insert into ab values (1, 1), (11, 11);
INSERT 0 2
radu=> select (a % 10) as a from ab group by a;
 a 
---
 1
 1

craig bot pushed a commit that referenced this issue Nov 14, 2019
42447: opt: support grouping by aliases r=RaduBerinde a=RaduBerinde

There is some baggage left over from SQL92 which allowed grouping by
select targets by their alias. We implement the same rules used by
postgres, as explained in the `buildGroupingColumns` comment.

Fixes #28059.

Release note (sql change): It is now supported to specify selection
target aliases as GROUP BY columns. Note that the FROM columns take
precedence over the aliases, which are only used if there is no column
with that name in the current scope.

Co-authored-by: Radu Berinde <[email protected]>
@craig craig bot closed this as completed in 9d925b5 Nov 14, 2019
koorosh pushed a commit to koorosh/cockroach that referenced this issue Nov 15, 2019
There is some baggage left over from SQL92 which allowed grouping by
select targets by their alias. We implement the same rules used by
postgres, as explained in the `buildGroupingColumns` comment.

Fixes cockroachdb#28059.

Release note (sql change): It is now supported to specify selection
target aliases as GROUP BY columns. Note that the FROM columns take
precedence over the aliases, which are only used if there is no column
with that name in the current scope.
RaduBerinde added a commit to RaduBerinde/cockroach that referenced this issue Nov 23, 2019
There is some baggage left over from SQL92 which allowed grouping by
select targets by their alias. We implement the same rules used by
postgres, as explained in the `buildGroupingColumns` comment.

Fixes cockroachdb#28059.

Release note (sql change): It is now supported to specify selection
target aliases as GROUP BY columns. Note that the FROM columns take
precedence over the aliases, which are only used if there is no column
with that name in the current scope.
@jseldess
Copy link
Contributor

@RaduBerinde, since this issue is closed, does that mean we can remove this limitation from the 20.1 known limitations? https://www.cockroachlabs.com/docs/dev/known-limitations.html#group-by-referring-to-select-aliases

@RaduBerinde
Copy link
Member

@jseldess yes, thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-optimizer SQL logical planning and optimizations. A-sql-pgcompat Semantic compatibility with PostgreSQL A-sql-semantics C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. docs-done docs-known-limitation S-3-erroneous-edge-case Database produces or stores erroneous data without visible error/warning, in rare edge cases.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants