Add support for GROUP BY * aggregation #18390
This is great. I fully support adding this syntax.
This conflicts with the standard syntax for GROUP BY:
GROUP BY [ <set quantifier> ] <grouping element list>
where <set quantifier> can be ALL or DISTINCT and defaults to ALL if omitted. The quantifier affects the semantics for queries involving grouping sets. Overloading the meaning to indicate how the keys are selected instead of how the rows in the result are de-duplicated is confusing and error prone.
Before we could consider such syntax, we'd need to define the precise semantics and how it interacts with and relates to the broader GROUP BY feature.
In particular, here are some inconsistencies:
produces a result, but
fails with
fails with
Without precise semantics, it's hard to tell what the behavior of these queries would be after this change:
To be clear, the ALL and DISTINCT qualifiers control whether grouping sets that have the same combination of keys are deduped. Therefore, the only way this syntax makes sense is if we give meaning to omitting the grouping set specification. Specifically, omitting it would be equivalent to having a single grouping set composed of all the expressions in the group by clause that don’t contain aggregations. In that case, the qualifier is orthogonal to such a feature. Allowing one but not the other looks arbitrary and introduces cognitive load for a user who has to understand that they are somehow connected even though intuitively they should not be.
Another aspect that complicates issues conceptually is that the GROUP BY operation occurs before the SELECT clause is computed, so it’s a chicken-and-egg problem to determine which columns are grouping keys and which ones are derived.
Also, the GROUP BY clause operates on input columns (those coming from the FROM clause), not on those from the SELECT clause. The implication arrow goes the other way: an expression in the SELECT clause is valid if it’s functionally dependent on the input columns used for computing the grouping sets.
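To make the role of the quantifier concrete, here is a minimal Python sketch of how ALL and DISTINCT could affect the grouping sets produced by combining several grouping elements. It is a model of the semantics described above, not the engine's implementation: the helper names (rollup, cube, combine) are hypothetical, and it assumes two grouping sets count as duplicates when they contain the same set of columns.

```python
from itertools import product


def rollup(*cols):
    # ROLLUP(a, b) expands to the prefixes (a, b), (a), ()
    return [tuple(cols[:i]) for i in range(len(cols), -1, -1)]


def cube(*cols):
    # CUBE(a, b) expands to every subset of the listed columns
    sets = [()]
    for col in cols:
        sets = [s + (col,) for s in sets] + sets
    return sets


def combine(elements, quantifier="ALL"):
    # Multiple grouping elements combine as a cross product of their
    # grouping sets; each combination merges the columns of its parts.
    combined = []
    for parts in product(*elements):
        keys = []
        for part in parts:
            for col in part:
                if col not in keys:
                    keys.append(col)
        combined.append(tuple(keys))
    if quantifier != "DISTINCT":
        return combined  # ALL keeps duplicate grouping sets
    # DISTINCT drops grouping sets with an already-seen column combination
    # (assumption: identity is the set of columns, ignoring order).
    seen = set()
    unique = []
    for keys in combined:
        ident = frozenset(keys)
        if ident not in seen:
            seen.add(ident)
            unique.append(keys)
    return unique


# GROUP BY <quantifier> CUBE (a, b), ROLLUP (a, c)
elements = [cube("a", "b"), rollup("a", "c")]
print(len(combine(elements, "ALL")))       # 12
print(len(combine(elements, "DISTINCT")))  # 6
```

In this model the quantifier only changes how many times a given key combination is aggregated. It says nothing about which columns become keys, which is the point of the objection.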
@martint Thank you for your detailed explanation. Can you suggest an alternative syntax? Or do we not want to add this feature?
Title changed from "GROUP BY ALL aggregation" to "GROUP BY * aggregation"
Description
This syntax allows omitting column positions or names after GROUP BY. For instance,
SELECT name, count(1) FROM GROUP BY *
is equivalent to
SELECT name, count(1) FROM GROUP BY name
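The equivalence above amounts to picking every SELECT expression that contains no aggregation as a grouping key. The following Python sketch illustrates that rule on plain strings. It is only an illustration under loud assumptions: the function name infer_group_keys and the AGGREGATES set are hypothetical, aggregates are detected with a naive regular expression rather than a parsed syntax tree, and window functions and nested queries are ignored.

```python
import re

# Assumed set of aggregate function names, for illustration only.
AGGREGATES = {"count", "sum", "avg", "min", "max"}


def infer_group_keys(select_items):
    """Return the SELECT expressions that call no aggregate function."""
    keys = []
    for item in select_items:
        # Collect every identifier that is immediately followed by "(".
        called = {name.lower() for name in re.findall(r"([A-Za-z_]\w*)\s*\(", item)}
        if not (called & AGGREGATES):
            keys.append(item)
    return keys


print(infer_group_keys(["name", "count(1)"]))  # ['name']
```

A real analyzer would make this decision on the parsed expression tree, and it would still have to resolve the ordering concern raised in the review, since the grouping keys are normally known before the SELECT clause is evaluated.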
This PR makes ALL a reserved keyword to distinguish it from column names.
References in other database/query engines
Release notes
(x) Release notes are required, with the following suggested text: