Add support for `GROUP BY *` aggregation #18390

ebyhr · 2023-07-24T21:22:27Z

Description

This syntax allows omitting column positions or names after GROUP BY.
For instance, SELECT name, count(1) FROM GROUP BY * is equivalent to SELECT name, count(1) FROM GROUP BY name

~~This PR changes ALL as a reserved keyword to distinguish it from column names.~~

References in other database/query engines

Release notes

(x) Release notes are required, with the following suggested text:

# General
* Add support for `GROUP BY *` aggregation. ({issue}`18390`)

electrum · 2023-07-25T16:30:05Z

This is great. I fully support adding this syntax.

martint

This conflicts with the standard syntax for GROUP BY:

 GROUP BY [ <set quantifier> ] <grouping element list>

where <set quantifier> can be ALL or DISTINCT and defaults to ALL if omitted. The quantifier affects the semantics for queries involving grouping sets. Overloading the meaning to indicate how the keys are selected instead of how the rows in the result are de-duplicated is confusing and error prone.

Before we could consider such syntax, we'd need to define the precise semantics and how it interacts and relates to the broader GROUP BY feature.

martint · 2023-07-31T20:31:10Z

In particular, here are some inconsistencies:

SELECT k, count(*) 
FROM (VALUES 1,2,3) t(k) 
GROUP BY ALL

produces a result, but

SELECT k, count(*) 
FROM (VALUES 1,2,3) t(k) 
GROUP BY

fails with 'k' must be an aggregate expression or appear in GROUP BY clause, even though ALL is implied if missing, per the SQL standard.

SELECT k, count(*) 
FROM (VALUES 1,2,3) t(k) 
GROUP BY DISTINCT

fails with groupingElements must not be empty when type is DISTINCT. Presumably, it should also work. But, what would the meaning of such a query?

Without precise semantics, it's hard to tell what's the behavior of these queries after this change:

SELECt k, count(*) 
FROM (VALUES 1,2,3) t(k) 
GROUP BY ALL k

SELECT k, v, count(*) 
FROM (VALUES (1,1),(2,2),(3,3)) t(k, v) 
GROUP BY ALL

SELECT k, v, count(*) 
FROM (VALUES (1,1),(2,2),(3,3)) t(k, v) 
GROUP BY ALL k

SELECT k, v, count(*) 
FROM (VALUES (1,1),(2,2),(3,3)) t(k, v) 
GROUP BY ALL k, v

SELECT count(*) 
FROM (VALUES 1,2,3) t(k) 
GROUP BY ALL ()

ebyhr · 2023-08-01T01:13:51Z

Since GROUP BY ALL without grouping element list isn't defined in standard, we don't need to default to ALL in a such case and the failure is expected for me.
I don't think it should work. GROUP BY DISTINCT without grouping element list was disallowed even before this change. Also, the standard also disallows it in my understanding. The unclear semantics you mentioned implies the current behavior of throwing an exception is better.
Only 2nd example should be affected in this PR because it's the only example which uses GROUP BY ALL without grouping element list. Other examples should keep the original behavior. This rule is clear to me.

martint · 2023-08-18T00:32:27Z

To be clear, the ALL and DISTINCT qualifiers control whether grouping sets that have the same combination of keys are deduped.

Therefore, the only way this syntax makes sense is if we give meaning to omitting the grouping set specification. Specifically, to be equivalent to having a single grouping set composed of all the expressions in the group by clause that don’t contain aggregations.

In that case, the qualifier is orthogonal to such feature. Allowing one but not the other looks arbitrary and introduces cognitive load for a user who has to understand that they are somehow connected even though intuitively they should not be.

Another aspect that complicates issues conceptually is that the GROUP BY operation occurs before the SELECT clause is computed, so it’s a chicken-and-egg problem to determine which columns are grouping keys and which ones are derived. Also, the GROUP BY clause operates on input columns (those coming from the FROM clause) not on those from the SELECT clause. The implication arrow goes the other way: an expression in the SELECT clause is valid if it’s functionally dependent on the input columns used for computing the grouping sets.

ebyhr · 2024-03-01T01:51:32Z

@martint Thank you for your detailed explanation. Can you suggest alternative syntax? Or we don't want to add this feature?

ebyhr added the syntax-needs-review label Jul 24, 2023

cla-bot bot added the cla-signed label Jul 24, 2023

github-actions bot added the docs label Jul 24, 2023

ebyhr force-pushed the ebi/core-group-by-all branch from fc01df4 to e809697 Compare July 25, 2023 00:47

github-actions bot added tests:hive hive Hive connector labels Jul 25, 2023

ebyhr self-assigned this Jul 25, 2023

ilfrin requested a review from martint July 25, 2023 16:26

ebyhr force-pushed the ebi/core-group-by-all branch 2 times, most recently from 48bb65e to 36bff45 Compare July 31, 2023 02:34

martint requested changes Jul 31, 2023

View reviewed changes

ebyhr force-pushed the ebi/core-group-by-all branch from 36bff45 to 9153730 Compare August 1, 2023 06:17

ebyhr force-pushed the ebi/core-group-by-all branch from 9153730 to 9882169 Compare August 17, 2023 08:46

ebyhr force-pushed the ebi/core-group-by-all branch from 9882169 to 2d40f4a Compare November 1, 2023 01:53

ebyhr force-pushed the ebi/core-group-by-all branch from 2d40f4a to 7dc90bb Compare March 14, 2024 01:52

ebyhr force-pushed the ebi/core-group-by-all branch from 7dc90bb to e2b781e Compare March 24, 2024 23:22

ebyhr marked this pull request as ready for review April 17, 2024 22:18

findepi removed the tests:hive label Apr 18, 2024

github-actions bot added the stale label May 10, 2024

ebyhr added the stale-ignore Use this label on PRs that should be ignored by the stale bot so they are not flagged or closed. label May 10, 2024

trinodb deleted a comment from github-actions bot May 15, 2024

ebyhr force-pushed the ebi/core-group-by-all branch 2 times, most recently from 8890f18 to 0b7ccc1 Compare November 8, 2024 12:02

ebyhr changed the title ~~Add support for GROUP BY ALL aggregation~~ Add support for GROUP BY * aggregation Nov 8, 2024

ebyhr force-pushed the ebi/core-group-by-all branch from 0b7ccc1 to 98f0ff2 Compare November 12, 2024 01:54

Add support for GROUP BY ALL aggregation

4353674

ebyhr force-pushed the ebi/core-group-by-all branch from 98f0ff2 to 4353674 Compare January 7, 2025 01:29

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for `GROUP BY *` aggregation #18390

Add support for `GROUP BY *` aggregation #18390

ebyhr commented Jul 24, 2023 •

edited

Loading

electrum commented Jul 25, 2023

martint left a comment

martint commented Jul 31, 2023

ebyhr commented Aug 1, 2023

martint commented Aug 18, 2023

ebyhr commented Mar 1, 2024

Add support for GROUP BY * aggregation #18390

Are you sure you want to change the base?

Add support for GROUP BY * aggregation #18390

Conversation

ebyhr commented Jul 24, 2023 • edited Loading

Description

Release notes

electrum commented Jul 25, 2023

martint left a comment

Choose a reason for hiding this comment

martint commented Jul 31, 2023

ebyhr commented Aug 1, 2023

martint commented Aug 18, 2023

ebyhr commented Mar 1, 2024

Add support for `GROUP BY *` aggregation #18390

Add support for `GROUP BY *` aggregation #18390

ebyhr commented Jul 24, 2023 •

edited

Loading