-
Notifications
You must be signed in to change notification settings - Fork 166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support ROLLUP and CUBE grouping expansion #782
Comments
Please let me know what do you think. I want to contribute more to the project! :) |
I haven't spent a lot of time thinking about this but my inclination is that we can already represent both of these concepts in aggregaterel via the use of grouping sets. We're just doing so as explicitly expanded whereas you're requesting an implicitly expanded representation. In Substrait, we're okay having multiple ways to represent an operation. Additionally, we like to maintain context where it might be helpful to execution systems. As such, I'm not against adding abstractions that specifically support cube and rollup. However, if we were to do that, I'm be inclined to just create new relations rather than overloading aggregaterel. Two questions:
Part of the reason I argue for separate relations is that the current proposal in the attached PR actually creates things that I don't think really exist. (For example, a combination of groupings of multiple GroupingExpansion types). |
@jacques-n thank you for feedback!
Yes, I agree about the expressibility. I'm not claiming that CUBE and ROLLUP can't be represented today. The second paragraph of the Motivation states that "the main problem is an exponential expansion of CUBE if you spell out as grouping sets".
As stated in state of the art, in many RDBMSes, CUBE and ROLLUP can be specified as a part of group by or grouping sets (i.e., you can mix and match grouping columns, grouping sets, ROLLUP, and CUBE in GROUP BY GROUPING SETS. I wish I have SQL standard at my disposal for this...) So I believe extending "grouping spec", which is really what ROLLUP and CUBE is about, is the right place to specify expansion rather than a dedicated CUBE or ROLLUP rels, which is clearly restricted than what many systems do support.
As stated above, systems do support these. Please let me know if there are other concerns. |
Motivation
ROLLUP and CUBE are established way to specify multiple groupings. In many databases, ROLLUP and CUBE can appear as a grouping expression so that one does not have to tediously expand all the combinations of the grouping expressions (see the survey of state of RDBMS).
In Substrait, Aggregate rel supports grouping sets. ROLLUP and CUBE can be spelled out by expanding all possible combinations in the plan. The only problem is that CUBE will expand to exponential number of groupings, and sending the fully expanded form is not compact and not aligned well with other parts of Substrait, where compact representation is preferred (e.g., anchor and referring by indexes). Also, fully expanded groupings are hard to grasp the actual operation, thus not helping readability.
The proposal is to support ROLLUP and CUBE as an expansion method of Grouping when specified. The default is point, thus backward compatible, and rollup and cube will specify how to expand the specified sequence of grouping expression. This not only prevents exponential expansions in the plan and is easier to understand the higher-level semantic.
State of the Art
Following databases (but not limited to) all support ROLLUP and CUBE as valid expansions of grouping.
Proposed Change
The proposed changes are two fold.
Grouping
to describe the type of expansion. If omitted, compatible to prior semantic.CUBE( (a, b), c, (d, e))
where(a, b)
and(d, e)
are treated as a unit in expansion).If we can get to consensus, I am happy to contribute the change!
The text was updated successfully, but these errors were encountered: