opt: take advantage of partial ordering in the hash aggregator
Before this change, the optimizer cost model for group by included the overhead of processing all input rows for every non-streaming aggregation. Recent changes in the vectorized execution engine added optimizations for aggregation with partially ordered grouping columns. Such an aggregation does not need to process all input rows if there is a limit, but unlike streaming aggregation it still requires a hash table.

This change adds checks for whether an aggregation has a subset of grouping columns that are ordered, and costs it similarly to streaming aggregation by reflecting the limit hint in the number of both input rows and output rows. We also pass the limit hint property to child nodes of the aggregation if the grouping columns are partially ordered.

We also add a new exploration rule, `GenerateLimitedGroupByScans`, to enable the optimizer to explore scans on secondary indexes that lead to partially ordered grouping columns. Previously, fully ordered grouping columns could be found by exploring secondary indexes that fully cover the columns via `GenerateIndexScans`. With partially ordered grouping columns, however, not all grouping columns may be part of an index, so we may need to construct an IndexJoin to add the missing columns. The new exploration rule is triggered when there is a limit expression with a positive constant limit, a canonical group by, and a canonical scan.

This change also modifies the criteria for streaming group by to include group by with no grouping columns.

This change also adds the group by mode (streaming, hybrid, or none) to the EXPLAIN(OPT) output for easier debugging.

Fixes: #63049
Fixes: #71768

Release note (performance improvement): Improves performance of some GROUP BY queries with a LIMIT if there is an index ordering that matches a subset of the grouping columns. In this case the total number of aggregations needed to satisfy the LIMIT can be emitted without scanning the entire input, making execution more efficient.
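To illustrate why a partial ordering lets a hash aggregation stop early under a limit, here is a minimal Go sketch. It is not the CockroachDB implementation. The names `Row`, `Group`, and `LimitedPartialOrderSum` are hypothetical, and the sketch assumes two integer grouping columns (`A`, on which the input is already sorted, and `B`, which is unordered) with a SUM aggregate. A hash table is still needed for `B`, but it only holds the groups of the current `A` value, and the groups can be emitted as soon as `A` changes.

```go
package main

// Row is a hypothetical input row: A is the ordered grouping column,
// B is the unordered grouping column, and V is the value being summed.
type Row struct {
	A, B, V int
}

// Group is one aggregated output row for the grouping key (A, B).
type Group struct {
	A, B, Sum int
}

// LimitedPartialOrderSum computes SUM(V) GROUP BY A, B, assuming rows arrive
// sorted by A. It returns at most limit groups, plus the number of input rows
// that were aggregated before the limit was satisfied.
func LimitedPartialOrderSum(rows []Row, limit int) ([]Group, int) {
	var out []Group
	if limit <= 0 {
		return out, 0
	}

	sums := make(map[int]int) // hash table keyed by B for the current A value
	var order []int           // first-seen order of B keys, for stable output

	// flush emits the groups of a finished A value and resets the hash table.
	flush := func(a int) {
		for _, b := range order {
			if len(out) == limit {
				break
			}
			out = append(out, Group{A: a, B: b, Sum: sums[b]})
		}
		sums = make(map[int]int)
		order = order[:0]
	}

	aggregated := 0
	for i, r := range rows {
		// A change in the ordered column means every group of the previous
		// A value is complete and can be emitted.
		if i > 0 && r.A != rows[i-1].A {
			flush(rows[i-1].A)
			if len(out) == limit {
				return out, aggregated // limit satisfied: skip the remaining input
			}
		}
		if _, seen := sums[r.B]; !seen {
			order = append(order, r.B)
		}
		sums[r.B] += r.V
		aggregated++
	}
	if len(rows) > 0 {
		flush(rows[len(rows)-1].A)
	}
	return out, aggregated
}
```

With six input rows sorted by `A` and a limit of 2, the function can return once the first `A` value is finished, having aggregated only the rows of that value. A cost model that reflects the limit hint on both input and output rows is modeling this early exit.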
1 parent 9b7fdf0, commit 3916aa6
Showing 111 changed files with 1,358 additions and 679 deletions.