[YSQL] avg() is not pushed down where other aggregates are #13336
Comments
Currently yb_agg_pushdown_supported() doesn't accept
|
Currently If Though refactoring is needed in
where
|
Another option (without resorting to |
POC showing the idea written above: |
AVG may have different semantics if each tablet computes it independently. If a 3-tablet table has 30 rows, (n1 + n2 + ... + n30)/30 may not equal ((n1 + n2 + ... + n10)/10 + (n11 + n12 + ... + n20)/10 + (n21 + n22 + ... + n30)/10)/3, especially with floating-point numbers, where rounding is involved. MIN, COUNT, MAX, and SUM do not have this issue. To ensure the same result, we may need to compute SUM and COUNT first and then, back in postgres, compute AVG as SUM/COUNT. But I am not sure that even this guarantees the same result as PG, considering a potential SUM overflow. |
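To make the concern above concrete, here is a minimal Python sketch (not YugabyteDB code). The tablet contents are hypothetical, and exact rational arithmetic is used so that rounding does not obscure the point: averaging per-tablet averages can differ from the global average when tablets hold different numbers of rows, while merging per-tablet sums and counts does not.

```python
from fractions import Fraction


def avg(values):
    """Exact average of a non-empty list of numbers."""
    return Fraction(sum(values), len(values))


# Hypothetical split of one table's rows across three tablets of unequal size.
tablets = [[1, 2, 3, 4], [10], [20, 30]]
all_rows = [v for tablet in tablets for v in tablet]

# What a single-node AVG over all rows would return.
global_avg = avg(all_rows)

# Naive pushdown: every tablet returns its own average, which are then averaged.
avg_of_avgs = avg([avg(tablet) for tablet in tablets])

# Pushdown of SUM and COUNT: combine the components, divide once at the end.
total_sum = sum(sum(tablet) for tablet in tablets)
total_count = sum(len(tablet) for tablet in tablets)
merged_avg = Fraction(total_sum, total_count)

print(global_avg, avg_of_avgs, merged_avg)
```

With equally sized tablets and exact arithmetic the two approaches agree, so the mismatch here comes purely from the unequal row counts; rounding and overflow are separate concerns that this sketch does not model.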
For example, for the given test case, the following yield the same.
If we can prove that |
In that particular example avg is correct, as That means the sums and counts must be kept separate and combined by summing each component separately, with one divided by the other only at the very end: |
It seems we can modify src/postgres/src/backend/parser/gram.y by specializing If this proposal is adopted, I can send out a patch. |
Summary:
Avg is more difficult to push down than other aggregates because it takes two values in its transition function--the running count, and the running sum. These two values must be maintained and sent back to postgres separately, as sending back only the average is not enough for postgres to aggregate the results of each tablet. Postgres uses INT8ARRAYOID as the transition type for avg (of smaller ints) because of this. This means that avg cannot be pushed down in a straightforward manner, as DocDB currently cannot send back data of type INT8ARRAYOID.

There are several possible solutions, and some discussion of them can be found in the issue: #13336

This revision substitutes avg with a count and a sum at the pggate level, both of which can be pushed down, and merges the results back together afterwards. This only pushes down the avg of int2s and int4s, as larger types use a different transition type to handle overflow.

Test Plan:
```
./yb_build.sh --java-test org.yb.pgsql.TestPgRegressAggregates
```
is an existing test for aggregates, including avg.
```
./yb_build.sh --java-test org.yb.pgsql.TestPgSelect#testAggregatePushdowns
```
checks for aggregate pushdown happening in the right situations.

Basic demo of pushdown:
```
yugabyte=# create table aggtest (a int, b int);
CREATE TABLE
yugabyte=# insert into aggtest values (1, 2), (2, 4), (3, 10);
INSERT 0 3
yugabyte=# select avg(a), avg(b) from aggtest;
        avg         |        avg
--------------------+--------------------
 2.0000000000000000 | 5.3333333333333333
(1 row)

yugabyte=# explain (analyze, dist) select avg(a), avg(b) from aggtest;
                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=105.00..105.02 rows=1 width=64) (actual time=3.811..3.812 rows=1 loops=1)
   ->  Seq Scan on aggtest  (cost=0.00..100.00 rows=1000 width=8) (actual time=3.754..3.763 rows=1 loops=1)
         Partial Aggregate: true
         Storage Table Read Requests: 1
         Storage Table Execution Time: 2.858 ms
 Planning Time: 0.238 ms
 Execution Time: 4.412 ms
 Storage Read Requests: 1
 Storage Write Requests: 0
 Storage Execution Time: 2.858 ms
 Peak Memory Usage: 22 kB
(11 rows)
```

Reviewers: amartsinchyk
Reviewed By: amartsinchyk
Differential Revision: https://phabricator.dev.yugabyte.com/D22976
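The substitution described in the summary can be illustrated with a small Python model. This is only a sketch of the idea, not the pggate implementation: the function names (`tablet_partial`, `combine`, `finalize_avg`) and the split of `aggtest.b` across two tablets are assumptions made for illustration. Each tablet returns a (count, sum) pair, the pairs are merged by component-wise addition, and the division happens once at the end.

```python
from fractions import Fraction
from typing import Iterable, Optional, Tuple

# Partial aggregate state, mirroring the two values avg needs: (count, sum).
Partial = Tuple[int, int]


def tablet_partial(values: Iterable[Optional[int]]) -> Partial:
    """Per-tablet work: count and sum of the non-NULL inputs."""
    count, total = 0, 0
    for v in values:
        if v is not None:  # avg ignores NULL inputs
            count += 1
            total += v
    return count, total


def combine(partials: Iterable[Partial]) -> Partial:
    """Merge per-tablet states by adding counts and sums component-wise."""
    count, total = 0, 0
    for c, s in partials:
        count += c
        total += s
    return count, total


def finalize_avg(state: Partial) -> Optional[Fraction]:
    """Final step: divide once; no input rows yields None (SQL NULL)."""
    count, total = state
    if count == 0:
        return None
    return Fraction(total, count)


# Column b of the aggtest demo, hypothetically split across two tablets.
tablets_b = [[2, 4], [10]]
result = finalize_avg(combine(tablet_partial(t) for t in tablets_b))
print(float(result))
```

Python integers do not overflow, so this model says nothing about the overflow handling that limits the actual change to int2 and int4 inputs.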
Closed by d01a6ed |
Jira Link: DB-2969
Description
The aggregate avg() is not pushed down, but it could be, because sum() and count() are. Example on 2.15:
Result:
A workaround is to replace avg() with sum() and count():
This is aggregated on the tservers: