[YSQL] avg() is not pushed down where other aggreagates are #13336

FranckPachot · 2022-07-15T22:43:41Z

Jira Link: DB-2969

Description

The aggregate avg() is not pushed down but could be because sum() and count() are.

Example on 2.15:

create table demo as select n from generate_series(1,100000) n union all select null from generate_series(1,100000) ;
select min(n),max(n),avg(n) from demo;
explain (analyze, verbose, summary off) select min(n),max(n),avg(n) from demo;

Result:

yugabyte=> create table demo as select n from generate_series(1,100000) n union all select null from generate_series(1,100000) ;
SELECT 200000
yugabyte=> select min(n),max(n),avg(n) from demo;
 min |  max   |        avg
-----+--------+--------------------
   1 | 100000 | 50000.500000000000
(1 row)

yugabyte=> explain (analyze, verbose, summary off) select min(n),max(n),avg(n) from demo;
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=107.50..107.51 rows=1 width=40) (actual time=727.498..727.498 rows=1 loops=1)
   Output: min(n), max(n), avg(n)
   ->  Seq Scan on public.demo  (cost=0.00..100.00 rows=1000 width=4) (actual time=4.368..708.866 rows=200000 loops=1)
         Output: n
(4 rows)

A workaround is replacing with sum() and avg():

select min(n),max(n),sum(n)/count(n)::numeric n from demo;
explain (analyze, verbose, summary off) select min(n),max(n),sum(n)/count(n)::numeric n from demo;

This is aggregated on the tservers:

yugabyte=> select min(n),max(n),sum(n)/count(n)::numeric n from demo;
 min |  max   |         n
-----+--------+--------------------
   1 | 100000 | 50000.500000000000
(1 row)

yugabyte=> explain (analyze, verbose, summary off) select min(n),max(n),sum(n)/count(n)::numeric n from demo;
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=110.00..110.02 rows=1 width=40) (actual time=214.047..214.047 rows=1 loops=1)
   Output: min(n), max(n), ((sum(n))::numeric / (count(n))::numeric)
   ->  Seq Scan on public.demo  (cost=0.00..100.00 rows=1000 width=4) (actual time=214.029..214.033 rows=3 loops=1)
         Output: n
         Partial Aggregate: true
(5 rows)

The text was updated successfully, but these errors were encountered:

tedyu · 2022-07-16T03:31:48Z

Currently yb_agg_pushdown_supported() doesn't accept avg for pushdown.
Even if it does,

        /* Aggtranstype is a supported YB key type and is not INTERNAL or NUMERIC. */
        if (!YbDataTypeIsValidForKey(aggref->aggtranstype) ||
            aggref->aggtranstype == INTERNALOID ||
            aggref->aggtranstype == NUMERICOID)
            return;

aggref->aggtranstype for the above SQL is INT8ARRAYOID which would be invalid for key, resulting in yb_agg_pushdown_supported returning aggstate->yb_pushdown_supported as false.

tedyu · 2022-07-16T04:37:47Z

Currently DocExprExecutor uses yb::QLValuePB to store intermediate result (such as partial sum).
yb::QLValuePB represents a single value.

If avg is pushed down, it seems we need to use another protobuf message (Maybe QLSeqValuePB) for storing partial result:
current count and current sum

Though refactoring is needed in PgsqlReadOperation::PopulateResultSet

  QLExprResult result;
  for (const PgsqlExpressionPB& expr : request_.targets()) {

where

using QLExprResult = ExprResult<QLValuePB>;

tedyu · 2022-07-16T15:13:08Z

Another option (without resorting to QLSeqValuePB) is to keep track of partial sum and partial row count in the loop of PgsqlReadOperation::ExecuteScalar using local QLValuePB.
Coming out of the loop, compute the average and store it in result_buffer.

tedyu · 2022-07-16T23:26:11Z

POC showing the idea written above:
push-avg-map.txt

tedyu · 2022-07-17T23:14:45Z

cc @sushantrmishra

myang2021 · 2022-07-18T17:31:51Z

It could be that AVG has different semantics if done by each tablet. If a 3-tablet table has 30 rows, (n1 + n2 + ... + n30)/30 may not be the same as ((n1 + n2 + ... + n10)/10 + (n11 + n12 + ... + n20) / 10 + (n21 + n22 + ... + n30) / 10) / 3, especially when floating point numbers are considered that can involve rounding. In contrast, MIN, COUNT, MAX, SUM do not have such issue. To ensure same result, we may need to compute SUM and COUNT first, and then when coming back to postgres, compute AVG by doing SUM/COUNT. But I am not sure even that can ensure same result as done by PG considering potential SUM overflow case.

myang2021 · 2022-07-18T17:39:29Z

For example, for the given test case, the following yield the same.

yugabyte=# \timing
Timing is on.
yugabyte=# select avg(n) from demo;
 50000.500000000000

Time: 13254.918 ms (00:13.255)
yugabyte=# select sum(n)::numeric/count(n) from demo;
 50000.500000000000

Time: 7538.618 ms (00:07.539)

If we can prove that avg(n) == sum(n)::numeric/count(n) always holds, then we can do a PG parse tree level rewrite to work around the problem that AVG cannot be pushed down.

andrei-mart · 2022-07-18T19:46:12Z

((n1 + n2 + ... + n10)/10 + (n11 + n12 + ... + n20) / 10 + (n21 + n22 + ... + n30) / 10) / 3

In that particular example avg is correct, as ((n1+...+n10)/10+(n11+...+n20)/10+(n21+...+n30)/10)/3 == (n1+...+n30)/30, however, in general case of uneven distribution it is not, as ((n1+...+n9)/9+(n10+...+n20)/11+(n21+...+n30)/10)/3 != (n1+...+n30)/30

That means sums and counts must be kept separate and be combined by summing of separate components until very end where one is divided by other: ((n1+...+n9)+(n10+...+n20)+(n21+...+n30)/(9+11+10)) == (n1+...+n30)/30

yugabyte/yugabyte-db#13336

tedyu · 2022-08-28T13:35:51Z

It seems we can modify src/postgres/src/backend/parser/gram.y by specializing avg with sum over count.
This would achieve pushdown with minimal code changes.

If this proposal is adopted, I can send out a patch.

Summary: Avg is more difficult to push down than other aggregates because it takes two values in its transition function--the running count, and the running sum. These two values must be maintained and sent back to postgres separately, as sending back only the average is not enough for postgres to aggregate the results of each tablet. Postgres uses INT8ARRAYOID as the transition type for avg (of smaller ints) because of this. This means that avg cannot be pushed down in a straightforward manner, as DocDB currently cannot send back data of type INT8ARRAYOID. There are several possible solutions, and some discussion of them can be found in the issue: #13336 This revision substitutes avg with a count and a sum at the pggate level, both of which can be pushed down, and merges the results back together afterwards. This only pushes down the avg of int2s and int4s, as larger types use a different transition type to handle overflow. Test Plan: ``` ./yb_build.sh --java-test org.yb.pgsql.TestPgRegressAggregates ``` is an existing test for aggregates, including avg. ``` ./yb_build.sh --java-test org.yb.pgsql.TestPgSelect#testAggregatePushdowns ``` checks for aggregate pushdown happening in the right situations. Basic demo of pushdown: ``` yugabyte=# create table aggtest (a int, b int); CREATE TABLE yugabyte=# insert into aggtest values (1, 2), (2, 4), (3, 10); INSERT 0 3 yugabyte=# select avg(a), avg(b) from aggtest; avg | avg --------------------+-------------------- 2.0000000000000000 | 5.3333333333333333 (1 row) yugabyte=# explain (analyze, dist) select avg(a), avg(b) from aggtest; QUERY PLAN ------------------------------------------------------------------------------------------------------------ Finalize Aggregate (cost=105.00..105.02 rows=1 width=64) (actual time=3.811..3.812 rows=1 loops=1) -> Seq Scan on aggtest (cost=0.00..100.00 rows=1000 width=8) (actual time=3.754..3.763 rows=1 loops=1) Partial Aggregate: true Storage Table Read Requests: 1 Storage Table Execution Time: 2.858 ms Planning Time: 0.238 ms Execution Time: 4.412 ms Storage Read Requests: 1 Storage Write Requests: 0 Storage Execution Time: 2.858 ms Peak Memory Usage: 22 kB (11 rows) ``` Reviewers: amartsinchyk Reviewed By: amartsinchyk Differential Revision: https://phabricator.dev.yugabyte.com/D22976

Summary: Avg is more difficult to push down than other aggregates because it takes two values in its transition function--the running count, and the running sum. These two values must be maintained and sent back to postgres separately, as sending back only the average is not enough for postgres to aggregate the results of each tablet. Postgres uses INT8ARRAYOID as the transition type for avg (of smaller ints) because of this. This means that avg cannot be pushed down in a straightforward manner, as DocDB currently cannot send back data of type INT8ARRAYOID. There are several possible solutions, and some discussion of them can be found in the issue: yugabyte#13336 This revision substitutes avg with a count and a sum at the pggate level, both of which can be pushed down, and merges the results back together afterwards. This only pushes down the avg of int2s and int4s, as larger types use a different transition type to handle overflow. Test Plan: ``` ./yb_build.sh --java-test org.yb.pgsql.TestPgRegressAggregates ``` is an existing test for aggregates, including avg. ``` ./yb_build.sh --java-test org.yb.pgsql.TestPgSelect#testAggregatePushdowns ``` checks for aggregate pushdown happening in the right situations. Basic demo of pushdown: ``` yugabyte=# create table aggtest (a int, b int); CREATE TABLE yugabyte=# insert into aggtest values (1, 2), (2, 4), (3, 10); INSERT 0 3 yugabyte=# select avg(a), avg(b) from aggtest; avg | avg --------------------+-------------------- 2.0000000000000000 | 5.3333333333333333 (1 row) yugabyte=# explain (analyze, dist) select avg(a), avg(b) from aggtest; QUERY PLAN ------------------------------------------------------------------------------------------------------------ Finalize Aggregate (cost=105.00..105.02 rows=1 width=64) (actual time=3.811..3.812 rows=1 loops=1) -> Seq Scan on aggtest (cost=0.00..100.00 rows=1000 width=8) (actual time=3.754..3.763 rows=1 loops=1) Partial Aggregate: true Storage Table Read Requests: 1 Storage Table Execution Time: 2.858 ms Planning Time: 0.238 ms Execution Time: 4.412 ms Storage Read Requests: 1 Storage Write Requests: 0 Storage Execution Time: 2.858 ms Peak Memory Usage: 22 kB (11 rows) ``` Reviewers: amartsinchyk Reviewed By: amartsinchyk Differential Revision: https://phabricator.dev.yugabyte.com/D22976

jasonyb · 2023-05-25T02:36:01Z

Close by d01a6ed

FranckPachot added area/ysql Yugabyte SQL (YSQL) status/awaiting-triage Issue awaiting triage labels Jul 15, 2022

yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels Jul 15, 2022

FranckPachot mentioned this issue Jul 15, 2022

Analyze/improve query performance for Yugabyte MaibornWolff/database-performance-comparison#7

Open

yugabyte-ci removed the status/awaiting-triage Issue awaiting triage label Jul 20, 2022

FranckPachot added a commit to FranckPachot/ClickBench that referenced this issue Jul 29, 2022

changed AVG to SUM/COUNT

b5dd748

yugabyte/yugabyte-db#13336

yugabyte-ci assigned yzhong-yb Feb 6, 2023

yugabyte-ci added kind/enhancement This is an enhancement of an existing feature and removed kind/bug This issue is a bug labels Feb 6, 2023

jasonyb closed this as completed May 25, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[YSQL] avg() is not pushed down where other aggreagates are #13336

[YSQL] avg() is not pushed down where other aggreagates are #13336

FranckPachot commented Jul 15, 2022 •

edited by yugabyte-ci

Loading

tedyu commented Jul 16, 2022 •

edited

Loading

tedyu commented Jul 16, 2022 •

edited

Loading

tedyu commented Jul 16, 2022 •

edited

Loading

tedyu commented Jul 16, 2022 •

edited

Loading

tedyu commented Jul 17, 2022

myang2021 commented Jul 18, 2022

myang2021 commented Jul 18, 2022 •

edited

Loading

andrei-mart commented Jul 18, 2022

tedyu commented Aug 28, 2022

jasonyb commented May 25, 2023

[YSQL] avg() is not pushed down where other aggreagates are #13336

[YSQL] avg() is not pushed down where other aggreagates are #13336

Comments

FranckPachot commented Jul 15, 2022 • edited by yugabyte-ci Loading

Description

tedyu commented Jul 16, 2022 • edited Loading

tedyu commented Jul 16, 2022 • edited Loading

tedyu commented Jul 16, 2022 • edited Loading

tedyu commented Jul 16, 2022 • edited Loading

tedyu commented Jul 17, 2022

myang2021 commented Jul 18, 2022

myang2021 commented Jul 18, 2022 • edited Loading

andrei-mart commented Jul 18, 2022

tedyu commented Aug 28, 2022

jasonyb commented May 25, 2023

FranckPachot commented Jul 15, 2022 •

edited by yugabyte-ci

Loading

tedyu commented Jul 16, 2022 •

edited

Loading

tedyu commented Jul 16, 2022 •

edited

Loading

tedyu commented Jul 16, 2022 •

edited

Loading

tedyu commented Jul 16, 2022 •

edited

Loading

myang2021 commented Jul 18, 2022 •

edited

Loading