powa_kcache_metrics_current grows enormously #42

Closed
banlex73 opened this issue Mar 17, 2021 · 12 comments

@banlex73
Contributor

No description provided.

@banlex73
Contributor Author

What I found on my PoWA repository DB:
powa_kcache_metrics_current was as big as 133 GB,
and SELECT powa_kcache_aggregate(13) threw an error:

ERROR:  duplicate key value violates unique constraint "powa_kcache_metrics_pkey"
DETAIL:  Key (srvid, coalesce_range, queryid, dbid, userid, top)=(13, ["2021-03-16 12:19:31.162797-07","2021-03-16 16:46:16.570921-07"], -9206572327289564196, 862035, 861727, t) already exists.
CONTEXT:  PL/pgSQL function powa_kcache_aggregate(integer) line 12 at SQL statement
My fix was adding ON CONFLICT DO NOTHING;
I don't know if that is an acceptable solution.

```sql
CREATE OR REPLACE FUNCTION public.powa_kcache_aggregate(_srvid integer)
 RETURNS void
 LANGUAGE plpgsql
AS $function$
DECLARE
    result bool;
    v_funcname text := 'powa_kcache_aggregate(' || _srvid || ')';
    v_rowcount bigint;
BEGIN
    PERFORM powa_log(format('running %I', v_funcname));

    PERFORM powa_prevent_concurrent_snapshot(_srvid);

    -- aggregate metrics table
    INSERT INTO public.powa_kcache_metrics (coalesce_range, srvid, queryid,
                                            top, dbid, userid, metrics,
                                            mins_in_range, maxs_in_range)
        SELECT tstzrange(min((metrics).ts), max((metrics).ts), '[]'),
        srvid, queryid, top, dbid, userid, array_agg(metrics),
        ROW(min((metrics).ts),
            min((metrics).plan_reads), min((metrics).plan_writes),
            min((metrics).plan_user_time), min((metrics).plan_system_time),
            min((metrics).plan_minflts), min((metrics).plan_majflts),
            min((metrics).plan_nswaps),
            min((metrics).plan_msgsnds), min((metrics).plan_msgrcvs),
            min((metrics).plan_nsignals),
            min((metrics).plan_nvcsws), min((metrics).plan_nivcsws),
            min((metrics).exec_reads), min((metrics).exec_writes),
            min((metrics).exec_user_time), min((metrics).exec_system_time),
            min((metrics).exec_minflts), min((metrics).exec_majflts),
            min((metrics).exec_nswaps),
            min((metrics).exec_msgsnds), min((metrics).exec_msgrcvs),
            min((metrics).exec_nsignals),
            min((metrics).exec_nvcsws), min((metrics).exec_nivcsws)
        )::powa_kcache_type,
        ROW(max((metrics).ts),
            max((metrics).plan_reads), max((metrics).plan_writes),
            max((metrics).plan_user_time), max((metrics).plan_system_time),
            max((metrics).plan_minflts), max((metrics).plan_majflts),
            max((metrics).plan_nswaps),
            max((metrics).plan_msgsnds), max((metrics).plan_msgrcvs),
            max((metrics).plan_nsignals),
            max((metrics).plan_nvcsws), max((metrics).plan_nivcsws),
            max((metrics).exec_reads), max((metrics).exec_writes),
            max((metrics).exec_user_time), max((metrics).exec_system_time),
            max((metrics).exec_minflts), max((metrics).exec_majflts),
            max((metrics).exec_nswaps),
            max((metrics).exec_msgsnds), max((metrics).exec_msgrcvs),
            max((metrics).exec_nsignals),
            max((metrics).exec_nvcsws), max((metrics).exec_nivcsws)
        )::powa_kcache_type
        FROM powa_kcache_metrics_current
        WHERE srvid = _srvid
        GROUP BY srvid, queryid, top, dbid, userid
        ON CONFLICT DO NOTHING;
```

@rjuju rjuju self-assigned this Mar 18, 2021
@rjuju
Member

rjuju commented Mar 18, 2021

Hi,

Do you have some monitoring showing what the growth looks like? If it's a constant and regular growth, then it's likely due to inadequate autovacuum tuning.
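
As a rough check (nothing PoWA-specific, just the standard pg_stat_user_tables view), something like this would show whether autovacuum keeps up with the dead tuples on that table:

```sql
-- Rough check: dead-tuple count and last autovacuum run
-- on the aggregation source table from this issue.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_autovacuum,
       autovacuum_count
FROM pg_stat_user_tables
WHERE relname = 'powa_kcache_metrics_current';
```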

For the ON CONFLICT, it's a workaround for a bug. So sure, for now you can keep it as is, but it's a sign that something is going wrong here, and the root problem should be fixed instead.

That being said, I have no clue how that could be happening. It seems that the function is aggregating the same data twice, but that shouldn't be possible because the function removes the data it just aggregated. As the function is atomic, you should either have both aggregated the data and removed the source, OR done nothing.

Ideally, what should be done is to remove the ON CONFLICT clause, wait to see if the error happens again and, if it does, check whether the problematic record exists in the powa_kcache_metrics table.
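
For example, that check could look like the sketch below, reusing the key values from the error message above (the column names are the ones listed in the INSERT earlier in this thread):

```sql
-- Sketch only: does the conflicting key from the error message
-- already exist in the aggregated table?
SELECT srvid, coalesce_range, queryid, dbid, userid, top
FROM powa_kcache_metrics
WHERE srvid = 13
  AND queryid = -9206572327289564196
  AND dbid = 862035
  AND userid = 861727
  AND top;
```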

@banlex73
Contributor Author

Unfortunately, I don't have any monitoring in place.
I monitor 6 clusters (253 DBs), retention policy looks like:

| DBs | Retention |
| --- | --- |
| 4 | 3 days |
| 4 | 7 days |
| 245 | 2 days |
The table was bloated; after a VACUUM FULL it went down to 12 GB (133 GB before).
I use the default autovacuum settings in Postgres and nothing special on that table.
To find the root cause, I am going to remove the ON CONFLICT clause and start monitoring the table carefully to understand what causes the issue.
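
For instance, a minimal size check along those lines, sampled periodically (nothing PoWA-specific), could be:

```sql
-- Hypothetical monitoring query: log the on-disk size of the table over time.
SELECT now() AS sampled_at,
       pg_size_pretty(pg_total_relation_size('powa_kcache_metrics_current')) AS total_size;
```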

Thank you for the support.

@banlex73 banlex73 changed the title from "powa_kcache_metrics_current grows enorm" to "powa_kcache_metrics_current grows enormously" Mar 18, 2021
@banlex73
Contributor Author

banlex73 commented Mar 18, 2021

Found what is wrong with autovacuum on my PoWA repository DB: powa_take_snapshot is constantly running, so autovacuum doesn't have enough time to complete between snapshots.
[screenshot: powa_snapshot]
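
A quick way to see those snapshot calls, assuming they show up as regular queries in pg_stat_activity (the text match on powa_take_snapshot is an assumption about how the collector invokes the function):

```sql
-- Sketch: list in-progress snapshot calls and how long they have been running.
SELECT pid,
       now() - xact_start AS running_for,
       query
FROM pg_stat_activity
WHERE query ILIKE '%powa_take_snapshot%'
  AND pid <> pg_backend_pid();
```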

@rjuju
Member

rjuju commented Mar 19, 2021

What do those values actually mean?

But even if there are always one or more powa_take_snapshot() calls running, that shouldn't prevent autovacuum from working. It only means you'll have an amount of bloat equal to the maximum number of snapshots performed between two autovacuum runs. It can be a bit high, but it should stay relatively constant, and I don't think it should represent 120 GB.
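
If the bloat did keep growing anyway, one possible mitigation (a sketch, not something settled in this thread) would be more aggressive per-table autovacuum settings:

```sql
-- Hypothetical tuning: trigger autovacuum on this table after roughly 1%
-- dead tuples instead of the default 20%.
ALTER TABLE powa_kcache_metrics_current
    SET (autovacuum_vacuum_scale_factor = 0.01);
```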

Note also that if you have pg_qualstats set up and a lot of databases and/or users and/or different normalized queries, the aggregation can be quite expensive. We have a fix for that (682b753) but it's not released yet. If you have pg_stat_statements enabled (and maybe a local powa) on your repository server, you could check whether you would benefit from this patch.
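
A possible way to check, assuming pg_stat_statements is installed on the repository server (total_exec_time is the column name on PostgreSQL 13+; older versions call it total_time):

```sql
-- Sketch: find the most expensive PoWA-related statements on the repository server.
SELECT calls,
       round(total_exec_time::numeric, 2) AS total_ms,
       rows,
       query
FROM pg_stat_statements
WHERE query ILIKE '%powa%'
ORDER BY total_exec_time DESC
LIMIT 10;
```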

@banlex73
Contributor Author

banlex73 commented Mar 19, 2021 via email

@rjuju
Member

rjuju commented Aug 31, 2022

Hi @banlex73, any news on this issue?

@banlex73
Contributor Author

banlex73 commented Aug 31, 2022 via email

@rjuju
Member

rjuju commented Sep 1, 2022

Ok, thanks!

@banlex73
Contributor Author

All good now, thank you

@rjuju
Member

rjuju commented Oct 1, 2022

Thanks a lot for the confirmation!
