
Aggregate relation statistics per database #29

Merged: 2 commits into powa-team:master on Mar 30, 2020

Conversation

CyberDem0n (Contributor):

The change will help to make base_query_all_rels_sample() more efficient for clusters with a lot of relations.

@@ -464,6 +464,153 @@ CREATE OPERATOR / (
);
/* end of pg_stat_all_relations operator support */

CREATE TYPE powa_all_relations_history_db_record AS (
CyberDem0n (Contributor, Author):

Unfortunately I can't reuse the existing type, because we need to count seq and idx scans separately.
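For illustration only, such a dedicated per-database record type would keep the sequential and index scan counters as separate fields; the type name suffix and field list below are hypothetical, not the definition from this patch:

CREATE TYPE powa_all_relations_history_db_record_sketch AS (
    ts timestamp with time zone,
    seq_scan bigint,      -- sequential scans, summed over all relations of the database
    idx_scan bigint,      -- index scans, kept separate from seq_scan
    tup_returned bigint,
    tup_fetched bigint
);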

rjuju (Member):

Ah, indeed. If we're making the distinction between tables and indexes there, wouldn't it be better to go all the way and have different types/tables_source functions for tables and indexes, for both the per-database and global versions? This would also simplify the UI queries.

CyberDem0n (Contributor, Author):

It seems there is no way to distinguish tables from indexes other than by checking the sum of ins/upd/del/live/etc. tuples, since Postgres uses the same structure to keep statistics for both.

rjuju (Member):

Yes, I reached the same conclusion and used something similar in the UI (see https://github.com/powa-team/powa-web/blob/master/powa/sql/views.py#L806).

My point is that if we're going to add the table vs. index knowledge to powa-archivist rather than the UI, we should store the data in different tables with only the meaningful fields (e.g. no need to save the last vacuum/analyze timestamps for indexes) and better field names.

I'm not sure whether we should split powa_stat_all_rel() into two different C functions, have SQL wrappers reformat the data, or just have the snapshot function deal with that. Any thoughts?

CyberDem0n (Contributor, Author):

> used something similar in the UI

That's actually where I copied this logic from :)

Splitting the statistics between tables and indexes is not a big deal. It will probably also save some disk space, because most of the columns don't make sense for indexes.
At the same time, for tables it would still be nice to keep aggregated statistics of index usage: sum(idx_scan) and sum(idx_tup_fetched). Without access to pg_index that is not possible :(
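For reference, the kind of per-table index rollup being discussed needs the catalog that maps indexes to their table. A minimal sketch, assuming a direct connection to the target database (note the actual statistics column is idx_tup_fetch):

SELECT i.indrelid::regclass AS table_name,
       sum(s.idx_scan)      AS idx_scan,
       sum(s.idx_tup_fetch) AS idx_tup_fetch
  FROM pg_stat_all_indexes s
  JOIN pg_index i ON i.indexrelid = s.indexrelid
 GROUP BY i.indrelid;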

rjuju (Member):

Yes, getting sum(idx_scan) and similar aggregates is problematic :( For now it will have to be done by the UI after connecting to the target database to resolve the oids and other required fields, until we implement a cache of the remote database catalogs.

The full table/index split seems like the right way to go in any case. Are you planning to work on it any further or should I take care of it?

CyberDem0n (Contributor, Author):

I would like to continue working on it, but it seems that I've hit a wall: the sum of ins/upd/del/live/etc. tuples is not really a good way to distinguish between tables and indexes. Example:

select relname, seq_scan, idx_scan, last_autovacuum from pg_stat_all_tables
 where (n_tup_ins + n_tup_upd + n_tup_del + n_tup_hot_upd +
 n_live_tup + n_dead_tup + n_mod_since_analyze) = 0;
                  relname                   | seq_scan | idx_scan |        last_autovacuum        
--------------------------------------------+----------+----------+-------------------------------
 pg_default_acl                             |        2 |       18 | 2020-01-27 21:34:50.623717+00
 pg_foreign_server                          |        2 |        0 | 2020-01-27 21:34:50.622659+00
 pg_user_mapping                            |        0 |        0 | 2020-01-27 21:34:50.385682+00
 pg_toast_2620                              |        0 |        0 | 2020-01-27 21:34:50.678516+00
...

We should at least add vacuum_count, autovacuum_count, analyze_count and autoanalyze_count there, but I think there would still be a chance that the relation has simply never been vacuumed.
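A sketch of the extended check being suggested (illustrative only, not the final query):

select relname, seq_scan, idx_scan, last_autovacuum from pg_stat_all_tables
 where (n_tup_ins + n_tup_upd + n_tup_del + n_tup_hot_upd +
        n_live_tup + n_dead_tup + n_mod_since_analyze +
        vacuum_count + autovacuum_count +
        analyze_count + autoanalyze_count) = 0;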

rjuju (Member):

Yes, that unfortunately can happen :(

> We should at least add vacuum_count, autovacuum_count, analyze_count and autoanalyze_count there

+1

> but I think there would still be a chance that the relation has simply never been vacuumed.

Unless you read the stats just after a crash/recovery cycle, the only entries that would be misclassified would mostly be unused tables or system catalogs. It's clearly not ideal, but there probably won't be much use in processing data for such entries. The only use would be to try to detect unused indexes, and an additional validation could be done in the UI if needed.

I don't think there's an ideal solution for this, so as far as I'm concerned I'm fine with some discrepancy here, which might get fixed later when we add a remote cache of system catalogs.

rjuju (Member) commented Mar 29, 2020:

Hello @CyberDem0n

I'm wondering whether it's worth spending effort on that in v4.0.

How about merging this PR as-is in v4.0 if you still need it, and instead spending the effort in v4.1 on dropping the powa_all_relation/powa_user_function approach entirely and allowing the collector to connect to other databases and retrieve all the information that would be needed?

We'll have to go there anyway, so there's probably no need to delay v4 any further.

rjuju added this to the v4.0 milestone on Mar 29, 2020
CyberDem0n (Contributor, Author):

OK, I updated the query that aggregates statistics per database. This is probably the best we can do right now without access to pg_index.
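As a rough illustration only (not the actual query from this PR), a per-database rollup of the relation counters, with seq and idx scans kept separate, could look like this against the standard statistics views:

SELECT current_database() AS datname,
       sum(seq_scan)   AS seq_scan,
       sum(idx_scan)   AS idx_scan,
       sum(n_tup_ins)  AS n_tup_ins,
       sum(n_tup_upd)  AS n_tup_upd,
       sum(n_tup_del)  AS n_tup_del,
       sum(n_live_tup) AS n_live_tup
  FROM pg_stat_all_tables;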

rjuju merged commit ce0e252 into powa-team:master on Mar 30, 2020
rjuju (Member) commented Mar 30, 2020:

Thanks a lot, merged!

I'll try to work on the UI part that uses those new tables quickly, so we can finally release v4 and start working on the 4.1 wish list. FTR, a new powa-archivist-git Docker image should be pushed soon.

Also, feel free to add items to the v4.1 scope if you think of more data to aggregate.

rjuju (Member) commented Mar 30, 2020:

@CyberDem0n I just pushed powa-team/powa-web@f48d4aa, which should use this new infrastructure. Let me know whether it fixes your performance issue, if you have an easy way to test it.
