-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Aggregate relation statistics per database #29
Merged
rjuju
merged 2 commits into
powa-team:master
from
CyberDem0n:feature/aggregate-rels-per-db
Mar 30, 2020
Merged
Changes from all commits
Commits
Show all changes
2 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately I can't reuse the existing type, because we need to count seq and idx scans separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah indeed. If we're doing the distinction between tables and indexes there, wouldn't it be better to do it entirely and have different types/tables_source functions for tables and indexes, for both per-database and global? This would also simplify the UI queries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems there is no way to find the difference between tables/indexes rather than by checking the sum of ins/upd/del/live/etc tuples, postgres is using the same structure to keep statistics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I had the same conclusion and used something similar in the UI (see https://github.com/powa-team/powa-web/blob/master/powa/sql/views.py#L806)
My point is that if we're going to add the table vs index knowledge in powa-archivist rather than the UI, we should store the data in different tables with only the meaningful fields (e.g. no need to save the last vacuum/analyze ts for indexes) and better field names.
I'm not sure if we should split powa_stat_all_rel() in two different C functions, have SQL wrappers to reformat the data of just have the snapshot function deal with that. Any thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's actually where I copied this logic from :)
Splitting the statistics between tables and indexes is not a big deal. It will also probably save some disk space, because for indexes most of the columns doesn't make sense.
At the same time for tables it would still be nice to keep aggregated statistics of index usage:
sum(idx_scan)
andsum(idx_tup_fetched)
. Without access topg_index
it is not possible :(There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, getting sum(idx_scan) and similar are problematic :( For now it'll have to be performed by the UI after connecting on the target database to resolve the oids and other required fields, until we implement a cache of remote database catalogs.
The full table/index split seems like the right way to go in any case. Are you planning to work on it any further or should I take care of it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to continue working on it, but it seems that I hit the wall, the sum of ins/upd/del/live/etc tuples is not really a good way to distinguish between tables and indexes. Example:
We at least should add there vacuum_count, autovacuum_count, analyze_count, autoanalyze_count, but I think there still would be chances that the relation just has never been vacuumed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, that unfortunately can happen :(
+1
Unless you read the stats just after a crash/recovery cycles, the only entries which would be misredirected would be mostly unused tables or system catalogs. It's clearly not ideal but probably there won't be much use to process data for such entries. The only use for such would be to try to detect unused indexes, but an additional validation could be done in the UI if needed.
I don't think that there's an ideal solution for this, so as far as I'm concerned I'm with some discrepancy here, which might get fixed later when we add remote cache of system catalogs.