[bug/performance] Slow Postgres COUNT(*) queries #1887
I would have thought PostgreSQL would still be quite fast for that query and row count. My instance uses sqlite, but as a test I copied the database to PostgreSQL to try the same type of query. It was quick - less than 1ms.
(I've obfuscated the actual account_id, but I picked one with a lot of statuses. My "statuses" table has 512k rows.) Are you able to connect to your database with psql and run the query with "explain analyze"?
Unfortunately it's part of the Mastodon API so we don't really have much of a choice, since some clients do display it... Personally I'd prefer to only show a 'detailed' view of that account when the account itself is viewed specifically (rather than just the account model that's pinned to statuses in timelines etc), but diverging from the Mastodon API like that will doubtless break client expectations in ways we can't predict.

We could look at caching those account stats somewhere and invalidating them occasionally so that they remain somewhat up to date (@NyaaaWhatsUpDoc what do you think?), but you'd still have the same issue with slow queries from a cold cache.

Out of interest, which version of Postgres are you using? And which version of GtS?
Yeah, grml… ok. But please put in an estimate instead. The linked document explains both why…
PostgreSQL 13.11 (Debian bullseye), GtS 0.9.0 git-282be6f
Sure.
This is of course faster than the first time it was attempted, by GtS itself just seconds before:
Of course now I’m not able to easily produce a 3000+ ms query in…
Oh hmm. Estimating with a filter (on account) is of course not as easily possible. I’d settle for a hack: a config option that makes it just return 0 posts for everyone. Can GtS distinguish between clients enumerating my followers/followees vs. visiting an individual account’s profile? (Probably not, if I know my luck.) Or maybe cache an estimated count with a profile that’s only updated every once in a while (maybe daily)… (unsure if that would be better… do rows in the…
OK, here it is now, slow:
Hrrrm. It does use the index, but it’s still slow, given it has to recheck whether the rows are actually visible. And I guess the 1 GiB RAM VM, even though it does almost nothing but GtS, is reaching its limits. Estimating is probably not possible. Caching the number… unsure. Reporting just 0 toots for everyone as a workaround is still something I’d use on my instance, even if it’s technically wrong, because the numbers usually are wrong anyway.
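For anyone wanting to reproduce this, here is a minimal sketch of the kind of query being discussed; the account id is a placeholder, and the statuses / account_id names are taken from the discussion above:

```sql
-- Hypothetical reproduction of the slow per-account count; the account id is
-- a placeholder. BUFFERS shows how many pages had to be read, and an
-- Index Only Scan plan reports "Heap Fetches": the visibility rechecks
-- described above, which stay high until VACUUM marks most pages all-visible.
EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(*)
FROM statuses
WHERE account_id = '01EXAMPLEACCOUNTID0000000';
```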
Ah thanks for catching it again! I still feel like 1GiB should be more than enough, so I reckon we've gotta do some creative thinking about performance improvements for this one.
tobi dixit:
> Ah thanks for catching it again! I still feel like 1GiB should be more than enough, so I reckon we've gotta do some creative thinking about performance improvements for this one.
Aye. I was thinking up a few (see earlier posts) plus a hack allowing GtS to just lie about it, as a config option.

The problem with COUNT(*) is that it still has to do a table scan after the index scan, to see if the rows returned by the index are indeed still visible to the current transaction. (This even came up recently on a PostgreSQL mailing list.) If the rows are spread widely across the statuses table, this means a lot of I/O and cache thrashing.
bye,
//mirabilos
having the same issue but when requesting v1/timelines/home and v1/notifications. i've got 988k rows in my statuses table for a single-user instance running for a year and a half. 30-40 seconds to pull notifications obviously times out and makes the server unusable. DB is on spinning rust but that shouldn't cause this much of an issue. relevant logs:

and for good measure i explain analyzed one of the account ids that has shown up repeatedly:

the queries do get faster if they've just been run, but this postgres db does enough other stuff that it usually doesn't cache for very long. occasionally if the rest of the server settles i can manage to grab notifications on the second or third try, which is still a poor experience. i could fiddle with the database to try to speed this up but that seems like a fool's errand if this is a case of postgres being asked to do something it isn't designed to do.
It's berserk to me that postgres struggles so hard with this when sqlite handles it no problem. Very frustrating.

As stated above, we can't use the estimate function because we need to add a WHERE clause to select the appropriate account ID. And besides, the estimate function apparently relies on ANALYZE or VACUUM having been run recently, which we can't guarantee since some db admins might have it turned off for whatever reason. And having a config option that just ignores counts is hacky and bad, I don't think that's worth considering.

I'll have another look around when time permits and see if there's another approach that would work. Other sensible suggestions welcome in the meantime.
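For context, this is roughly what the whole-table estimate mentioned above looks like; it only reads the planner statistics, which is why it's only as fresh as the last ANALYZE/VACUUM and why it can't take the per-account WHERE clause the API needs (table name assumed from the thread):

```sql
-- Whole-table row estimate straight from the planner statistics: effectively
-- free, but stale until the next ANALYZE/VACUUM, and not filterable by account.
SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname = 'statuses';
```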
Ezra Barrow dixit:

> Execution Time: 19804.337 ms

Huh, that’s impressive. I assume you VACUUM ANALYZE regularly.

What saved my instance was a move to a differently spec’d VM. It’s got 4 GiB RAM instead of just one, which I assume makes all the difference, but it’s also got SSD-backed storage instead of a rather slow NAS, and on the original one there was too much background activity after all (including an I/O-heavy daily cronjob that tar’d up gigabytes). Not sure if that’s a possibility for you.

From an application developer side, I’d probably put the counts into a table by themselves and either update them along or, preferably, update them only occasionally and serve “incorrect” (too low) counts. (Then save also the max ID of the last update, so the next one needs only count rows added since, and (to account for deletions) only do full rescans very occasionally or as an idle task when there is currently no activity.) That needs quite some dev effort, of course… so I can understand not getting this done any time soon. Just faking the numbers, as a config option for instances where users are ok with just getting 0, is probably easier to do.

bye,
//mirabilos
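A rough sketch of that incremental idea, assuming a hypothetical account_stats side table and GtS's lexically sortable string IDs; none of these names are actual GtS schema:

```sql
-- Hypothetical side table: one row per account, plus the highest status id
-- that has already been counted.
CREATE TABLE IF NOT EXISTS account_stats (
    account_id      text PRIMARY KEY,
    statuses_count  bigint NOT NULL DEFAULT 0,
    last_counted_id text   NOT NULL DEFAULT ''
);

-- Occasional (e.g. daily) incremental refresh: only statuses newer than the
-- last counted id are scanned, so the count served can be slightly too low.
-- Deletions and brand-new accounts would be handled by a rarer full rescan,
-- as suggested above.
UPDATE account_stats s
SET statuses_count  = s.statuses_count + n.new_statuses,
    last_counted_id = n.max_id
FROM (
    SELECT st.account_id, COUNT(*) AS new_statuses, MAX(st.id) AS max_id
    FROM statuses st
    JOIN account_stats a ON a.account_id = st.account_id
    WHERE st.id > a.last_counted_id
    GROUP BY st.account_id
) n
WHERE n.account_id = s.account_id;
```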
tobi dixit:

> It's berserk to me that postgres struggles so hard with this when sqlite handles it no problem. Very frustrating.

Different base design choices.

The problem here is that when you ask PostgreSQL for an exact number, it goes through all the pain to give you that exact number, precisely and correctly. This means not only scanning the index, but then also checking in the pages on disc whether the tuples are indeed still alive, as a different transaction from another session could have committed a deletion recently. That’s all from being a proper database server system with MVCC and a historical focus on correctness over speed.

SQLite does not even have a concept of multiple sessions, or clients in general, because it runs embedded into one process. (I’ve seen webapp devs using it, becoming curious about long wait times, which turned out to be waiting until another runner unlocked (by closing) the database file.)

If you could keep these counts as a simple numeric field associated with the relevant user, that would probably work best. (While not an experienced DBA, I’d be leaning towards putting it into a different table than accounts, because that data changes vastly more often than the fields in the accounts table.)

bye,
//mirabilos
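And a sketch of the other variant ("update them along"), again with purely illustrative names: an AFTER trigger keeps the per-account counter in step with writes, trading a small insert/delete cost for O(1) reads of the count.

```sql
-- Hypothetical trigger-maintained counter on a separate stats table, so the
-- frequently changing counter doesn't live in the accounts table itself.
CREATE OR REPLACE FUNCTION bump_status_count() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        INSERT INTO account_stats (account_id, statuses_count)
        VALUES (NEW.account_id, 1)
        ON CONFLICT (account_id)
        DO UPDATE SET statuses_count = account_stats.statuses_count + 1;
    ELSIF TG_OP = 'DELETE' THEN
        UPDATE account_stats
        SET statuses_count = statuses_count - 1
        WHERE account_id = OLD.account_id;
    END IF;
    RETURN NULL;  -- AFTER row triggers ignore the return value
END;
$$;

CREATE TRIGGER statuses_count_trigger
AFTER INSERT OR DELETE ON statuses
FOR EACH ROW EXECUTE FUNCTION bump_status_count();
```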
Multiple processes can open an sqlite db file at the same time, and indeed this is what we do in GtS with WAL mode and multiple open connections: https://www.sqlite.org/faq.html#q5

But anyway that's beside the point of this issue. @barrowsys Perhaps you could give more information about your setup? Is the database running on a separate machine from GtS, for instance? Is it possible that the Postgres in-memory caches are getting put into swap memory?
postgres and gts are on the same machine, along with lots of other db clients. it's a mildly busy box, the caches likely get cleared by nature of that. we do have plenty of memory free, we only use a portion of the 16G for applications, the rest the OS does as it wishes with. it seems silly to need a dedicated postgres install for a single-user instance, especially when the only reason is a field rarely used by clients.

another potential solution is to provide the post counts on a best-effort basis? add a timeout to the db queries and return 0 or -1 if it doesn't return in time, so the user still knows there's an error but doesn't time out their basic requests.
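For what it's worth, that best-effort idea could be approximated on the Postgres side with a per-statement timeout; the application would still need to catch the cancellation error and substitute a placeholder count. A sketch with an arbitrary 500 ms budget and a placeholder account id:

```sql
-- Cap just this statement at 500 ms; if the count can't finish in time the
-- statement is cancelled and the caller would fall back to returning 0 or -1.
BEGIN;
SET LOCAL statement_timeout = '500ms';
SELECT COUNT(*) FROM statuses WHERE account_id = '01EXAMPLEACCOUNTID0000000';
COMMIT;
```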
nothing beyond the default postgres configuration (which should autovacuum?). just a big dedicated server running nowhere near capacity. trying a vacuum analyze right now, will report back if that changes anything.
Mmm, okay that all sounds fine indeed. That's similar to the gts.superseriousbusiness.org setup, which also has a bunch of stuff running on it (including other db clients). And the drive itself? Is that on an ssd? And what's the swappiness of the machine?

Normally for a database server / server running a database on it you'll want a swappiness of 10 or less. We've seen in the past that people running with the default swappiness of some OS's get their database performance tanked as the OS swaps database caches from memory into swap file, which is much, much slower.

Just to clarify btw, I'm not saying we don't also have work to do to tune these queries and stuff, just looking to see if maybe there's a bottleneck on the deployment that can also be eased in the meantime.
drive is two HDDs in raid 0. swappiness is 10. HDDs are somewhat old but we're still a few months out from being able to replace them.

running a `VACUUM ANALYZE statuses;` does seem to have alleviated the issue enough to be usable. timeline and notification requests generally return in reasonable time now. no clue whether it was the vacuum or the analyze that did it, but iiuc the default autovacuum settings should be keeping the table vacuumed, so presumably the analyze? even a straight `count(*) FROM statuses` completes in 800ms when not recently cached.

there are still several related log warnings from the last 12 hours but the frequency and severity of them is reduced. still over a second for these single queries, but this is every warning in the logs from that time period rather than a small selection.
thanks for the clarification, sorry for my hostility.
Ezra Barrow dixit:

> running a `VACUUM ANALYZE statuses;` does seem to have alleviated the issue enough to be usable. timeline and notification requests generally return in reasonable time now. no clue whether it was the vacuum or the analyze that did it

Hmm hmm.

> …but iiuc the default autovacuum settings should be keeping the table

I’m not sure how many people trust the autovacuum though.

> vacuumed, so presumably the analyze? even a straight `count(*) FROM statuses` completes in 800ms when not recently cached.

Maybe the compaction. AIUI it is that the index tells it which rows to consider, and then it needs to check the rows themselves to see whether they are still alive, so if they are packed together more tightly, it’s a smaller number of pages to check.

Which brings me to another idea. Have a table with just an FK on accounts and an FK on statuses. (Unfortunately these can’t be BIGNUM because GtS uses strings as PKs, otherwise the effect would be even better.) Then do the queries on that. But that would be even more effort to keep in sync than storing a per-account number or two. OK, so forget it…

Meow,
//mirabilos
the PR i pushed should help, but beyond a certain point it feels like we're just providing workarounds for suboptimal hosting conditions for a gotosocial postgres setup. due to the nature of it being out-of-process, and being optimized more for scaling up / horizontally (as evidenced by how it does count queries), and with how important database latency is to gotosocial, postgres installations will naturally require more cpu grunt and / or more connection parameter tuning to get it working optimally.
Ah it's cool that the query times are reduced even on HDDs with the ANALYZE :)

What we found with SQLite recently is that unless you do regular ANALYZEs, then the query planner sometimes struggles to understand the best way to perform a certain query, because it simply doesn't have enough information to form a good picture of the table. We've now implemented regular ANALYZE calls in SQLite to alleviate this. But of course that doesn't solve the Postgres-specific problem; running ANALYZE regularly on Postgres is something that DB operators need to schedule themselves, I believe. (Can't remember where we talked about this last, but iirc that's the conclusion we came to.)

So yes anyway, I'm with @NyaaaWhatsUpDoc in the sense that at some point we can't really do anything else but implement workarounds. Inspired by this issue we have a pull request open now that should alleviate some of these problems: #2620

But beyond that, the only sensible step (as @mirabilos suggested) would be to rearchitect the database schemas in order to store counts and stuff in an…
kim dixit:

> the PR i pushed should help, but beyond a certain point it feels like we're just providing workarounds for suboptimal hosting conditions for a gotosocial postgres setup. due to the nature of it being

Yes and no.

> out-of-process, and being optimized more for scaling up / horizontally (as evidenced by how it does count queries), and with how important database latency is to gotosocial, postgres installations will naturally require more cpu grunt and / or more connection parameter tuning to get it working optimally.

Developers always have to program to the database up to a certain amount. You couldn’t use this schema with, say, a column-based DB or a document store either. So accounting for the fact that COUNT is naturally slow on PostgreSQL at the sizes GtS deals with needs to be done, ideally by keeping the counts as a data field and then perhaps lazily updating it.
@mirabilos I believe that's the second or third time in this thread now that you've suggested the same thing. The point has been taken already.
tobi dixit:

> But of course that doesn't necessarily solve the Postgres problem; running ANALYZE regularly on Postgres is something that DB operators need to schedule themselves, I believe.

I think so as well. I even do FREEZE vacuums so the blocks get packed better (though I stopped regularly doing FULL, as that locks the tables for a long time and is not necessary that often):

$ cat /etc/cron.d/zzzlocal
2 7 * * * postgres /usr/bin/env LANG=C LC_CTYPE=C LC_ALL=C psql -c 'VACUUM FREEZE ANALYZE;' gotosocial 2>&1 | logger -t gtscron-vacuum

> That seems like a decent approach, but it's not a priority *immediately*.

Fully agreed. In the longer term, several storage-related cleanups would be welcome, as we’ll hit scaling issues eventually, but for now, and with the duplicate indexes having been removed, things can work, just not with only a GiB or so of RAM and limited concurrent I/O, and not with the DB on slow storage (you’ve already added a warning about that Hetzner thing, but some VM providers have naturally slower backing storage for the VM HDDs than others).
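As an alternative to an external cron job, autovacuum can also be told to analyze a hot table more aggressively via per-table storage parameters; a sketch (the thresholds here are arbitrary examples, not recommendations):

```sql
-- Re-analyze statuses after roughly 2% of it has changed (default is 10%),
-- and vacuum it after roughly 5% (default 20%), so planner statistics and the
-- visibility map stay reasonably fresh without a hand-rolled cron entry.
ALTER TABLE statuses SET (
    autovacuum_analyze_scale_factor = 0.02,
    autovacuum_vacuum_scale_factor  = 0.05
);
```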
after giving it some more time, the issue with the slow COUNT(*) queries has managed to come up again.
Ah that's a pity. I'm not sure what else could be done at this point, though this will be alleviated somewhat in 0.14.0 with the PR mentioned above. Anyway, I'll make a separate issue (tomorrow, probably) to remind us to -- at some point -- move account stats into a separate table, and will link back to this issue there.
My instance is now up for over a year, but some things have become really slow. For example, the Pinafore frontend times out 3–5 times trying to load my followers (~160) or followees (~200). These classes of slowness seem to be database-related.
I notice, from the GtS log, that you use many COUNT(*) calls, which, on PostgreSQL, must be slow due to its data consistency guarantees. Looking at the slow queries, many of them are…
… which seems to list the number of posts done by each account.
Which Pinafore doesn’t even display.
Which GtS cannot even get because their old toots from before I followed them aren’t backfilled.
So please just use one of the estimate alternatives for that. The number will be slightly off, but not by much, and given the above I doubt anyone would complain overly seriously.
(My statuses table is 375k rows long, and indicēs only help so far, but not for this.)
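For completeness, one of the "estimate alternatives" alluded to above is the widely circulated trick of asking the planner for its row estimate of the filtered query; unlike pg_class.reltuples it can take a WHERE clause, though it inherits the planner's accuracy and therefore still benefits from regular ANALYZE. A sketch (the helper name and the account id are illustrative):

```sql
-- Planner-based row estimate for an arbitrary query, adapted from the
-- well-known "count estimate" recipe. Fast and approximate, never exact.
CREATE OR REPLACE FUNCTION count_estimate(query text) RETURNS bigint
LANGUAGE plpgsql AS $$
DECLARE
    plan json;
BEGIN
    EXECUTE 'EXPLAIN (FORMAT JSON) ' || query INTO plan;
    RETURN (plan -> 0 -> 'Plan' ->> 'Plan Rows')::bigint;
END;
$$;

-- Usage (account id is a placeholder):
-- SELECT count_estimate($$SELECT 1 FROM statuses WHERE account_id = '01EXAMPLEACCOUNTID0000000'$$);
```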