Improve performance of event_thread_relation background update for large databases #11375
Comments
As noted in chat, there are various related issues here: #11261, #11260, #11259. A high number of "Heap Fetches" sounds like your table is overdue for a VACUUM; https://blog.makandra.com/2018/11/investigating-slow-postgres-index-only-scans/ has quite a lot of detail on this.
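For reference, a rough sketch of the kind of check being suggested here, run directly in psql. Everything below is standard Postgres (nothing Synapse-specific); the table name is just the one from the query in this issue:

```sql
-- How many dead tuples have piled up, and when the table was last vacuumed.
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'event_relations';

-- Refresh the visibility map so index-only scans stop falling back to
-- heap fetches, and update the planner statistics at the same time.
VACUUM (VERBOSE, ANALYZE) event_relations;
```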
It's also worth noting that this update has been running on matrix.org too.
Still, I wonder if it would end up being quicker for us to select all rows in `event_relations` up front rather than re-scanning the index for each batch.
Amazing, thank you! I guess the new rows are coming in too fast for our autovacuum to keep up.
Still ~1.5s, but speedier than before.
hrm. It's still scanning the whole of the `event_relations_id` index.
Good question! I think the next step here is to look at the stats Postgres is using to plan queries on `event_relations`.
If you need a quick fix to get your server responding again, you can play around with the constants in `background_updates.py`, such as the minimum batch size.
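A sketch of how those planner statistics can be inspected, using only standard Postgres catalogues and views (the column choices are just illustrative):

```sql
-- The row-count and page-count estimates the planner works from.
SELECT relname, reltuples::bigint AS estimated_rows, relpages
FROM pg_class
WHERE relname IN ('event_json', 'event_relations');

-- Per-column statistics used for selectivity estimates on the join key.
SELECT tablename, attname, n_distinct, correlation, null_frac
FROM pg_stats
WHERE tablename = 'event_relations' AND attname = 'event_id';

-- Rebuild the statistics if they look stale.
ANALYZE event_relations;
```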
#11391 replaces this update with one that will hopefully be more efficient, so I think this will probably solve itself in the next release.
Thanks @bradtgmurray. I've got to say, this has us completely mystified. I've actually tried this on my own homeserver (sw1v.org). For comparison, statistics for that table on matrix.org:
I also noticed that if I increase the LIMIT high enough (to 20000 or so), the query plan changes. Comparing nested loop vs merge join query plan costs on matrix.org:
And on sw1v.org:
The principal difference here appears to be the large "start-up cost" for the merge join. I'm sorry not to have better answers on this. It might be worth reaching out on the "pgsql-performance" mailing list (https://www.postgresql.org/list/pgsql-performance/2021-11/) asking if they have any ideas why postgres is favouring a merge join rather than a nested loop, despite the nested loop being much quicker.
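One way to reproduce that comparison is to disable the merge join for a single session and re-run EXPLAIN. This is a diagnostic sketch only: `enable_mergejoin` is a standard Postgres planner GUC and not something to set in production config, and the query text is just the one from this issue.

```sql
-- Force the planner away from the merge join so the nested-loop plan's
-- cost and actual runtime can be compared against the default plan.
SET enable_mergejoin = off;

EXPLAIN (ANALYZE)
SELECT event_id, json
FROM event_json
LEFT JOIN event_relations USING (event_id)
WHERE event_id > '$iC2wL1TGv8qPaZXR8-KMSNuKxxgLLkhru3HeUeetEPI'
  AND event_relations.event_id IS NULL
ORDER BY event_id
LIMIT 100;

-- Restore the default planner behaviour for this session.
RESET enable_mergejoin;
```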
I think we can probably close this out at this point. The backfill has completed, and this seems to be some kind of weird difference in database configuration between our DB and matrix.org's, but it's unlikely we'll get to the bottom of it.
Description
Been debugging some performance issues on our RDS database and came across this ugly chart.

pg_stat_statements says it's this query here:
```sql
SELECT event_id, json
FROM event_json
LEFT JOIN event_relations USING (event_id)
WHERE event_id > '$iC2wL1TGv8qPaZXR8-KMSNuKxxgLLkhru3HeUeetEPI'
  AND event_relations.event_id IS NULL
ORDER BY event_id
LIMIT 100
```
This is the top query on our DB by a factor of 3, as it needs to do a surprisingly slow and large index scan of the index on `event_relations` to see if any relations exist for the events we'd like to process.
It takes 2 seconds to scan through the whole `event_relations_id` index looking for entries that don't exist.
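For anyone trying to reproduce this, a sketch of how that scan can be measured. The EXPLAIN options are standard Postgres, and the event ID is just the boundary value from the query above:

```sql
-- ANALYZE gives actual timings; BUFFERS shows how many pages are read.
-- A high "Heap Fetches" count on the index-only scan suggests the
-- visibility map is out of date.
EXPLAIN (ANALYZE, BUFFERS)
SELECT event_id, json
FROM event_json
LEFT JOIN event_relations USING (event_id)
WHERE event_id > '$iC2wL1TGv8qPaZXR8-KMSNuKxxgLLkhru3HeUeetEPI'
  AND event_relations.event_id IS NULL
ORDER BY event_id
LIMIT 100;
```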
Note that our database is quite large: our `event_json` table is about 145 GB alone, we're only up to the `$i` events in this background update, and we already have 7 million rows in our `event_relations` table.
It feels like the batch size calculation here is actually hurting us: https://github.com/matrix-org/synapse/blob/develop/synapse/storage/background_updates.py#L274
The query takes so long that even though we only want it to take 100ms, it takes 2s+, so we only ever do the minimum batch size (100 entries). However, the nature of the query means that we're scanning the same index over and over again to find pretty small sets of entries.
As a test, running the same query with LIMIT 1000 instead of LIMIT 100 only takes half a second more.
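A sketch of that comparison using psql's `\timing`; the limit mirrors the one discussed above, and actual timings will obviously vary by database:

```sql
\timing on

-- Ten times the usual batch size; on this database it only added ~0.5s,
-- since most of the cost is the index scan itself, not the extra rows.
-- (Repeat with LIMIT 100 for the baseline.)
SELECT event_id, json
FROM event_json
LEFT JOIN event_relations USING (event_id)
WHERE event_id > '$iC2wL1TGv8qPaZXR8-KMSNuKxxgLLkhru3HeUeetEPI'
  AND event_relations.event_id IS NULL
ORDER BY event_id
LIMIT 1000;
```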
I'm a little scared to bump the minimum batch size across the board, as I'm not sure what impact that will have on different background updates.
Is there a smarter way to do this work that doesn't hit the database so hard?
Should we have a way of tuning batch sizes on a per-background-update basis so we can avoid badly performing cases like this?
Version information
Homeserver:
matrix.beeper.com
Version:
Lightly modified version of v1.46
Install method:
Rolling our own Docker container
Platform:
Kubernetes on AWS + RDS for the database