processing send_join on a large room is extremely inefficient #3495
Comments
Would it be better to maintain a queue of events we're trying to fetch, with a record of the deferreds which are waiting for them, and then just process them in lumps of a few hundred at a time? It would provide deduplication and would allow other things to carry on happening while the large fetch takes place.
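A minimal sketch of that idea, assuming asyncio Futures stand in for Twisted Deferreds; the names (`EventFetchQueue`, `get_event`, `BATCH_SIZE`, `fetch_events_from_db`) are illustrative, not Synapse's actual API:

```python
import asyncio
from typing import Dict, List

BATCH_SIZE = 300  # process the queue "in lumps of a few hundred"


class EventFetchQueue:
    """Coalesce individual event lookups into batched DB fetches."""

    def __init__(self, fetch_events_from_db):
        # fetch_events_from_db: async callable, list[event_id] -> dict[event_id, event]
        self._fetch = fetch_events_from_db
        # event_id -> Futures waiting on that event; a second caller asking for an
        # event that is already queued just gets another Future (deduplication)
        self._waiters: Dict[str, List[asyncio.Future]] = {}
        self._wakeup = asyncio.Event()
        # must be constructed inside a running event loop
        self._worker = asyncio.create_task(self._loop())

    async def get_event(self, event_id: str):
        fut = asyncio.get_running_loop().create_future()
        self._waiters.setdefault(event_id, []).append(fut)
        self._wakeup.set()
        return await fut

    async def _loop(self):
        while True:
            await self._wakeup.wait()
            self._wakeup.clear()
            while self._waiters:
                # take a lump of a few hundred and fetch them in one round trip;
                # the awaits let other work run between batches
                batch = list(self._waiters)[:BATCH_SIZE]
                waiters = {eid: self._waiters.pop(eid) for eid in batch}
                events = await self._fetch(batch)
                for eid, futs in waiters.items():
                    for fut in futs:
                        fut.set_result(events.get(eid))
```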
semi-related: #3013
We also return the full auth chain, which will increase the number of events.
The event cache size on matrix.org is 200K, so these requests would blow the caches.
this is probably better now? at least it's not wedging the main process?
this exacerbates https://github.com/matrix-org/matrix-doc/issues/2963
This will be mitigated by MSC3706, where servers choose to use it.
We received a send_join request over federation:
While processing these requests, the synapse master stopped logging anything for almost 60 seconds (twice); slave replication stopped, and request processing time went through the roof. Metrics suggest that the CPU was saturated with calls to `_get_event_from_row`. The 4x239401 (~950,000) events shown in the logs are reflected in the number of calls to `_get_event_from_row`.

There are several problems here.
Firstly, could we not deduplicate these four requests at the transaction level? They all have the same event id, so we could save ourselves a bunch of effort by deduplicating.
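As a rough illustration of what transaction-level deduplication might look like (the `SendJoinDeduplicator` name and the `process_send_join` callable are hypothetical, not existing Synapse code):

```python
import asyncio
from typing import Dict


class SendJoinDeduplicator:
    """Share the result of an in-flight send_join across identical requests."""

    def __init__(self, process_send_join):
        # process_send_join: async callable, (event_id, pdu) -> response dict
        self._process = process_send_join
        self._in_flight: Dict[str, asyncio.Task] = {}

    async def handle(self, event_id: str, pdu):
        task = self._in_flight.get(event_id)
        if task is None:
            # The first request for this event id actually does the work...
            task = asyncio.create_task(self._process(event_id, pdu))
            self._in_flight[event_id] = task
            task.add_done_callback(lambda _: self._in_flight.pop(event_id, None))
        # ...and identical requests arriving meanwhile just await the same task.
        return await task
```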
Secondly, why does each request lead to pulling 239401 events out of the database? The total room state in this room is only 120478 events: are we fetching the membership list twice, and if so, why isn't the event cache deduplicating them?
Thirdly, we are presumably requesting the same 239401 events for each of the four requests: can we not deduplicate these?
Finally, and related to the above, when fetching events, we first check the cache, then schedule a db fetch. By the time the db fetch happens, it may be entirely or partially redundant (if other threads have already fetched the relevant events), but we plough ahead anyway.
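One possible shape for that last point, as a sketch only (the `event_cache` mapping and `db_fetch_events` callable are placeholders for whatever the real storage layer provides):

```python
import asyncio


async def fetch_events(event_ids, event_cache, db_fetch_events):
    # First pass: take whatever is already in the cache.
    results = {eid: event_cache[eid] for eid in event_ids if eid in event_cache}
    missing = [eid for eid in event_ids if eid not in results]
    if not missing:
        return results

    # The fetch may sit behind other queued work before it actually runs.
    await asyncio.sleep(0)  # stand-in for the scheduling delay

    # Second pass: another request may have populated the cache in the meantime,
    # so re-check before hitting the database rather than ploughing ahead.
    still_missing = []
    for eid in missing:
        if eid in event_cache:
            results[eid] = event_cache[eid]
        else:
            still_missing.append(eid)

    if still_missing:
        fetched = await db_fetch_events(still_missing)
        event_cache.update(fetched)
        results.update(fetched)
    return results
```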