Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

processing send_join on a large room is extremely inefficient #3495

Open
matrixbot opened this issue Dec 16, 2023 · 0 comments
Open

processing send_join on a large room is extremely inefficient #3495

matrixbot opened this issue Dec 16, 2023 · 0 comments

Comments

@matrixbot
Copy link
Collaborator

matrixbot commented Dec 16, 2023

This issue has been migrated from #3495.


We received a send_join request over federation:

2018-07-09 11:21:46,361 - synapse.access.http.8080 - 144 - INFO - PUT-16391990 - xxx.xxx.xxx.xxx - 8080 - {xxx} Processed request: 648.081sec (43.848sec, 1.580sec) (0.059sec/247.002sec/249) 0B 200 "PUT /_matrix/federation/v1/send_join/%21jOGhaTPuvWFnrWPrzv%3Amatrix.org/%241531134658158ydvgN%3Axxx HTTP/1.1" "Synapse/0.32.0" [239401 dbevts]
2018-07-09 11:22:09,886 - synapse.access.http.8080 - 144 - INFO - PUT-16396650 - xxx.xxx.xxx.xxx - 8080 - {xxx} Processed request: 611.176sec (38.560sec, 1.028sec) (0.042sec/184.576sec/244) 0B 200 "PUT /_matrix/federation/v1/send_join/%21jOGhaTPuvWFnrWPrzv%3Amatrix.org/%241531134658158ydvgN%3Axxx HTTP/1.1" "Synapse/0.32.0" [239401 dbevts]
2018-07-09 11:22:31,112 - synapse.access.http.8080 - 144 - INFO - PUT-16401662 - xxx.xxx.xxx.xxx - 8080 - {xxx} Processed request: 571.488sec (36.464sec, 1.072sec) (0.046sec/125.630sec/244) 0B 200 "PUT /_matrix/federation/v1/send_join/%21jOGhaTPuvWFnrWPrzv%3Amatrix.org/%241531134658158ydvgN%3Axxx HTTP/1.1" "Synapse/0.32.0" [239401 dbevts]
2018-07-09 11:22:53,320 - synapse.access.http.8080 - 144 - INFO - PUT-16406605 - xxx.xxx.xxx.xxx - 8080 - {xxx} Processed request: 531.009sec (34.356sec, 0.712sec) (0.043sec/104.011sec/244) 0B 200 "PUT /_matrix/federation/v1/send_join/%21jOGhaTPuvWFnrWPrzv%3Amatrix.org/%241531134658158ydvgN%3Axxx HTTP/1.1" "Synapse/0.32.0" [239401 dbevts]

While processing these requests, the synapse master stopped logging anything for almost 60 seconds (twice); slave replication stopped, and request processing time went through the roof. Metrics suggest that the CPU was saturated with calls to _get_event_from_row. The 4x239401 (~=950000) events shown in the logs are reflected in the number of calls to _get_event_from_row.

There are several problems here.

Firstly, could we not deduplicate these four requests at the transaction level? They all have the same event id, so we could save ourselves a bunch of effort by deduplicating.

Secondly, why does each request lead to pulling 239401 events out of the database? The total room state in this room is only 120478 events: are we fetching the membership list twice, and if so, why isn't the event cache deduplicating them?

Thirdly, we are presumably requesting the same 239401 events for each of the four requests: can we not deduplicate these?

Finally, and related to the above, when fetching events, we first check the cache, then schedule a db fetch. By the time the db fetch happens, it may be entirely or partially redundant (if other threads have already fetched the relevant events), but we plough ahead anyway.

@matrixbot matrixbot changed the title Dummy issue processing send_join on a large room is extremely inefficient Dec 21, 2023
@matrixbot matrixbot reopened this Dec 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant