Unable to join some large rooms due to high RAM consumption #7339
Comments
The "size" of a Matrix room isn't described by its number of users but the number of state events (e.g. joins, leaves, kicks, bans, changes of name, topic, power level rules, join rules, etc.) in its history. To summarise, there is a component to Matrix called the state resolution algorithm that's in charge of resolving clashes between two servers that got out of sync regarding what state a given room currently is. This algorithm works through the whole state of the room, and needs to load most (if not all) state events in that room in memory to work. This is what's making Synapse so hungry on RAM when trying to join a large room, because it needs to retrieve and authenticate every state event, which can be expensive for old rooms. If you're interested, how exactly this algorithm works has been explained recently on the matrix.org website: https://matrix.org/docs/guides/implementing-stateres IIRC this is also the reason why some rooms can't be joined from small homeservers on modular.im. The above is more a point of context and details than "it has a reason so it's not an issue" (because it definitely is an issue), and I don't think there's an open issue about that on this repo so I'll keep that one open to track the status of this. |
Every algorithm can be implemented using little RAM, but then it may require more I/O to persistent storage (such as a DB) and be slower. This is a tradeoff decision. The current implementation decisions exclude users of cheap hardware (for home servers) from joining larger rooms. IMO this is a bug, isn't it?

If the algorithm implementation were tied more closely to the DB, and the DB implemented caching appropriately, memory usage would probably adapt automatically to the amount of available memory, and perhaps not be much slower when plenty of RAM is available.

Another idea: repeatedly check the available free memory during execution of the algorithm, and if the requirements are not met, abort cleanly, send an error message to the user, and fall back to some (maybe less secure) alternative, instead of hoping for the OOM killer to do the right thing (after a phase in which the whole system nearly freezes).
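A rough sketch of that second idea (an illustration only, not an existing Synapse API; psutil is a third-party library, and the threshold and hook points are made up):

```python
import psutil  # third-party: pip install psutil

# Arbitrary safety margin for this sketch.
MIN_FREE_BYTES = 512 * 1024 * 1024


class OutOfMemoryBudget(Exception):
    """Raised to abort an expensive operation cleanly instead of letting
    the kernel's OOM killer take the whole process down."""


def check_memory_budget():
    # psutil.virtual_memory().available estimates how much memory can be
    # handed out without swapping.
    if psutil.virtual_memory().available < MIN_FREE_BYTES:
        raise OutOfMemoryBudget(
            "Not enough free memory to keep resolving room state; "
            "aborting this join cleanly instead of risking the OOM killer."
        )


# Hypothetical usage inside a long-running loop:
# for batch in state_event_batches:
#     check_memory_budget()
#     process(batch)
```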
What I think is: why don't we just give this chore to the homeserver where that room resides? Why does every server that wants to join such a room need to process every state event and all of that logic? I think that could be bypassed.
Bootstrapping room state quickly from a data/db sync, I like the idea. |
@mxvin A room is replicated to all homeservers that participate in it; rooms don't live on a single server the way they do in XMPP.
Similar issues for me, though I'm not sure if it's because of RAM consumption. I used htop to track the processes, and RAM almost never goes above 500 MB. I'm currently running a homeserver on a Raspberry Pi 4 B with 4 GB of RAM; initially I was running on a Raspberry Pi 3 with 1 GB of RAM. I've been able to join rooms like Element Android (2.5k) and Synapse Admins (719). I'm using an SQLite DB at the moment. Trying to join a room like Matrix HQ (7.8k), though, takes an extremely long time. Eventually, my server crashes and I get a
@lqdev First, switch from SQLite to PostgreSQL. You shouldn't federate with SQLite.
Thanks @ptman, I'll give that a try.
@ptman Federation is a bit snappier after migrating to Postgres; thanks for that suggestion. I'm still intermittently running into issues, though. I'm guessing part of that is the fact that I have everything running on an RPi. To clarify, it appears to be large bridged rooms that I have trouble with, so I can see how that might be an issue (e.g. #techlore:matrix.org).
It seems that with Synapse 0.26, memory consumption is much lower. My server can now join rooms with a complexity between 20 and 30, but the largest rooms on matrix.org are still prohibitive.
Memory usage is certainly a problem. A server's memory usage should not depend on the number of historical events in a room; ideally, memory consumption should be constant. If there is session state or an event queue for each client, then it should be linear in the number of clients. Other than that, it should be possible to run the server in constant memory space. We have a powerful SQL database available; Synapse should use it.

Anyway, if a large room is defined by its number of events, can we make a state snapshot from time to time and then synchronise from the last snapshot? That way we can throw away (or lazily load) the history before the snapshot, and every room becomes a small room. The snapshot may be a hash of the state or something like that, not necessarily representing the complete state. If a client desires the earlier history, it could be provided on demand (nobody reads it all anyway).
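A toy sketch of the snapshot-hash idea (hypothetical, not how Synapse or the Matrix spec handle state; `resolved_state` is assumed to be a map of (event_type, state_key) to event ID):

```python
import hashlib
import json


def state_snapshot_hash(resolved_state):
    """Hash a resolved state map so two servers can cheaply check
    'are we in sync as of this snapshot?' without replaying the full
    event history.

    `resolved_state` is assumed to map (event_type, state_key) tuples
    to event IDs.
    """
    canonical = json.dumps(
        sorted((etype, skey, eid) for (etype, skey), eid in resolved_state.items())
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# If two servers agree on the snapshot hash, they could (in principle)
# exchange only events newer than the snapshot; if not, they fall back
# to full state resolution for the divergent part.
```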
@jkufner Complexity (resource use) does not depend on the number of events (messages, attachments, etc.) but on the number of state events (related to e.g. federation, permission calculation, ...): https://github.com/matrix-org/synapse/blob/master/synapse/storage/databases/main/events_worker.py#L1072
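For reference, a rough sketch of the "v1" complexity score that the linked code computes (the constant and exact formula may have changed; the linked events_worker.py is authoritative): it is essentially the current state event count divided by a fixed scale factor.

```python
def room_complexity_v1(current_state_event_count: int) -> float:
    """Rough sketch of Synapse's "v1" room complexity score: the number
    of current state events divided by a constant scale factor (500 at
    the time of writing -- see the linked events_worker.py for the
    authoritative code).

    On this scale, a complexity of 1.0 is roughly 500 state events, so
    the "complexity between 20 and 30" mentioned above corresponds to
    roughly 10,000-15,000 state events.
    """
    return current_state_event_count / 500
```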
@ptman OK, sorry for the inaccuracy; however, the argument still stands.
Any update on this? |
This should hopefully be significantly improved in the upcoming v1.36.0 release. I'm going to close this for now; if people still see issues after updating, feel free to open a new issue.
I installed matrix-synapse-py3 on Debian 11, on a CX21 server machine with 2 CPUs, 4 GB of RAM, and a 40 GB hard disk. That's not really a small homeserver or a Raspberry Pi 4.
Description
Joining some large rooms, such as #freenode_#haskel:matrix.org (1.4k members), fails because Synapse eats up all the available memory, leading to it being forcefully stopped. I've configured my system to limit Synapse to 3.5 GB of RAM. Upon joining, Synapse first spends some time doing some processing (high CPU usage, RAM usage close to the 500 MB baseline), and after a while the RAM consumption starts to climb steadily until it reaches the 3.5 GB mark, at which point it has to be killed.

Here are the logs from the moment of joining the room up until Synapse getting killed:

ram-crash.redacted.log

The request for joining the room comes in at line 28. At line 564 Synapse stopped printing anything to the logs and just maintained high CPU usage with steadily growing RAM consumption for ~1 minute until being killed.
Joining other large rooms, e.g. #matrix:matrix.org (3.2k members) and #synapse:matrix.org (1.2k members), works fine (I haven't monitored RAM consumption when joining those, but I had the same limits set). Someone else on #synapse:matrix.org reported joining a room with ~20k people with RAM consumption going up to ~1.1 GB, which leads me to suspect that I might be seeing something abnormal in my case. Am I?

Other than this issue, Synapse seems to be working fine. I'm willing to repeat this and do some profiling if necessary.
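For anyone reproducing this, a minimal external-monitoring sketch (assumes the third-party psutil library; the PID and polling interval are placeholders, and this is just a crude way to line up RAM growth with the homeserver log timestamps):

```python
import sys
import time

import psutil  # third-party: pip install psutil


def log_rss(pid: int, interval: float = 5.0) -> None:
    """Print the resident set size of a process (e.g. the Synapse main
    process) every few seconds while attempting the room join."""
    proc = psutil.Process(pid)
    while True:
        rss_mb = proc.memory_info().rss / (1024 * 1024)
        print(f"{time.strftime('%H:%M:%S')} rss={rss_mb:.0f} MiB", flush=True)
        time.sleep(interval)


if __name__ == "__main__":
    # Hypothetical usage: python log_rss.py <synapse-pid>
    log_rss(int(sys.argv[1]))
```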
Steps to reproduce
Try to join #freenode_#haskel:matrix.org.
Version information
Homeserver: my private homeserver
Version: 1.12.1
Install method: NixOS