This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Replication gets stuck with "stream X has fallen behind" #4963

Closed
richvdh opened this issue Mar 28, 2019 · 2 comments
Labels
A-Workers Problems related to running Synapse in Worker Mode (or replication)

Comments

@richvdh
Member

richvdh commented Mar 28, 2019

If a worker gets sufficiently behind that there are more than 10000 updates for it to catch up on, the master drops the replication connection, and there is no way for it to catch up again.

The master reports Exception: stream <stream> has fallen behind.

Related: #4388.

@neilisfragile added the p1 and A-Workers labels Mar 29, 2019
@richvdh
Member Author

richvdh commented Jul 28, 2019

The workers do actually make progress through the backlog - it's just slower than the arrival of new stuff (they get about 400 events on each connection, and make a new connection about every 30s).
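As a rough back-of-the-envelope check of those figures, the sketch below turns "about 400 events per connection, one connection about every 30s" into a drain rate and estimates how long a backlog would take to clear. The function names and the arrival rates passed in are illustrative assumptions, not measurements from a real deployment.

```python
from typing import Optional

# Approximate figures quoted in the comment above.
EVENTS_PER_CONNECTION = 400
SECONDS_PER_CONNECTION = 30


def drain_rate() -> float:
    """Events per second a worker gets through while repeatedly reconnecting."""
    return EVENTS_PER_CONNECTION / SECONDS_PER_CONNECTION


def seconds_to_catch_up(backlog: int, arrival_rate: float) -> Optional[float]:
    """Estimate the time to clear `backlog` events.

    `arrival_rate` (new events per second) is a hypothetical input.
    Returns None when new events arrive at least as fast as they are
    drained, which is the "stuck" case described in this issue.
    """
    net_rate = drain_rate() - arrival_rate
    if net_rate <= 0:
        return None
    return backlog / net_rate
```

Under these numbers the worker drains roughly 13 events per second, so any sustained arrival rate above that means the backlog never shrinks.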

One solution might be to make the replication protocol distinguish between "catching up" (where the replication client requests batches of updates starting at a certain stream ID, and does not get disconnected if it gets "behind") and "streaming" (where the server attempts to send updates in real time).
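A minimal sketch of the "catching up" half of that idea, assuming an in-memory stream and a pull-based client. `Stream`, `get_updates`, `catch_up` and the batch size are hypothetical names chosen for illustration; this is not Synapse's replication API.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

BATCH_SIZE = 400  # hypothetical batch size, chosen only for illustration

Row = Tuple[int, str]  # (stream ID, update payload)


@dataclass
class Stream:
    """Toy append-only stream: the update at index i has stream ID i + 1."""

    updates: List[str] = field(default_factory=list)

    def current_token(self) -> int:
        return len(self.updates)

    def get_updates(self, from_token: int, limit: int) -> Tuple[List[Row], int, bool]:
        """Return up to `limit` updates after `from_token`.

        Also returns the token reached and a flag saying whether more
        updates remain beyond it.
        """
        upto = min(from_token + limit, self.current_token())
        rows = [(i + 1, self.updates[i]) for i in range(from_token, upto)]
        return rows, upto, upto < self.current_token()


def catch_up(stream: Stream, from_token: int, batch_size: int = BATCH_SIZE) -> Tuple[List[Row], int]:
    """Catch-up mode: the client pulls batches until it reaches the head.

    The client sets the pace, so the size of the backlog never causes a
    disconnect. Once this returns, the client could switch to streaming
    mode from the returned token.
    """
    received: List[Row] = []
    token = from_token
    limited = True
    while limited:
        rows, token, limited = stream.get_updates(token, batch_size)
        received.extend(rows)
    return received, token
```

For example, a client that is 25,000 updates behind would call `catch_up(stream, 0)`, receive the backlog in batches of 400, and only then ask for real-time updates from the returned token.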

@richvdh
Member Author

richvdh commented Feb 16, 2022

I think this relates only to old-style TCP replication, which per #11728 we should get rid of anyway, so closing as WONTFIX.

@richvdh richvdh closed this as completed Feb 16, 2022