Replication gets stuck with "stream X has fallen behind" #4963

richvdh · 2019-03-28T17:00:30Z

If a worker falls far enough behind that it has more than 10000 updates to catch up on, the master drops the replication connection, and the worker has no way to catch up again.

The master reports Exception: stream <stream> has fallen behind.

Related: #4388.


richvdh · 2019-07-28T20:52:19Z

The workers do actually make progress through the backlog - it's just slower than the arrival of new stuff (they get about 400 events on each connection, and make a new connection about every 30s).
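To put rough numbers on this: about 400 events per connection and one connection every 30s works out to roughly 13 events per second of drain. The sketch below is only back-of-the-envelope arithmetic. The arrival rates used in the usage note are hypothetical, and `backlog_after` is an illustrative helper, not anything in Synapse.

```python
def drain_rate(events_per_connection=400, reconnect_interval_sec=30.0):
    """Events per second a worker works through, using the figures observed above."""
    return events_per_connection / reconnect_interval_sec


def backlog_after(initial_backlog, arrival_per_sec, elapsed_sec):
    """Estimate the backlog after `elapsed_sec` seconds.

    `arrival_per_sec` is an assumed rate of new updates; it is not a
    measured value from this issue.
    """
    net_growth = arrival_per_sec - drain_rate()
    return max(0.0, initial_backlog + net_growth * elapsed_sec)
```

With an assumed arrival rate of 20 updates/s, `backlog_after(10000, 20, 300)` grows to about 12000, so the worker never recovers. At 10 updates/s the same call shrinks the backlog to about 9000.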

One solution might be to make the replication protocol distinguish between "catching up" (where the replication client requests batches of updates starting at a certain stream ID, and does not get disconnected if it gets "behind") and "streaming" (where the server attempts to send updates in realtime).
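A minimal sketch of what that split could look like, purely to illustrate the idea. None of these names (`StreamLog`, `fetch_batch`, `can_stream`, `catch_up`) exist in Synapse. The in-memory list stands in for a real stream, and reusing the 10000 figure as the threshold for switching to streaming is an assumption.

```python
MAX_STREAMING_BACKLOG = 10000  # assumed: the same limit that triggers "fallen behind" today


class StreamLog:
    """Hypothetical in-memory stand-in for one replication stream."""

    def __init__(self):
        self.updates = []  # updates[i] has stream ID i + 1

    @property
    def current_id(self):
        return len(self.updates)

    def append(self, row):
        self.updates.append(row)
        return self.current_id

    def fetch_batch(self, after_id, limit):
        """'Catching up' mode: return up to `limit` (stream_id, row) pairs after `after_id`.

        The client drives the pace, so there is no reason to disconnect it.
        """
        end = min(after_id + limit, self.current_id)
        return [(i + 1, self.updates[i]) for i in range(after_id, end)]

    def can_stream(self, client_id):
        """'Streaming' mode is only offered once the client is close to the head."""
        return self.current_id - client_id <= MAX_STREAMING_BACKLOG


def catch_up(log, last_seen_id, batch_size=400):
    """Client side: pull batches until the gap is small enough to switch to streaming."""
    while not log.can_stream(last_seen_id):
        batch = log.fetch_batch(last_seen_id, batch_size)
        last_seen_id = batch[-1][0]  # stream ID of the last row received
    return last_seen_id
```

For example, a client at stream ID 0 against a log holding 25000 updates would loop in `catch_up` until it is within 10000 of the head, and only then ask the server to start streaming.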

richvdh · 2022-02-16T10:41:45Z

I think this relates only to old-style TCP replication, which per #11728 we should get rid of anyway, so closing as a WONTFIX.

neilisfragile added the p1 and A-Workers (Problems related to running Synapse in Worker Mode (or replication)) labels on Mar 29, 2019

This was referenced Feb 11, 2020:

workers stop working after elevated traffic #2738 (Closed)

"stream federation has fallen behind" + a bunch of user IDs #4388 (Closed)

richvdh closed this as completed Feb 16, 2022
