replication wedged after spike in event traffic #4549

richvdh · 2019-02-02T08:03:00Z

We received a big spike of 10000 membership events; afterwards, replication seemed to get stuck in a loop of repeated 'could not keep up' disconnections.

turt2live · 2019-02-02T19:06:22Z

Related: #3495 ?
Related: #2738

richvdh · 2019-02-02T21:47:02Z

I don't think it was due to send_join requests, so #3495 is likely different. It could be related to #2738, though in our case the Synapse master was pinned at 100% CPU, so again it looked a bit different.

richvdh · 2019-02-02T21:51:02Z

Master CPU graph. Influx of events was at 07:13. Note that it handles that ok and doesn't actually blow up until 07:20. The dip at 07:34 was a whole-system restart.

richvdh · 2019-02-02T22:05:05Z

It's all going to replication-REPLICATE-caches

I guess the problem is this big spike in cache invalidation commands

It doesn't actually seem to be making much progress, though
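To make the suspected failure mode concrete, here is a toy model of one possible reading of the symptoms. It is not Synapse's replication code: `MAX_PENDING`, `simulate`, and all the numbers are hypothetical. It assumes the master drops a worker whose backlog exceeds a fixed limit, and that the worker resumes from its last acknowledged position on reconnect.

```python
MAX_PENDING = 100  # hypothetical backlog limit, not Synapse's actual value


def simulate(burst_size, drain_per_connection, max_connections=1000):
    """Toy model of a worker catching up on a burst over a bounded queue.

    Returns (disconnects, position), where position is the number of
    commands the worker has received. All names here are illustrative.
    """
    position = 0      # last command the worker has acknowledged
    disconnects = 0
    for _ in range(max_connections):
        if position >= burst_size:
            break  # fully caught up
        backlog = burst_size - position
        if backlog > MAX_PENDING:
            # Backlog exceeds the limit: the worker receives a few commands,
            # then the connection is dropped for not keeping up.
            position += min(drain_per_connection, MAX_PENDING)
            disconnects += 1
            continue
        # Backlog fits within the limit: the worker catches up completely.
        position = burst_size
    return disconnects, position


if __name__ == "__main__":
    print(simulate(10000, 100))  # slow recovery, many disconnects
    print(simulate(10000, 0))    # no progress per connection: wedged
```

In this model, a worker that drains nothing before each disconnect never recovers, which would match a reconnect loop with little progress. Whether that is what actually happened here is an assumption.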

richvdh · 2021-01-14T15:42:49Z

I'm going to assume that this is no longer useful, given replication has been rewritten.

richvdh closed this as completed Jan 14, 2021
