This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

replication wedged after spike in event traffic #4549

Closed
richvdh opened this issue Feb 2, 2019 · 5 comments

Comments

@richvdh
Member

richvdh commented Feb 2, 2019

We received a big spike of 10000 membership events; afterwards, replication seemed to get itself into a loop of repeated 'could not keep up' disconnections.

@turt2live
Member

Related: #3495 ?
Related: #2738

@richvdh
Member Author

richvdh commented Feb 2, 2019

I don't think it was due to send_join requests, so #3495 is likely different. It could be related to #2738, though in our case the Synapse master was pinned at 100% CPU, so again it looked a bit different.

@richvdh
Member Author

richvdh commented Feb 2, 2019

image

Master CPU graph. Influx of events was at 07:13. Note that it handles that ok and doesn't actually blow up until 07:20. The dip at 07:34 was a whole-system restart.

@richvdh
Member Author

richvdh commented Feb 2, 2019

image

It's all going to replication-REPLICATE-caches

image

I guess the problem is this big spike in cache invalidation commands

image

It doesn't actually seem to be making much progress, though.

@richvdh
Member Author

richvdh commented Jan 14, 2021

I'm going to assume that this is no longer useful, given replication has been rewritten.

richvdh closed this as completed Jan 14, 2021