memory leak in subscribeRepos rollback window #39

Comments
Restarting seems to have fixed it, but atproto-hub CPU is pegged at 100% working through the backlog right now, so we're not out of the woods just yet.
So weird. I don't see a pattern in the usage spike yet. Mostly posts, from a range of users and AP instances and web sites. A few examples from 11:05-11:15a:
Doesn't look like we were backed up and then suddenly caught up either.
Out of the woods, everything looks back to normal. Hrmph.
trying to offload more CPU from the firehose client. for #1266
switch to putting raw websocket frame bytes onto queue, then threads parse it. for #1266
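(For context, a minimal sketch of that reader/parser split, assuming a `websockets`-style client and stdlib threads; the queue size, worker count, and decode call are illustrative, not the actual client code.)

```python
import queue
import threading

import libipld  # Rust-backed DAG-CBOR decoding, off the hot read loop

frames = queue.Queue(maxsize=10_000)  # raw frame bytes; bounded so backlog can't grow without limit

def reader(ws):
    """Hot loop: only reads raw websocket frames and enqueues the bytes."""
    for frame in ws:        # each frame is the raw bytes of one firehose event
        frames.put(frame)   # blocks when full, providing backpressure

def parser(handle):
    """Worker thread: decodes frames and hands events to the handler."""
    while True:
        frame = frames.get()
        # firehose frames are a header + payload pair of DAG-CBOR objects;
        # decode_dag_cbor_multi returns them as a list
        header, payload = libipld.decode_dag_cbor_multi(frame)
        handle(header, payload)

for _ in range(2):
    threading.Thread(target=parser, args=(print,), daemon=True).start()
```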
Related: snarfed/bridgy-fed#1295
Haven't seen this since we optimized and switched from dag_cbor to libipld. Tentatively closing.
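(Roughly the shape of that swap, as a hedged illustration; the record and call site are made up, but both decode calls are the libraries' documented entry points.)

```python
import dag_cbor
import libipld

record = {'$type': 'app.bsky.feed.post', 'text': 'hi'}
raw = dag_cbor.encode(record)

# before: pure-Python decode, CPU-heavy at firehose volume
assert dag_cbor.decode(raw) == record

# after: Rust-backed decode via libipld
assert libipld.decode_dag_cbor(raw) == record
```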
Reopening, still happening. Only when we're behind serving events over our firehose, so it's hard to debug, but definitely happening right now. 😕
Bumping hub memory up to 6G as a band-aid.
I'm pretty confident this is in the rollback window part of the code:

- Lines 179 to 189 in 69846b5
- Lines 309 to 325 in 69846b5
- arroba/arroba/datastore_storage.py, lines 528 to 554 in 69846b5
Moving this issue to the arroba repo.
I wonder if this is our tracking of seen CIDs in the `subscribeRepos` rollback window.
Never mind, we don't actually do that. Maybe ndb query caching?
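(If ndb's caching were the culprit, one diagnostic would be turning the in-context cache off for the firehose reads and watching memory. A rough sketch assuming google-cloud-ndb; the `Block` model and query are made up, not arroba's actual code.)

```python
from google.cloud import ndb

client = ndb.Client()

class Block(ndb.Model):           # hypothetical stand-in for the stored repo blocks
    data = ndb.BlobProperty()

with client.context():
    # By default ndb keeps every entity it fetches in an in-context cache for
    # the lifetime of the context; in a long-lived server process that can
    # look exactly like a leak. Disabling it isolates that variable.
    ndb.get_context().set_cache_policy(False)

    for block in Block.query().iter():
        ...  # serve the event without ndb pinning the entity in memory
```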
It's not a fix for the memory leak, but one thing that would help here would be to cache all of the rollback window's blocks in memory and serve them from there. That would also be half of #30.
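(A sketch of what that could look like: recent blocks kept in a dict keyed by CID and evicted once their seq falls out of the rollback window. `BlockCache` and `ROLLBACK_WINDOW` are illustrative names, not arroba's API.)

```python
from collections import OrderedDict
from threading import Lock

ROLLBACK_WINDOW = 50_000   # how many recent seqs to keep; illustrative

class BlockCache:
    """In-memory cache of recent firehose blocks, keyed by CID string."""
    def __init__(self):
        self.blocks = OrderedDict()   # cid -> (seq, raw block bytes)
        self.lock = Lock()

    def add(self, cid, seq, data):
        with self.lock:
            self.blocks[cid] = (seq, data)
            # evict everything that has fallen out of the rollback window
            while self.blocks:
                oldest_cid, (oldest_seq, _) = next(iter(self.blocks.items()))
                if oldest_seq >= seq - ROLLBACK_WINDOW:
                    break
                self.blocks.popitem(last=False)

    def get(self, cid):
        with self.lock:
            found = self.blocks.get(cid)
            # on a miss, the caller falls back to the datastore
            return found[1] if found else None
```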
Deprioritizing, this hasn't been happening much anymore for a while now, but #30 is getting acute.
atproto-hub hung itself just now. Evidently we made and emitted a ton of commits all of a sudden, >20qps sustained during 10:45-11:15a PT, so ~36k total. Sheesh.