This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

CPU increase on main process after 1.57.1 -> 1.58.1 upgrade (fetch_events background task) #12788

Closed
Fizzadar opened this issue May 18, 2022 · 8 comments
Labels
T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. X-Needs-Info This issue is blocked awaiting information from the reporter

Comments

@Fizzadar
Contributor

Two days ago we rolled out v1.58.1 to production, and ever since there has been a substantial CPU increase on the main process in the fetch_events background task, which I spotted today:

Screenshot 2022-05-18 at 14 58 24

There's also increased memory usage overall, which is what led us to discover this. I'm unsure what this is doing exactly; any pointers?

Wonder if this is related to #12778 at all?

@bradtgmurray
Contributor

image

Something in Synapse now wants a lot more events, but who?

@bradtgmurray
Contributor

bradtgmurray commented May 18, 2022

3201863 and 17d99f7 are two new callers of get_events_as_list in 1.58, but I don't think either one makes sense: for the first, we can't see the sync_partial_state_room background job in Grafana, and the second should, I think, be running on our worker that's handling /messages requests.

@babolivier
Contributor

image

What caches do the purple and red graphs correspond to?

@bradtgmurray
Contributor

> What caches do the purple and red graphs correspond to?

Apologies, they're all just *getEvent* across multiple restarts of the synapse-0 (main worker) process

@babolivier
Contributor

I see, thanks. Could you look at the "Cache eviction rate" graph for the *getEvent* (size) label and see if it reports many size-related evictions?
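For anyone unsure what a size-related eviction looks like in practice, here is a minimal sketch of an LRU cache that counts them. It is not Synapse's cache implementation; `CountingLruCache`, `replay`, and the sizes used are all hypothetical and only illustrate why a cache smaller than the working set keeps evicting and never gets a hit.

```python
from collections import OrderedDict


class CountingLruCache:
    """Toy LRU cache (hypothetical, not Synapse's) that counts size evictions."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0
        self.size_evictions = 0

    def get(self, key, load):
        if key in self.entries:
            # Mark the entry as most recently used.
            self.entries.move_to_end(key)
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        value = load(key)
        self.entries[key] = value
        if len(self.entries) > self.max_size:
            # Over capacity: drop the least recently used entry.
            self.entries.popitem(last=False)
            self.size_evictions += 1
        return value


def replay(cache, working_set, passes):
    """Request the same range of event IDs repeatedly, in order."""
    for _ in range(passes):
        for event_id in range(working_set):
            cache.get(event_id, load=lambda key: f"event-{key}")


# Cache smaller than the working set: every pass evicts and reloads.
small = CountingLruCache(max_size=100)
replay(small, working_set=150, passes=3)

# Cache larger than the working set: only the first pass misses.
large = CountingLruCache(max_size=200)
replay(large, working_set=150, passes=3)

print(small.hits, small.misses, small.size_evictions)
print(large.hits, large.misses, large.size_evictions)
```

If the eviction graph looks like the `small` case, the cache is being churned by whatever is requesting the events rather than serving repeat lookups.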

@babolivier babolivier added T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. X-Needs-Info This issue is blocked awaiting information from the reporter labels May 19, 2022
@bradtgmurray
Contributor

image

@Fizzadar
Contributor Author

OK, I believe I have got to the bottom of this: a leftover from the NULL character fix meant the update user directory background process was constantly failing and restarting, never updating the stream position.

Reverting the change in https://gitlab.com/beeper/synapse/-/commit/b5874621bea941f1aee59d728a13cbb17a37c438 has now stopped the constant memory growth.

The fetch event calls are still ongoing, but I believe this is just the main process running through the huge gap in stream position while doing the background update, and it should clear in the next hour or two. Will confirm and close this out.
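To illustrate the failure mode, here is a rough sketch of a process that only saves its stream position after a batch succeeds. This is not the Synapse user directory code; `FakeEventStore`, `run_once`, `run_with_restarts`, and the numbers are made up for illustration. The point is that a single event that always fails makes every restart refetch the same range, so fetch volume grows while the position stays put.

```python
class FakeEventStore:
    """Hypothetical stand-in for the event store; counts fetched events."""

    def __init__(self):
        self.fetch_calls = 0

    def fetch_events(self, start, end):
        self.fetch_calls += end - start
        return list(range(start, end))


def run_once(store, state, max_position, fail_on=None):
    """Process everything after the saved position, then persist the new one.

    The position is only written after the whole batch succeeds, so any
    exception leaves it untouched.
    """
    events = store.fetch_events(state["position"], max_position)
    for event in events:
        if event == fail_on:
            raise ValueError(f"cannot process event {event}")
    state["position"] = max_position


def run_with_restarts(store, state, max_position, attempts, fail_on=None):
    """Restart the process after each failure, as a supervisor loop would."""
    for _ in range(attempts):
        try:
            run_once(store, state, max_position, fail_on=fail_on)
        except ValueError:
            continue  # restart from the same, never-updated position
        else:
            break


# One event that always fails: five restarts, five full refetches.
broken_store, broken_state = FakeEventStore(), {"position": 0}
run_with_restarts(broken_store, broken_state, max_position=100, attempts=5, fail_on=42)

# No failing event: one fetch, position advances.
healthy_store, healthy_state = FakeEventStore(), {"position": 0}
run_with_restarts(healthy_store, healthy_state, max_position=100, attempts=5)

print(broken_store.fetch_calls, broken_state["position"])
print(healthy_store.fetch_calls, healthy_state["position"])
```

In the broken case the store sees five times the fetches of the healthy case and the saved position never moves, which is the same shape as a growing stream gap.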

@Fizzadar
Contributor Author

Confirmed this is now fixed by the above revert!

It would be good to investigate ways to better attribute nested background task CPU (if this had shown up as user directory background task CPU, it would have been super quick to identify), but I will leave that for another issue.

3 participants