`trial` test runs leak memory
#15622
Comments
When I force a full GC every 50 or so gen0 GCs, a trial run no longer OOMs on my laptop. But memory usage still keeps growing and the GCs get slower and slower. So there's a separate memory leak going on. According to tracemalloc, the most leaked objects are
but this isn't too helpful in tracking down the leak.
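For readers unfamiliar with `tracemalloc`, the following is a minimal sketch of how allocation growth between two points in a run can be ranked. It is not the exact code used for the measurement above; the `top_growth` helper and the `leaked` list are hypothetical stand-ins.

```python
import tracemalloc


def top_growth(before, after, limit=5):
    """Return the `limit` largest per-line allocation differences."""
    return after.compare_to(before, "lineno")[:limit]


tracemalloc.start()
before = tracemalloc.take_snapshot()

# Hypothetical stand-in for objects that leak between two tests.
leaked = [bytes(1000) for _ in range(1000)]

after = tracemalloc.take_snapshot()
stats = top_growth(before, after)
for stat in stats:
    print(stat)  # file, line, size difference, and count difference
tracemalloc.stop()
```

As the comment notes, this tells you where leaked objects were allocated, not what is keeping them alive.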
One of the leaks is due to synapse/synapse/util/ratelimitutils.py Line 89 in a302d3e
Another "leak" is due to the config cache added in #15284 growing large (~80 entries).
I'm going to leave this one alone. I'd replace things with |
FWIW I think |
This change fixes two memory leaks during `trial` test runs. Garbage collection is disabled during each test case and a gen-0 GC is run at the end of each test. However, when the gen-0 GC is run, the `TestCase` object usually still holds references to the `HomeServer` used during the test. As a result, the `HomeServer` gets promoted to gen-1 and then never garbage collected. Fix this by periodically running full GCs. Additionally, fix `HomeServer`s leaking after tests that touch inbound federation due to `FederationRateLimiter`s adding themselves to a global set, by turning the set into a `WeakSet`. Resolves #15622. Signed-off-by: Sean Quah <[email protected]>
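The `WeakSet` part of this change can be illustrated with a small sketch. The `Limiter` class and both registries below are hypothetical stand-ins, not Synapse's actual `FederationRateLimiter` code, and the behaviour shown assumes CPython.

```python
import gc
import weakref


class Limiter:
    """Hypothetical object that registers itself in a global registry."""

    def __init__(self, registry):
        registry.add(self)


strong_registry = set()            # keeps every instance alive forever
weak_registry = weakref.WeakSet()  # does not keep instances alive

a = Limiter(strong_registry)
b = Limiter(weak_registry)
a_ref = weakref.ref(a)
b_ref = weakref.ref(b)

del a, b
gc.collect()

strong_alive = a_ref() is not None  # True: the plain set still references it
weak_alive = b_ref() is not None    # False: the WeakSet entry went away
print(strong_alive, weak_alive, len(weak_registry))
```

With a plain `set`, anything reachable from the registered object (here, by analogy, a `HomeServer`) stays alive for the rest of the process.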
That's true, we could frozendictify the config dicts and use them as cache keys. I'd balk at the copying involved, but the current code does a |
There are
…5646) ...to try to control memory usage. `HomeServerConfig`s hold on to many Jinja2 objects, which come out to over 0.5 MiB per config. Over the course of a full test run, the cache grows to ~360 entries. Limit it to 8 entries. Part of #15622. Signed-off-by: Sean Quah <[email protected]>
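As a rough illustration of bounding a cache to 8 entries, here is a sketch using `functools.lru_cache`. This is an assumption for demonstration only and may not match how the actual change is implemented; `load_config` and its JSON string key are hypothetical.

```python
import json
from functools import lru_cache


@lru_cache(maxsize=8)
def load_config(config_json: str) -> dict:
    """Hypothetical expensive config build, keyed by a hashable string."""
    return json.loads(config_json)


# Simulate a test run that requests 100 distinct configs.
for i in range(100):
    load_config('{"n": %d}' % i)

info = load_config.cache_info()
print(info)  # currsize never exceeds maxsize
```

Least recently used entries are evicted once the limit is reached, so at most 8 configs (and whatever they reference) are retained at any time. The cache key has to be hashable, which is why a string is used here rather than a dict.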
The cache resizing callbacks are also painful to fix, even when using I'm going to leave these alone. |
I've left the hacky code I used on the squah/trial_memory_leak_tracking branch. Usage: run the tests, wait for the breakpoint to be hit, count objects of each type using |
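For anyone without access to that branch, one stdlib-only way to count live objects by type is sketched below. It is not necessarily the tool referred to above; `count_by_type` and `Marker` are hypothetical names.

```python
import gc
from collections import Counter


def count_by_type():
    """Count GC-tracked objects by type name after a full collection."""
    gc.collect()
    return Counter(type(obj).__name__ for obj in gc.get_objects())


class Marker:
    """Hypothetical stand-in for a type suspected of leaking."""

    def __init__(self):
        self.payload = []


markers = [Marker() for _ in range(25)]
counts = count_by_type()
print(counts["Marker"])
print(counts.most_common(5))
```

Comparing the counts at two breakpoints shows which types are accumulating. `gc.get_referrers` can then help find what is holding on to a given instance.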
#15334 introduced a bug where `HomeServer` instances no longer get garbage collected during `trial` test runs. As a result, memory usage grows to multiple gigabytes during test runs.

See https://github.com/matrix-org/synapse/pull/15334/files#diff-f50ec854a65082453f4086048f20c291ca6020fab973486a9da2285025394cc8R180.

When we garbage collect generation 0, `self.hs` still holds a reference to the `HomeServer`, which gets promoted to generation 1 and then never collected.

Apart from that, there are also a couple of other leaks at play.

- `HomeServer`s leaking due to not being GCed.
  Fix `HomeServer`s leaking during `trial` test runs #15630
- `FederationRateLimiter` (and referenced `HomeServer`s) leaking due to `_rate_limiter_instances` global.
  Fix `HomeServer`s leaking during `trial` test runs #15630
- `HomeServerConfig`s are pretty heavy and hold on to a lot of stuff, including Jinja templates. A suitable max size for the cache might be 8-16 entries.
  Limit the size of the `HomeServerConfig` cache in trial test runs #15646
- `HomeServer`s leaking via `synapse.util.caches.caches_by_name`.
- `HomeServer`s leaking via `synapse.util.caches.CACHE_METRIC_REGISTRY`.
- `HomeServer`s leaking via `synapse.config.cache._CACHES`.

The last 3 are fairly annoying to fix, so I've left them alone.
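The generation promotion described above can be reproduced in isolation. The sketch below assumes CPython's generational collector; `Node` is a hypothetical stand-in for a `HomeServer` that sits in a reference cycle, and `holder` plays the role of `self.hs`.

```python
import gc
import weakref


class Node:
    """Hypothetical stand-in for a HomeServer caught in a reference cycle."""

    def __init__(self):
        self.me = self  # cycle: only the cyclic GC can free this object


gc.disable()  # mimic a test run with automatic collection turned off
gc.collect()  # start from a clean state

holder = Node()              # like self.hs still referencing the HomeServer
probe = weakref.ref(holder)

gc.collect(0)                # end-of-test gen-0 pass: object survives, is promoted
del holder                   # the object is now unreachable garbage

gc.collect(0)                # a later gen-0 pass does not look at older generations
alive_after_gen0 = probe() is not None

gc.collect()                 # a full collection finally frees it
alive_after_full = probe() is not None

gc.enable()
print(alive_after_gen0, alive_after_full)
```

This is why running only gen-0 collections lets promoted objects pile up, and why a periodic full collection bounds the growth.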