Fixed memory leak #2

Merged

Hydrocharged merged 2 commits into main from daylon/fix-memory-leak on Dec 15, 2024
Conversation

@Hydrocharged (Collaborator) commented Dec 14, 2024

This resolves a runaway memory leak. In a nutshell, wazero.Runtime holds on to memory that is allocated through api.Module. Even when the modules are closed properly, the runtime still accumulates that memory. I suspect there may be a bug in the glue code between Go and WASM, but I wasn't able to track it down; the observable result is that wazero.Runtime keeps accumulating memory.

Closing and removing the reference to wazero.Runtime does free the memory; however, instantiating wazero.Runtime is relatively slow, so it isn't feasible to create a runtime every time we need to use a regex.

Instead, I've created a custom pool structure that pools modules with a runtime and puts a cap on the number of times a module may be fetched from a runtime. Once that cap has been reached, the pool waits for all outstanding modules to be returned to that runtime (whether through Put or GC collection) before closing the runtime and removing it from the pool. This puts a soft cap on the amount of memory that a single runtime can accumulate. With this in place we'll still see spikes, since the timing of GC collection isn't deterministic, but it at least prevents runaway memory usage.
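For illustration, here is a minimal sketch of the cap-and-retire idea (hypothetical names, not the actual pool.go; the real pool also keeps idle modules for reuse and registers finalizers so modules reclaimed by the GC count as returned):

package regexpool

import (
	"context"
	"sync"

	"github.com/tetratelabs/wazero"
	"github.com/tetratelabs/wazero/api"
)

// runtimeEntry tracks one wazero.Runtime along with how many modules it has
// vended and how many are still outstanding.
type runtimeEntry struct {
	runtime     wazero.Runtime
	compiled    wazero.CompiledModule
	fetched     int  // modules vended so far
	outstanding int  // modules not yet returned
	retired     bool // no longer the active runtime; close once outstanding hits 0
}

// ModulePool vends modules from the current runtime until maxFetch is hit,
// then retires that runtime and spins up a fresh one, bounding how much
// memory any single runtime can accumulate.
type ModulePool struct {
	mu       sync.Mutex
	wasm     []byte // the regex module's WASM bytes
	maxFetch int
	current  *runtimeEntry
}

func (p *ModulePool) Get(ctx context.Context) (api.Module, *runtimeEntry, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.current == nil || p.current.fetched >= p.maxFetch {
		if p.current != nil {
			p.current.retired = true // closed later, once its modules come back
		}
		r := wazero.NewRuntime(ctx) // the expensive call
		compiled, err := r.CompileModule(ctx, p.wasm)
		if err != nil {
			return nil, nil, err
		}
		p.current = &runtimeEntry{runtime: r, compiled: compiled}
	}
	entry := p.current
	// Anonymous module name, so multiple instances may coexist in one runtime.
	mod, err := entry.runtime.InstantiateModule(ctx, entry.compiled, wazero.NewModuleConfig().WithName(""))
	if err != nil {
		return nil, nil, err
	}
	entry.fetched++
	entry.outstanding++
	return mod, entry, nil
}

// Put returns a module; once a retired runtime has no outstanding modules
// left, closing it releases whatever memory it accumulated.
func (p *ModulePool) Put(ctx context.Context, mod api.Module, entry *runtimeEntry) {
	p.mu.Lock()
	defer p.mu.Unlock()
	_ = mod.Close(ctx)
	entry.outstanding--
	if entry.retired && entry.outstanding == 0 {
		_ = entry.runtime.Close(ctx)
	}
}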

Using a multithreaded application to spam the server with queries, I was able to run out of memory in under 3 minutes on the main branch. With this pool in place, the memory usage stabilized, and I ran it for 3 hours before deciding that it was good enough. So while this doesn't fix the underlying problem, it does fix the runaway memory usage. Performance-wise, there's about a 20-25% reduction in regex query speed under the same multithreaded workload.

@Hydrocharged force-pushed the daylon/fix-memory-leak branch from 71bf7e7 to 3407cc0 on December 14, 2024 16:36
@max-hoffman commented:

see #3

@zachmu (Member) left a comment


This is fine as an urgent customer fix, but it strikes me as pretty complex for the problem it's solving.

You're basically writing your own 2-level pool implementation because sync.Pool doesn't give you enough control over max size and lifecycle to implement this behavior. This would be more naturally expressed as a pool of pools, rather than slices and maps. And it might be faster as well.

I only briefly researched these alternatives, but you should check them out and see if they can be used:

https://github.com/jolestar/go-commons-pool

Another option would be to use a linked list instead of slices, which would help you avoid copying the slices. Not sure if that's a major part of the performance diff or not, but worth considering:

https://pkg.go.dev/container/list
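For example, a rough sketch (hypothetical, not from this PR) of a per-runtime free list of idle modules backed by container/list; pushes and pops are O(1) and never copy a backing array:

package regexpool

import (
	"container/list"

	"github.com/tetratelabs/wazero/api"
)

// freeList holds idle modules for one runtime. PushBack and Remove are O(1).
type freeList struct {
	modules *list.List // elements are api.Module
}

func newFreeList() *freeList {
	return &freeList{modules: list.New()}
}

// put stores an idle module for later reuse.
func (f *freeList) put(m api.Module) {
	f.modules.PushBack(m)
}

// get returns an idle module, or nil if none are available.
func (f *freeList) get() api.Module {
	front := f.modules.Front()
	if front == nil {
		return nil
	}
	f.modules.Remove(front)
	return front.Value.(api.Module)
}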

pool.go (outdated review thread):
runtime.SetFinalizer(module, nil)
}
// Grab the runtime ID and remove the module from the tracking map
ptr := reflect.ValueOf(module).Pointer()
Member commented:

If this is slow, it might make sense to get it a single time at creation, and for the pool to vend an object which includes both the api.Module and its uintptr
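i.e. something like the following (hypothetical names, just to illustrate the suggestion):

package regexpool

import (
	"reflect"

	"github.com/tetratelabs/wazero/api"
)

// pooledModule pairs a module with its pointer key so the key used in the
// tracking map is computed once at creation rather than on every return.
type pooledModule struct {
	module api.Module
	key    uintptr
}

func newPooledModule(m api.Module) *pooledModule {
	return &pooledModule{module: m, key: reflect.ValueOf(m).Pointer()}
}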

Hydrocharged (Collaborator, Author) replied:

This isn't the slow portion. It's wazero.NewRuntime(ctx) that's slow (around half a second on my machine). That's why we can't just create a new one on every request. Second to that is InstantiateModule. Everything else is essentially instant in comparison. The slowdown is due to having to periodically create a new runtime, not really anything else.

@Hydrocharged (Collaborator, Author) commented Dec 15, 2024

I may not have been clear or detailed enough in the original PR message, so I'll restate the problem that I observed.

  • wazero.Runtime is the source of the leakage, not the modules.

Any solution that does not involve disposing of wazero.Runtime instances will not fix the leak. The leak is not related to any aspect of the pool: whether api.Module is returned or not (or closed or not), wazero.Runtime will continue to leak. Something that we are doing with the modules is causing wazero.Runtime's memory to grow. @zachmu, albeit quite complex (I very much agree; this was my 5th attempt at a fix), this is a solution that does dispose of wazero.Runtime.

The issue regarding performance degradation is strictly related to calls to wazero.NewRuntime (and extra calls on InstantiateModule using that new runtime). We've gone from calling it once (at program start) to calling it after n module fetches. The lower the fetch limit, the more frequent the runtime creations, and the lower the overall performance (with lower stabilized memory usage too). Setting the fetch limit such that we only use a single wazero.Runtime results in performance that is within run-to-run variance of the original implementation (at least on my machine).

I'll also mention that by "leak" I mainly mean runaway memory usage ending in an out-of-memory error. This was the only solution I tried that did not result in an eventual out-of-memory error (it ran for over 3 hours, while the other options never lasted more than 10 minutes). We can absolutely use another solution that is simpler, but it also needs to handle the memory leak (and/or the runaway memory usage resulting in an out-of-memory error).

@Hydrocharged merged commit db690dd into main on Dec 15, 2024
3 checks passed
@Hydrocharged deleted the daylon/fix-memory-leak branch on December 15, 2024 01:01