Option for a faster allocator than dlmalloc (perhaps mimalloc) #18369
Comments
I think all you would need to do is compile it and link with |
Thanks! I'll give that a shot |
I was able to get this working, but unfortunately I didn’t notice a big difference in my particular case. More interesting: at the time we were a few patch releases behind, and once we upgraded we got a pretty significant improvement in performance. I am not sure what the direct cause was, but my suspicion is #18186. With that said, I think there could be some interesting investigation around performance and other allocators. I am certainly OK with closing this issue for now. @sbc100 Is this something you would like me to hold open? |
I'm pretty sure that using an alternate allocator works as expected in emscripten. I think we have a test for it; see test_core.py. So I think this issue can be closed. @Kingrd97, perhaps you have a different specific issue? Can you share your full link command? |
This is probably worth looking into for the multithreading case, since dlmalloc doesn't have per-thread arenas, and as a result we can end up with a lot of lock contention in the case of many allocations on different threads. Allocators with per-thread arenas like Adding an option for |
@arsnyder16 I am investigating a performance issue and I see you've got mimalloc integrated. It would help a lot if you could share some snippets showing how to build mimalloc and replace dlmalloc. |
@junyuecao Are you running into a particular issue? I don't recall hitting any roadblocks simply building mimalloc with emscripten and then linking my application with it |
@arsnyder16 I linked with mimalloc successfully but it keeps crashing in mimalloc (memory out of bound error). BTW it's a multi-threaded web app. |
Hmm, can you supply your link arguments? |
@arsnyder16 just like this
|
Can you supply the full link argument passed For example something like: Also what version of the sdk are you using? |
My experience with using mimalloc (2.1.2, emsdk 3.1.25): So having mimalloc working for threaded applications seems to be a big win. |
@Markus87 Thanks for the information! I've recently been looking into another allocator option here. On a simple benchmark I see dlmalloc not scaling at all - each additional core gets slower - while other allocators improve. So, yes, dlmalloc being single-threaded can be a problem. It will take some work to get a proper port of a new allocator, though. One issue, maybe related to the OOM issue you saw with mimalloc, is that it's easiest for such a parallel malloc to not return memory to the system at all (that's what the wasi port in mimalloc does), but that's obviously not ideal. The problem is that using I hope to have a PR up in the next few weeks. |
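To illustrate why an sbrk-style interface makes it hard to return memory, here is a minimal sketch of a simulated program break. `SimulatedBreak` and its members are hypothetical names made up for this illustration; it is not emscripten's sbrk, only a model of the "one moving boundary" idea. Memory can only be given back by lowering the break, so a freed region below a live one stays counted as in use.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of a program break over a fixed-capacity heap.
// Not emscripten's implementation; it only tracks a single boundary.
class SimulatedBreak {
 public:
  explicit SimulatedBreak(std::size_t capacity)
      : capacity_(static_cast<std::ptrdiff_t>(capacity)) {
    assert(capacity_ >= 0);
  }

  // Mimics sbrk: moves the break by `delta` bytes and returns the previous
  // break offset, or -1 if the request does not fit.
  std::ptrdiff_t sbrk(std::ptrdiff_t delta) {
    const std::ptrdiff_t old_break = break_;
    const std::ptrdiff_t new_break = break_ + delta;
    if (new_break < 0 || new_break > capacity_) return -1;
    break_ = new_break;
    return old_break;
  }

  // Bytes currently below the break, i.e. held from the "system".
  std::size_t in_use() const { return static_cast<std::size_t>(break_); }

 private:
  std::ptrdiff_t capacity_;
  std::ptrdiff_t break_ = 0;
};
```

With three 256-byte regions A, B and C taken in that order, releasing B alone cannot lower the break because C still sits above it; only once C is released (`sbrk(-256)`) can B's space follow. An allocator layered on such an interface therefore tends to keep holes rather than hand them back.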
@kripken That is good to know. I am looking forward to trying the new solution. |
With this PR, if emmalloc.c is built with -DEMMALLOC_NO_STD_EXPORTS then we do not define malloc, free, etc. That means we only provide emmalloc_malloc, emmalloc_free, etc., the prefixed versions. They can then be used alongside another malloc impl. This will be useful in a later PR that adds a two-tiered allocator: a fast multithreaded one, and underneath it emmalloc, which will function as the "system allocator" for it. That is, emmalloc will play the role of VirtualAlloc on Windows or mmap on POSIX, a way for the main allocator to get system memory. (We can't just use sbrk for that purpose because we also want to free memory to the system.) For that goal, emmalloc seems suitable as it is compact (we don't need it to be super-fast; this is the system allocator that will be called rarely, compared to the fast one before it). And for emmalloc to be used like that we need this PR so that we can build emmalloc alongside another allocator (that other allocator will define malloc etc. itself). Helps #18369
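The two-tier layering described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the design of the actual port: `sys_alloc`, `sys_free` and `ChunkArena` are hypothetical names, and the "system allocator" layer is stubbed with plain malloc/free so the sketch runs anywhere. In the design from the PR description, that lower layer would be the prefixed emmalloc_malloc / emmalloc_free.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical stand-ins for the lower "system allocator" tier.
// Assumption: in an emscripten build these would forward to the prefixed
// emmalloc functions; plain malloc/free are used here for portability.
inline void* sys_alloc(std::size_t bytes) { return std::malloc(bytes); }
inline void sys_free(void* p) { std::free(p); }

// Hypothetical upper tier: a bump arena that requests large chunks from the
// system tier rarely and serves small requests from them cheaply.
class ChunkArena {
 public:
  explicit ChunkArena(std::size_t chunk_size = 64 * 1024)
      : chunk_size_(chunk_size) {
    assert(chunk_size_ > 0);
  }
  ~ChunkArena() { release(); }
  ChunkArena(const ChunkArena&) = delete;
  ChunkArena& operator=(const ChunkArena&) = delete;

  // Returns a suitably aligned block, or nullptr if the request is larger
  // than one chunk (this sketch has no oversized path) or the system tier
  // is out of memory.
  void* allocate(std::size_t bytes) {
    const std::size_t align = alignof(std::max_align_t);
    bytes = (bytes + align - 1) / align * align;  // round up to alignment
    if (bytes > chunk_size_) return nullptr;
    if (chunks_.empty() || used_ + bytes > chunk_size_) {
      void* chunk = sys_alloc(chunk_size_);  // the rare, slower call
      if (!chunk) return nullptr;
      chunks_.push_back(chunk);
      used_ = 0;
    }
    void* p = static_cast<char*>(chunks_.back()) + used_;
    used_ += bytes;
    return p;
  }

  // Hands every chunk back to the system tier, which is the step a raw
  // sbrk-based backend could not do for chunks in the middle of the heap.
  void release() {
    for (void* chunk : chunks_) sys_free(chunk);
    chunks_.clear();
    used_ = 0;
  }

  std::size_t chunk_count() const { return chunks_.size(); }

 private:
  std::size_t chunk_size_;
  std::size_t used_ = 0;
  std::vector<void*> chunks_;
};
```

Declaring one such arena as `thread_local` would give each thread its own fast path with no shared lock, which is the general idea behind per-thread arenas; a real allocator additionally handles freeing individual blocks, large allocations, and cross-thread frees.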
A PR for mimalloc is now up: #20651 - testing and feedback would be welcome! |
@kripken Thank you, this is amazing! |
Great, thanks for testing @Markus87 ! |
…20651) The new allocator can be used with -sMALLOC=mimalloc. On the benchmark added in this PR, dlmalloc does quite poorly (getting actually slower with each additional core, because the lock contention is much larger than the actual work in the artificial benchmark). mimalloc, in comparison, scales the same as natively: more cores keep helping. So mimalloc can be a significant speedup in codebases that have lock contention on malloc. mimalloc is significantly larger than dlmalloc, however, so we do not want it on by default. It also uses more memory, because of how mimalloc works and also due to #20645. Design-wise, this layers mimalloc on top of emmalloc. emmalloc functions as the "system allocator", which is more powerful than just using raw sbrk - sbrk can't free holes in the middle, for example. Code-wise, all of system/lib/mimalloc is unchanged from upstream (see README.emscripten) except for an ifdef or two, and then the new backend which is in system/lib/mimalloc/src/prim/emscripten/prim.c. That file has more comments explaining the design of the port. A new test is added which is also usable as a benchmark, test/other/test_mimalloc.cpp, which is where the numbers above come from. Fixes #18369
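For anyone who wants to check whether malloc contention affects their own workload, a rough sketch of this kind of measurement is below. It is not the benchmark from the PR (that lives in test/other/test_mimalloc.cpp); `run_malloc_workers` and `time_malloc_workers_ms` are hypothetical helpers, and the batch size and block sizes are arbitrary assumptions.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <thread>
#include <vector>

// Hypothetical helper: each of `num_threads` threads runs `iters` rounds of
// allocating and then freeing a batch of 16 small blocks.
// Returns the total number of successful allocations across all threads.
std::size_t run_malloc_workers(unsigned num_threads, std::size_t iters) {
  assert(num_threads > 0);
  std::vector<std::size_t> counts(num_threads, 0);
  std::vector<std::thread> threads;
  for (unsigned t = 0; t < num_threads; ++t) {
    threads.emplace_back([&counts, t, iters] {
      std::size_t ok = 0;
      void* batch[16];
      for (std::size_t i = 0; i < iters; ++i) {
        for (int j = 0; j < 16; ++j) {
          batch[j] = std::malloc(32 + 8 * static_cast<std::size_t>(j));
          if (batch[j]) {
            // Touch the block so the allocation is not optimized away.
            static_cast<volatile char*>(batch[j])[0] = 1;
            ++ok;
          }
        }
        for (void* p : batch) std::free(p);  // free(nullptr) is a no-op
      }
      counts[t] = ok;  // each thread writes only its own slot
    });
  }
  for (auto& th : threads) th.join();
  std::size_t total = 0;
  for (std::size_t c : counts) total += c;
  return total;
}

// Hypothetical helper: wall-clock time of one run, in milliseconds.
double time_malloc_workers_ms(unsigned num_threads, std::size_t iters) {
  const auto start = std::chrono::steady_clock::now();
  run_malloc_workers(num_threads, iters);
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling `time_malloc_workers_ms` with 1, 2, 4, ... threads and the same `iters` keeps the work per thread constant, so with an allocator that scales the time should stay roughly flat, while heavy lock contention shows up as time growing with the thread count. Building the same file once with the default allocator and once with -sMALLOC=mimalloc (in a pthreads build) gives the comparison discussed above.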
I am investigating performance in my project, and one thing that sticks out when comparing platforms (Windows, Linux, wasm) is that the wasm version is generally slower wherever there is a fair amount of allocation.
From what I can tell, emscripten uses dlmalloc, which I believe is the same allocator musl uses.
There is also a more compact allocator available, emmalloc.
From what I can find, poor allocator performance might be a known problem for musl, so I am curious about alternatives that I can try. One tricky part is that the allocator must support sbrk. One promising option I found is mimalloc, which does seem to have some support for wasm.
Has mimalloc been explored at all? Or how could I go about overriding the default malloc behavior to use mimalloc?