Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a port of mimalloc, a fast and scalable multithreaded allocator #20651

Merged
merged 56 commits into from
Nov 16, 2023

Conversation

kripken
Copy link
Member

@kripken kripken commented Nov 8, 2023

The new allocator can be used with -sMALLOC=mimalloc.

Some numbers comparing native, wasm+dlmalloc, and wasm+mimalloc:

chart (1)

Lower is better there. dlmalloc does quite poorly here, getting actually slower with
each additional core, because the lock contention is much large than the actual work the
(very artificial) benchmark does. That's in log scale because of dlmalloc, here is a normal
scale of native vs wasm+mimalloc:

chart

With mimalloc, wasm scales the same as natively: more cores keeps helping, up to 4
cores on this machine (which has 4 cores).

mimalloc is significantly larger than dlmalloc, however, so we definitely do not want it
on by default. It also uses more memory, both because of how mimalloc works and also
due to #20645 Likely many multithreaded applications won't need it. But it is is a
good option to have, for applications that have significant amounts of contention on
malloc/free across threads.

Design-wise, this layers mimalloc on top of emmalloc. emmalloc functions as the
"system allocator", which is more powerful than just using raw sbrk - sbrk can't free
holes in the middle, for example.

Code-wise, all of system/lib/mimalloc is unchanged from upstream (see README.emscripten)
except for an ifdef or two, and then the new backend which is in
system/lib/mimalloc/src/prim/emscripten/prim.c. That file has more comments explaining
the design of the port.

A new test is added which is also usable as a benchmark, test/other/test_mimalloc.cpp,
which is where the numbers above come from.

Fixes #18369

'alloc-override.c',
'page-queue.c',
'static.c',
]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it can all be on a single line


# Build all of mimalloc, and also emmalloc which is used as the system
# allocator underneath it.
src_dir = 'system/lib/'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can remove this line I think

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, not. Other libraries do this too. Without this it errors, as it uses this internally it seems. That might be worth working on, though, as it is surprising to me too.

filenames=['prim.c']
) + files_in_path(
path='system/lib/',
filenames=['emmalloc.c'])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Elsewhere in this file we tend to put each + in its on statement. I think that will make it less messy:

e.g.:

    libc_files += files_in_path(                                                                     
        path='system/lib/libc/musl/src/legacy',                                                      
        filenames=['getpagesize.c', 'err.c'])                                                        
                                                                                                     
    libc_files += files_in_path(                                                                     
        path='system/lib/libc/musl/src/linux',                                                       
        filenames=['getdents.c', 'gettid.c', 'utimes.c'])                                            
                                                                                                     
    libc_files += files_in_path(                                                                     
        path='system/lib/libc/musl/src/sched',                                                       
        filenames=['sched_yield.c'])                                                                 
                                                                                                     
    libc_files += files_in_path(                                                                     
        path='system/lib/libc/musl/src/exit',                                                        
        filenames=['_Exit.c', 'atexit.c', 'at_quick_exit.c', 'quick_exit.c'])                        
                                                                                                     
    libc_files += files_in_path(                                                                     
        path='system/lib/libc/musl/src/ldso',                                                        
        filenames=['dladdr.c', 'dlerror.c', 'dlsym.c', 'dlclose.c'])       
    ```

@dschuff
Copy link
Member

dschuff commented Nov 8, 2023

I think I may have asked this before, but is there a reason we can't just make the allocators regular static libs and select by using -lemmalloc or -lmimalloc?

For one thine the actually name of the underlying malloc isn't always that short name. For example it could be more like ``libemmalloc-mt-debug.a`. I think maybe we have some logic map the short name to the long name but its a bit too magical for my liking.

OK yeah, I'm not a big fan of magical -s flags that do nothing but link libraries, but had forgotten about all the named variants, so I guess we can't really avoid some magic for those libraries. And if there's going to be magic, I think it makes sense to make it obvious that this is emscripten-specific magic (as opposed to adding magic to a mechanism that is otherwise simple and transparent and would probably be surprising to users if there were magic).

We have run into issues before where folks specify -lc and then the system adds a default one too.

When users are specifying -lc is it because they are actually providing their own libc and want to avoid linking ours? or is it just carried over from some other system and doesn't really reflect a desire for non-default behavior? If the latter, then maybe we should just ignore -lc since we already have magical libc behavior. If the former... could that ever actually work?

@sbc100
Copy link
Collaborator

sbc100 commented Nov 8, 2023

I think I may have asked this before, but is there a reason we can't just make the allocators regular static libs and select by using -lemmalloc or -lmimalloc?

For one thine the actually name of the underlying malloc isn't always that short name. For example it could be more like ``libemmalloc-mt-debug.a`. I think maybe we have some logic map the short name to the long name but its a bit too magical for my liking.

OK yeah, I'm not a big fan of magical -s flags that do nothing but link libraries, but had forgotten about all the named variants, so I guess we can't really avoid some magic for those libraries. And if there's going to be magic, I think it makes sense to make it obvious that this is emscripten-specific magic (as opposed to adding magic to a mechanism that is otherwise simple and transparent and would probably be surprising to users if there were magic).

We have run into issues before where folks specify -lc and then the system adds a default one too.

When users are specifying -lc is it because they are actually providing their own libc and want to avoid linking ours? or is it just carried over from some other system and doesn't really reflect a desire for non-default behavior? If the latter, then maybe we should just ignore -lc since we already have magical libc behavior. If the former... could that ever actually work?

It was that latter. And yes, I think we made it work my somehow ignore the -lc and/or replacing it with our magic libc lookup.

@kripken
Copy link
Member Author

kripken commented Nov 9, 2023

I forgot to say, I ran mimalloc on core2 and while there are failures, none seemed like blockers. For example sanitizers will require followup work (people can sanitize with dlmalloc for now), and things like mallinfo are not supported, and some tests fail because mimalloc needs more memory. I saw no logic errors, in other words.

@kripken
Copy link
Member Author

kripken commented Nov 15, 2023

Friendly ping @sbc100 @dschuff - any other comments or does this look good? (I may wait on landing it for #20704 which makes it more efficient, but I'd like to know it's ready to land).

Copy link
Member

@dschuff dschuff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

otherwise LGTM

@kripken kripken merged commit 165c1a3 into main Nov 16, 2023
2 checks passed
@kripken kripken deleted the mimalloc branch November 16, 2023 17:52
ensisoft added a commit to ensisoft/detonator that referenced this pull request Oct 11, 2024
Upgrade to Emscripten💩 3.1.50 in order to enable mimalloc.

The default allocator (dlmalloc) has dog-shit for performance
under multiple threads due to plenty of locking happening inside
the allocator. The better allocator is mimalloc that requires
Emscripten💩 3.1.50

Emscripten💩 3.1.27 changed the default stack size to 64kb.
Restore the original 5MB stack size with

  -sSTACK_SIZE=5MB

Upgrading Emscripten comes with a new clang that is based on
a new LLVM version that has a regression so sol2 build fails:
ThePhD/sol2#1614

Adding a WAR in sol.hpp to change the offendin function signatures
to unconditonal noexcept.

mimalloc performance:
emscripten-core/emscripten#20651
ensisoft added a commit to ensisoft/detonator that referenced this pull request Oct 11, 2024
Upgrade to Emscripten💩 3.1.50 in order to enable mimalloc.

The default allocator (dlmalloc) has dog-shit for performance
under multiple threads due to plenty of locking happening inside
the allocator. The better allocator is mimalloc that requires
Emscripten💩 3.1.50

Emscripten💩 3.1.27 changed the default stack size to 64kb.
Restore the original 5MB stack size with

  -sSTACK_SIZE=5MB

Upgrading Emscripten comes with a new clang that is based on
a new LLVM version that has a regression so sol2 build fails:
ThePhD/sol2#1614

Adding a WAR in sol.hpp to change the offendin function signatures
to unconditonal noexcept.

mimalloc performance:
emscripten-core/emscripten#20651
ensisoft added a commit to ensisoft/detonator that referenced this pull request Oct 11, 2024
Upgrade to Emscripten💩 3.1.50 in order to enable mimalloc.

The default allocator (dlmalloc) has dog-shit for performance
under multiple threads due to plenty of locking happening inside
the allocator. The better allocator is mimalloc that requires
Emscripten💩 3.1.50

Emscripten💩 3.1.27 changed the default stack size to 64kb.
Restore the original 5MB stack size with

  -sSTACK_SIZE=5MB

Upgrading Emscripten comes with a new clang that is based on
a new LLVM version that has a regression so sol2 build fails:
ThePhD/sol2#1614

Adding a WAR in sol.hpp to change the offendin function signatures
to unconditonal noexcept.

mimalloc performance:
emscripten-core/emscripten#20651
ensisoft added a commit to ensisoft/detonator that referenced this pull request Oct 11, 2024
Upgrade to Emscripten💩 3.1.50 in order to enable mimalloc.

The default allocator (dlmalloc) has dog-shit for performance
under multiple threads due to plenty of locking happening inside
the allocator. The better allocator is mimalloc that requires
Emscripten💩 3.1.50

Emscripten💩 3.1.27 changed the default stack size to 64kb.
Restore the original 5MB stack size with

  -sSTACK_SIZE=5MB

Upgrading Emscripten comes with a new clang that is based on
a new LLVM version that has a regression so sol2 build fails:
ThePhD/sol2#1614

Adding a WAR in sol.hpp to change the offendin function signatures
to unconditonal noexcept.

mimalloc performance:
emscripten-core/emscripten#20651
ensisoft added a commit to ensisoft/detonator that referenced this pull request Oct 11, 2024
Upgrade to Emscripten💩 3.1.50 in order to enable mimalloc.

The default allocator (dlmalloc) has dog-shit for performance
under multiple threads due to plenty of locking happening inside
the allocator. The better allocator is mimalloc that requires
Emscripten💩 3.1.50

Emscripten💩 3.1.27 changed the default stack size to 64kb.
Restore the original 5MB stack size with

  -sSTACK_SIZE=5MB

Upgrading Emscripten comes with a new clang that is based on
a new LLVM version that has a regression so sol2 build fails:
ThePhD/sol2#1614

Adding a WAR in sol.hpp to change the offendin function signatures
to unconditonal noexcept.

mimalloc performance:
emscripten-core/emscripten#20651
ensisoft added a commit to ensisoft/detonator that referenced this pull request Oct 11, 2024
Upgrade to Emscripten💩 3.1.50 in order to enable mimalloc.

The default allocator (dlmalloc) has dog-shit for performance
under multiple threads due to plenty of locking happening inside
the allocator. The better allocator is mimalloc that requires
Emscripten💩 3.1.50

Emscripten💩 3.1.27 changed the default stack size to 64kb.
Restore the original 5MB stack size with

  -sSTACK_SIZE=5MB

Upgrading Emscripten comes with a new clang that is based on
a new LLVM version that has a regression so sol2 build fails:
ThePhD/sol2#1614

Adding a WAR in sol.hpp to change the offendin function signatures
to unconditonal noexcept.

mimalloc performance:
emscripten-core/emscripten#20651
ensisoft added a commit to ensisoft/detonator that referenced this pull request Oct 11, 2024
Upgrade to Emscripten💩 3.1.50 in order to enable mimalloc.

The default allocator (dlmalloc) has dog-shit for performance
under multiple threads due to plenty of locking happening inside
the allocator. The better allocator is mimalloc that requires
Emscripten💩 3.1.50

Emscripten💩 3.1.27 changed the default stack size to 64kb.
Restore the original 5MB stack size with

  -sSTACK_SIZE=5MB

Upgrading Emscripten comes with a new clang that is based on
a new LLVM version that has a regression so sol2 build fails:
ThePhD/sol2#1614

Adding a WAR in sol.hpp to change the offendin function signatures
to unconditonal noexcept.

mimalloc performance:
emscripten-core/emscripten#20651
ensisoft added a commit to ensisoft/detonator that referenced this pull request Oct 11, 2024
Upgrade to Emscripten💩 3.1.50 in order to enable mimalloc.

The default allocator (dlmalloc) has dog-shit for performance
under multiple threads due to plenty of locking happening inside
the allocator. The better allocator is mimalloc that requires
Emscripten💩 3.1.50

Emscripten💩 3.1.27 changed the default stack size to 64kb.
Restore the original 5MB stack size with

  -sSTACK_SIZE=5MB

Upgrading Emscripten comes with a new clang that is based on
a new LLVM version that has a regression so sol2 build fails:
ThePhD/sol2#1614

Adding a WAR in sol.hpp to change the offendin function signatures
to unconditonal noexcept.

mimalloc performance:
emscripten-core/emscripten#20651
ensisoft added a commit to ensisoft/detonator that referenced this pull request Oct 11, 2024
Upgrade to Emscripten💩 3.1.50 in order to enable mimalloc.

The default allocator (dlmalloc) has dog-shit for performance
under multiple threads due to plenty of locking happening inside
the allocator. The better allocator is mimalloc that requires
Emscripten💩 3.1.50

Emscripten💩 3.1.27 changed the default stack size to 64kb.
Restore the original 5MB stack size with

  -sSTACK_SIZE=5MB

Upgrading Emscripten comes with a new clang that is based on
a new LLVM version that has a regression so sol2 build fails:
ThePhD/sol2#1614

Adding a WAR in sol.hpp to change the offendin function signatures
to unconditonal noexcept.

mimalloc performance:
emscripten-core/emscripten#20651
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Option for a faster allocator than dlmalloc (perhaps mimalloc)
3 participants