Add a port of mimalloc, a fast and scalable multithreaded allocator (#20651)

The new allocator can be used with -sMALLOC=mimalloc.

On the benchmark added in this PR, dlmalloc does quite poorly: it actually gets
slower with each additional core, because the lock contention far outweighs the
actual work in the artificial benchmark. mimalloc, in comparison, scales as it
does natively: adding cores keeps helping. So mimalloc can give a significant
speedup in codebases that have lock contention on malloc.
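A minimal sketch of a malloc-contention micro-benchmark in the same spirit. This is not test/other/test_mimalloc.cpp (its contents are not shown here), and the function name run_malloc_workers is hypothetical. Each thread does nothing but small malloc/free pairs, so with a single-lock allocator the threads mostly wait on that lock.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <thread>
#include <vector>

// Spawns num_threads workers that each perform `iters` malloc/free pairs.
// Returns the number of successful allocations across all threads.
static std::size_t run_malloc_workers(unsigned num_threads, std::size_t iters) {
  std::atomic<std::size_t> total{0};
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < num_threads; ++t) {
    workers.emplace_back([&total, iters] {
      std::size_t ok = 0;
      for (std::size_t i = 0; i < iters; ++i) {
        // Vary the size a little so several size classes are exercised.
        void* p = std::malloc(16 + (i % 8) * 16);
        if (p) {
          static_cast<volatile char*>(p)[0] = 1;  // touch the memory
          ++ok;
        }
        std::free(p);
      }
      total += ok;
    });
  }
  for (auto& w : workers) w.join();
  return total.load();
}
```

One way to use it is to call run_malloc_workers from main with 1, 2, 4 and 8 threads, time each call with std::chrono::steady_clock, and compare a build such as `em++ -O2 -pthread -sPTHREAD_POOL_SIZE=8 bench.cpp -o bench.js` against the same command with `-sMALLOC=mimalloc` added. Actual timings will depend on the machine and workload.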

mimalloc is significantly larger than dlmalloc, however, so we do not want it
on by default. It also uses more memory, partly because of how mimalloc works
and partly due to #20645.

Design-wise, this layers mimalloc on top of emmalloc. emmalloc acts as the
"system allocator", which is more powerful than using raw sbrk: sbrk can't
free holes in the middle, for example.
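To make the sbrk limitation concrete, here is a toy model of a single "break" pointer. BumpArena is a hypothetical class written for illustration, not Emscripten's sbrk: memory is returned only by moving the break down, so a region that sits below a live region cannot be released.

```cpp
#include <cassert>
#include <cstddef>

// Toy sbrk: one contiguous arena plus a break offset (top_).
class BumpArena {
 public:
  // Like sbrk(+n): extend the break and return the old top.
  void* grow(std::size_t n) {
    if (n > kCapacity - top_) return nullptr;  // out of memory
    void* p = storage_ + top_;
    top_ += n;
    return p;
  }

  // Like sbrk(-n): only works if [p, p + n) ends exactly at the break.
  bool shrink(void* p, std::size_t n) {
    unsigned char* end = static_cast<unsigned char*>(p) + n;
    if (end != storage_ + top_) return false;  // would leave a hole: impossible
    top_ -= n;
    return true;
  }

  std::size_t used() const { return top_; }

 private:
  static constexpr std::size_t kCapacity = 4096;
  unsigned char storage_[kCapacity];
  std::size_t top_ = 0;
};
```

With two regions a and b allocated in that order, a cannot be given back until b has been, which is exactly the kind of hole that a real system allocator such as emmalloc can handle.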

Code-wise, all of system/lib/mimalloc is unchanged from upstream (see
README.emscripten) except for an ifdef or two, plus the new backend in
system/lib/mimalloc/src/prim/emscripten/prim.c. That file has more
comments explaining the design of the port.
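A minimal sketch of what such a backend layer does, under stated assumptions: demo_prim_alloc and demo_prim_free are hypothetical hooks (the real hooks and their signatures are in prim.c and mimalloc's prim.h and are not reproduced here), and C++17 std::aligned_alloc / std::free stand in for emmalloc as the system allocator.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Stand-in for the "system allocator" (emmalloc in the real port).
// std::aligned_alloc requires the size to be a multiple of the alignment.
static void* system_alloc_aligned(std::size_t size, std::size_t alignment) {
  std::size_t rounded = (size + alignment - 1) / alignment * alignment;
  return std::aligned_alloc(alignment, rounded);
}

static void system_free(void* p) { std::free(p); }

// Hypothetical backend hooks: the front-end allocator requests large aligned
// regions (segments) and later hands whole regions back. Because the system
// allocator can free any region, a segment in the middle of the heap can be
// released, which bare sbrk cannot do.
static int demo_prim_alloc(std::size_t size, std::size_t alignment, void** addr) {
  *addr = system_alloc_aligned(size, alignment);
  return *addr != nullptr ? 0 : ENOMEM;
}

static int demo_prim_free(void* addr, std::size_t size) {
  (void)size;  // the system allocator tracks region sizes itself
  system_free(addr);
  return 0;
}
```

For simplicity the sketch treats the alignment as a hard requirement; a real backend may treat it as a hint and handle misaligned results itself.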

A new test, test/other/test_mimalloc.cpp, is added. It is also usable as a
benchmark and is where the results above come from.

Fixes #18369
kripken authored Nov 16, 2023
1 parent 90ab3a7 commit 165c1a3
Showing 52 changed files with 17,498 additions and 5 deletions.
2 changes: 2 additions & 0 deletions ChangeLog.md
@@ -20,6 +20,8 @@ See docs/process.md for more on how version tagging works.

3.1.50 (in development)
-----------------------
- Add a port of mimalloc, a fast and scalable multithreaded allocator. To use
it, build with `-sMALLOC=mimalloc`. (#20651)
- When compiling, Emscripten will now invoke `clang` or `clang++` depending only
on whether `emcc` or `em++` was run. Previously it would determine which to
run based on individual file extensions. One side effect of this is that you
2 changes: 2 additions & 0 deletions embuilder.py
@@ -56,6 +56,8 @@
'libemmalloc-memvalidate',
'libemmalloc-verbose',
'libemmalloc-memvalidate-verbose',
'libmimalloc',
'libmimalloc-mt',
'libGL',
'libhtml5',
'libsockets',
9 changes: 9 additions & 0 deletions site/source/docs/optimizing/Optimizing-Code.rst
@@ -221,6 +221,15 @@ Enable :ref:`debugging-EMCC_DEBUG` to output files for each compilation phase, i

.. _optimizing-code-unsafe-optimisations:

Allocation
----------

The default ``malloc/free`` implementation is ``dlmalloc``. You can also
pick ``emmalloc`` (``-sMALLOC=emmalloc``), which is smaller but slower, or
``mimalloc`` (``-sMALLOC=mimalloc``), which is larger but scales better in a
multithreaded application with contention on ``malloc/free`` (see
:ref:`Allocator_performance`).

Unsafe optimizations
====================

18 changes: 18 additions & 0 deletions site/source/docs/porting/pthreads.rst
@@ -148,6 +148,24 @@ The Emscripten implementation for the pthreads API should follow the POSIX stand

Also note that when compiling code that uses pthreads, an additional JavaScript file ``NAME.worker.js`` is generated alongside the output .js file (where ``NAME`` is the basename of the main file being emitted). That file must be deployed with the rest of the generated code files. By default, ``NAME.worker.js`` will be loaded relative to the main HTML page URL. If it is desirable to load the file from a different location e.g. in a CDN environment, then one can define the ``Module.locateFile(filename)`` function in the main HTML ``Module`` object to return the URL of the target location of the ``NAME.worker.js`` entry point. If this function is not defined in ``Module``, then the default location relative to the main HTML file is used.

.. _Allocator_performance:

Allocator performance
=====================

The default system allocator in Emscripten, ``dlmalloc``, is very efficient in a
single-threaded program, but it uses a single global lock, so when several
threads contend on ``malloc`` you can see noticeable overhead. You can use
`mimalloc <https://github.com/microsoft/mimalloc>`_
instead by building with ``-sMALLOC=mimalloc``. ``mimalloc`` is a more
sophisticated allocator tuned for multithreaded performance: it keeps separate
allocation contexts on each thread, so performance scales much better under
``malloc/free`` contention.

Note that ``mimalloc`` is larger in code size than ``dlmalloc`` and also uses
more memory at runtime (you may need to set ``INITIAL_MEMORY`` to a higher
value). Choosing it is therefore a tradeoff.
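The difference between one global lock and per-thread allocation contexts can be illustrated with a toy fixed-size block allocator. This is a simplified sketch, not how ``dlmalloc`` or ``mimalloc`` are actually implemented; names such as ``locked_alloc`` and ``cached_alloc`` are hypothetical. The counter records how often the shared lock is taken.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>

constexpr std::size_t kBlockSize = 64;
constexpr std::size_t kPoolBlocks = 1024;
constexpr std::size_t kBatch = 32;

union Block {
  Block* next;
  unsigned char bytes[kBlockSize];
};

static Block g_pool[kPoolBlocks];
static Block* g_free_list = nullptr;
static std::mutex g_lock;
static std::size_t g_lock_acquisitions = 0;  // modified only while holding g_lock

static void pool_init() {
  std::lock_guard<std::mutex> guard(g_lock);
  g_free_list = nullptr;
  for (auto& b : g_pool) { b.next = g_free_list; g_free_list = &b; }
}

// Global-lock style: every allocation and every free takes g_lock.
static void* locked_alloc() {
  std::lock_guard<std::mutex> guard(g_lock);
  ++g_lock_acquisitions;
  Block* b = g_free_list;
  if (b) g_free_list = b->next;
  return b;
}

static void locked_free(void* p) {
  std::lock_guard<std::mutex> guard(g_lock);
  ++g_lock_acquisitions;
  Block* b = static_cast<Block*>(p);
  b->next = g_free_list;
  g_free_list = b;
}

// Per-thread style: each thread serves requests from its own cache and
// touches the shared list only to refill or flush a whole batch.
thread_local Block* t_cache = nullptr;
thread_local std::size_t t_cached = 0;

static void* cached_alloc() {
  if (!t_cache) {
    std::lock_guard<std::mutex> guard(g_lock);
    ++g_lock_acquisitions;
    for (std::size_t i = 0; i < kBatch && g_free_list; ++i) {
      Block* b = g_free_list;
      g_free_list = b->next;
      b->next = t_cache;
      t_cache = b;
      ++t_cached;
    }
  }
  Block* b = t_cache;
  if (b) { t_cache = b->next; --t_cached; }
  return b;
}

static void cached_free(void* p) {
  Block* b = static_cast<Block*>(p);
  b->next = t_cache;
  t_cache = b;
  if (++t_cached > 2 * kBatch) {
    // Give a batch back so other threads can reuse the memory.
    std::lock_guard<std::mutex> guard(g_lock);
    ++g_lock_acquisitions;
    for (std::size_t i = 0; i < kBatch; ++i) {
      Block* f = t_cache;
      t_cache = f->next;
      --t_cached;
      f->next = g_free_list;
      g_free_list = f;
    }
  }
}
```

After ``pool_init()``, 1000 alloc/free pairs through ``locked_alloc``/``locked_free`` take the lock 2000 times, while the same loop through ``cached_alloc``/``cached_free`` takes it once. The price is that blocks sit idle in each thread's cache, which is one reason per-thread designs tend to use more memory.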

Running code and tests
======================

3 changes: 3 additions & 0 deletions src/settings.js
@@ -108,6 +108,9 @@ var STACK_SIZE = 64*1024;
// * emmalloc-verbose - use emmalloc with assertions + verbose logging.
// * emmalloc-memvalidate-verbose - use emmalloc with assertions + heap
// consistency checking + verbose logging.
// * mimalloc - a powerful multithreaded allocator. This is recommended in
//              large applications that have malloc() contention, but it is
//              larger and uses more memory.
// * none - no malloc() implementation is provided, but you must implement
// malloc() and free() yourself.
// dlmalloc is necessary for split memory and other special modes, and will be
21 changes: 21 additions & 0 deletions system/lib/mimalloc/LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2018-2021 Microsoft Corporation, Daan Leijen

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
7 changes: 7 additions & 0 deletions system/lib/mimalloc/README.emscripten
@@ -0,0 +1,7 @@

This contains mimalloc 4e50d6714d471b72b2285e25a3df6c92db944593 with
Emscripten backend additions.

Origin: https://github.com/microsoft/mimalloc

For the Emscripten port design see src/prim/emscripten/prim.c
66 changes: 66 additions & 0 deletions system/lib/mimalloc/include/mimalloc-new-delete.h
@@ -0,0 +1,66 @@
/* ----------------------------------------------------------------------------
Copyright (c) 2018-2020 Microsoft Research, Daan Leijen
This is free software; you can redistribute it and/or modify it under the
terms of the MIT license. A copy of the license can be found in the file
"LICENSE" at the root of this distribution.
-----------------------------------------------------------------------------*/
#pragma once
#ifndef MIMALLOC_NEW_DELETE_H
#define MIMALLOC_NEW_DELETE_H

// ----------------------------------------------------------------------------
// This header provides convenient overrides for the new and
// delete operations in C++.
//
// This header should be included in only one source file!
//
// On Windows, or when linking dynamically with mimalloc, these
// can be more performant than the standard new-delete operations.
// See <https://en.cppreference.com/w/cpp/memory/new/operator_new>
// ---------------------------------------------------------------------------
#if defined(__cplusplus)
#include <new>
#include <mimalloc.h>

#if defined(_MSC_VER) && defined(_Ret_notnull_) && defined(_Post_writable_byte_size_)
// stay consistent with VCRT definitions
#define mi_decl_new(n) mi_decl_nodiscard mi_decl_restrict _Ret_notnull_ _Post_writable_byte_size_(n)
#define mi_decl_new_nothrow(n) mi_decl_nodiscard mi_decl_restrict _Ret_maybenull_ _Success_(return != NULL) _Post_writable_byte_size_(n)
#else
#define mi_decl_new(n) mi_decl_nodiscard mi_decl_restrict
#define mi_decl_new_nothrow(n) mi_decl_nodiscard mi_decl_restrict
#endif

void operator delete(void* p) noexcept { mi_free(p); };
void operator delete[](void* p) noexcept { mi_free(p); };

void operator delete (void* p, const std::nothrow_t&) noexcept { mi_free(p); }
void operator delete[](void* p, const std::nothrow_t&) noexcept { mi_free(p); }

mi_decl_new(n) void* operator new(std::size_t n) noexcept(false) { return mi_new(n); }
mi_decl_new(n) void* operator new[](std::size_t n) noexcept(false) { return mi_new(n); }

mi_decl_new_nothrow(n) void* operator new (std::size_t n, const std::nothrow_t& tag) noexcept { (void)(tag); return mi_new_nothrow(n); }
mi_decl_new_nothrow(n) void* operator new[](std::size_t n, const std::nothrow_t& tag) noexcept { (void)(tag); return mi_new_nothrow(n); }

#if (__cplusplus >= 201402L || _MSC_VER >= 1916)
void operator delete (void* p, std::size_t n) noexcept { mi_free_size(p,n); };
void operator delete[](void* p, std::size_t n) noexcept { mi_free_size(p,n); };
#endif

#if (__cplusplus > 201402L || defined(__cpp_aligned_new))
void operator delete (void* p, std::align_val_t al) noexcept { mi_free_aligned(p, static_cast<size_t>(al)); }
void operator delete[](void* p, std::align_val_t al) noexcept { mi_free_aligned(p, static_cast<size_t>(al)); }
void operator delete (void* p, std::size_t n, std::align_val_t al) noexcept { mi_free_size_aligned(p, n, static_cast<size_t>(al)); };
void operator delete[](void* p, std::size_t n, std::align_val_t al) noexcept { mi_free_size_aligned(p, n, static_cast<size_t>(al)); };
void operator delete (void* p, std::align_val_t al, const std::nothrow_t&) noexcept { mi_free_aligned(p, static_cast<size_t>(al)); }
void operator delete[](void* p, std::align_val_t al, const std::nothrow_t&) noexcept { mi_free_aligned(p, static_cast<size_t>(al)); }

void* operator new (std::size_t n, std::align_val_t al) noexcept(false) { return mi_new_aligned(n, static_cast<size_t>(al)); }
void* operator new[](std::size_t n, std::align_val_t al) noexcept(false) { return mi_new_aligned(n, static_cast<size_t>(al)); }
void* operator new (std::size_t n, std::align_val_t al, const std::nothrow_t&) noexcept { return mi_new_aligned_nothrow(n, static_cast<size_t>(al)); }
void* operator new[](std::size_t n, std::align_val_t al, const std::nothrow_t&) noexcept { return mi_new_aligned_nothrow(n, static_cast<size_t>(al)); }
#endif
#endif

#endif // MIMALLOC_NEW_DELETE_H
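The header works by defining replacement global operator new / operator delete functions, which is why it must be included in only one source file: a program may contain only one definition of each replacement. A minimal sketch of the same technique, with hypothetical counters and std::malloc / std::free standing in for mi_new / mi_free; unlike the real header it covers only the basic and sized forms (the default array forms forward to these).

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical counters used only to observe that the replacements run.
static std::atomic<std::size_t> g_new_calls{0};
static std::atomic<std::size_t> g_delete_calls{0};

// Replacement global allocation function: must appear in exactly one
// translation unit of the program.
void* operator new(std::size_t n) {
  ++g_new_calls;
  if (n == 0) n = 1;  // must return a unique non-null pointer even for size 0
  for (;;) {
    if (void* p = std::malloc(n)) return p;
    // Standard behavior on failure: call the new-handler, or throw.
    std::new_handler handler = std::get_new_handler();
    if (!handler) throw std::bad_alloc();
    handler();
  }
}

void operator delete(void* p) noexcept {
  if (p) ++g_delete_calls;
  std::free(p);
}

// Sized deallocation (C++14); forward to the unsized version.
void operator delete(void* p, std::size_t) noexcept { ::operator delete(p); }
```

With this file linked in, every plain new int(42) / delete pair in the program goes through the replacements, the same way mimalloc-new-delete.h routes them to mimalloc.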
67 changes: 67 additions & 0 deletions system/lib/mimalloc/include/mimalloc-override.h
@@ -0,0 +1,67 @@
/* ----------------------------------------------------------------------------
Copyright (c) 2018-2020 Microsoft Research, Daan Leijen
This is free software; you can redistribute it and/or modify it under the
terms of the MIT license. A copy of the license can be found in the file
"LICENSE" at the root of this distribution.
-----------------------------------------------------------------------------*/
#pragma once
#ifndef MIMALLOC_OVERRIDE_H
#define MIMALLOC_OVERRIDE_H

/* ----------------------------------------------------------------------------
This header can be used to statically redirect malloc/free and new/delete
to the mimalloc variants. This can be useful if one can include this file in
each source file in a project (but be careful when using external code to
not accidentally mix pointers from different allocators).
-----------------------------------------------------------------------------*/

#include <mimalloc.h>

// Standard C allocation
#define malloc(n) mi_malloc(n)
#define calloc(n,c) mi_calloc(n,c)
#define realloc(p,n) mi_realloc(p,n)
#define free(p) mi_free(p)

#define strdup(s) mi_strdup(s)
#define strndup(s,n) mi_strndup(s,n)
#define realpath(f,n) mi_realpath(f,n)

// Microsoft extensions
#define _expand(p,n) mi_expand(p,n)
#define _msize(p) mi_usable_size(p)
#define _recalloc(p,n,c) mi_recalloc(p,n,c)

#define _strdup(s) mi_strdup(s)
#define _strndup(s,n) mi_strndup(s,n)
#define _wcsdup(s) (wchar_t*)mi_wcsdup((const unsigned short*)(s))
#define _mbsdup(s) mi_mbsdup(s)
#define _dupenv_s(b,n,v) mi_dupenv_s(b,n,v)
#define _wdupenv_s(b,n,v) mi_wdupenv_s((unsigned short*)(b),n,(const unsigned short*)(v))

// Various Posix and Unix variants
#define reallocf(p,n) mi_reallocf(p,n)
#define malloc_size(p) mi_usable_size(p)
#define malloc_usable_size(p) mi_usable_size(p)
#define cfree(p) mi_free(p)

#define valloc(n) mi_valloc(n)
#define pvalloc(n) mi_pvalloc(n)
#define reallocarray(p,s,n) mi_reallocarray(p,s,n)
#define reallocarr(p,s,n) mi_reallocarr(p,s,n)
#define memalign(a,n) mi_memalign(a,n)
#define aligned_alloc(a,n) mi_aligned_alloc(a,n)
#define posix_memalign(p,a,n) mi_posix_memalign(p,a,n)
#define _posix_memalign(p,a,n) mi_posix_memalign(p,a,n)

// Microsoft aligned variants
#define _aligned_malloc(n,a) mi_malloc_aligned(n,a)
#define _aligned_realloc(p,n,a) mi_realloc_aligned(p,n,a)
#define _aligned_recalloc(p,s,n,a) mi_aligned_recalloc(p,s,n,a)
#define _aligned_msize(p,a,o) mi_usable_size(p)
#define _aligned_free(p) mi_free(p)
#define _aligned_offset_malloc(n,a,o) mi_malloc_aligned_at(n,a,o)
#define _aligned_offset_realloc(p,n,a,o) mi_realloc_aligned_at(p,n,a,o)
#define _aligned_offset_recalloc(p,s,n,a,o) mi_recalloc_aligned_at(p,s,n,a,o)

#endif // MIMALLOC_OVERRIDE_H
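The redirection in this header is purely textual: every later call spelled malloc(...) or free(...) in a file that includes it is rewritten by the preprocessor. A minimal sketch of the same technique, assuming hypothetical counting wrappers counted_malloc / counted_free in place of mi_malloc / mi_free. The macros are defined only after the wrappers, so the wrappers still reach the real allocator, and they are undefined at the end to keep the redirection local to this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical wrappers standing in for mi_malloc / mi_free.
static std::size_t g_mallocs = 0;
static std::size_t g_frees = 0;

static void* counted_malloc(std::size_t n) { ++g_mallocs; return std::malloc(n); }
static void counted_free(void* p) { if (p) ++g_frees; std::free(p); }

// Static redirection in the style of mimalloc-override.h.
#define malloc(n) counted_malloc(n)
#define free(p) counted_free(p)

static void demo_redirect() {
  char* buf = static_cast<char*>(malloc(64));  // expands to counted_malloc(64)
  assert(buf != nullptr);
  buf[0] = 'x';
  free(buf);  // expands to counted_free(buf)
}

#undef malloc
#undef free
```

Only code compiled with the macros visible is redirected. A library built without them still calls the real malloc/free, which is why the header warns against mixing pointers: here both paths end in std::free, but with a genuinely different allocator, freeing a pointer through the wrong one is undefined behavior.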