Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix dlmalloc for allocations bigger than 2GB #18055

Merged
merged 10 commits into from
Oct 21, 2022

Conversation

miwelc
Copy link
Contributor

@miwelc miwelc commented Oct 13, 2022

Background of this PR: #17747

@miwelc
Copy link
Contributor Author

miwelc commented Oct 13, 2022

I've introduced unwanted changes due to whitespace removal. I'll fix it.

@@ -1678,7 +1678,12 @@ extern size_t getpagesize();
#define TWO_SIZE_T_SIZES (SIZE_T_SIZE<<1)
#define FOUR_SIZE_T_SIZES (SIZE_T_SIZE<<2)
#define SIX_SIZE_T_SIZES (FOUR_SIZE_T_SIZES+TWO_SIZE_T_SIZES)
#if __EMSCRIPTEN__
/* Emscripten's sbrk can interpret unsigned values greater than (MAX_SIZE_T / 2U) (2GB) correctly */
#define HALF_MAX_SIZE_T (MAX_SIZE_T)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But clearly the value of this macro is now incorrect, and dlmalloc was not designed to have HALF_MAX_SIZE_T == MAX_SIZE_T on any system.

Does does dlmalloc deal with requests for memory that are larger than HALF_MAX_SIZE_T on other systems? Shouldn't it just call sbrk multiple times in this case to get enough continuous memory?

Copy link
Contributor Author

@miwelc miwelc Oct 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can see, dlmalloc doesn't do multiple calls when ssize < HALF_MAX_SIZE_T. The problem is not that it doesn't get enough continuous memory, is that it doesn't even call sbrk in the first place.
You are right, HALF_MAX_SIZE_T is not MAX_SIZE_T. But that macro is only used with the purpose of checking the parameter passed to sbrk(). It is the minimal change required to fix this issue and that's why I opted for it.
I can change all conditionals instead to ssize < MAX_SIZE_T or remove them since they would become always true.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is a fundamental limit of dlmalloc? It cannot allocate anything larger than half of size_t? At least not in MORECORE mode.

Copy link
Contributor Author

@miwelc miwelc Oct 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. As the comment in their source code says, they limit this because sbrk parameter is signed. I don't know if that is documented anywhere but I've checked it and in MacOS, for example, sbrk is defined as void *sbrk(int); in unistd.h under some conditions at least (it seems they later changed it to intprt_t). As you've said, this only affects MORECORE mode. When using mmap this works as expected.

But Emscripten's sbrk doesn't have that interface or at least interprets unsigned values greater than HALF_MAX_SIZE_T correctly.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are going to patch dlmalloc like this perhaps we want to introduce another setting which completely ignores HALF_MAX_SIZE_T? i.e. in this mode we could not defined HALF_MAX_SIZE_T at all and #ifdef out all the places it is checked?

How about UNLIMITED_MORECORE?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting idea. I think that would work. I assume dlmalloc is being careful here and not assuming it can call sbrk twice and have something make changes in between, but for us that is a safe assumption: only sbrk will modify the memory size, and we lock around dlmalloc so even pthreads builds should be ok. The only risk I can think of is someone calling emscripten_resize_heap in between on another thread, but I think even that might work (since that function takes the final requested size, not a delta).

Perhaps the safe thing is to call in a loop but assert on later iterations seeing that nothing changed in between.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe dlmalloc is already built with that assumption:

 MORECORE_CONTIGUOUS       default: 1 (true) if HAVE_MORECORE                    
 If true, take advantage of fact that consecutive calls to MORECORE              
 with positive arguments always return contiguous increasing                     
 addresses.  This is true of unix sbrk. It does not hurt too much to             
 set it true anyway, since malloc copes with non-contiguities.                   
 Setting it false when definitely non-contiguous saves time                      
 and possibly wasted space it would take to discover this though.   

Copy link
Collaborator

@sbc100 sbc100 Oct 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I don't think emscripten_resize_heap changes the sbrk pointer... only sbrk can do that. emscripten_resize_heap just resizes the memory. It should really be called emscripten_resize_memory .. just like all HEAPU8 and friends should really have been called MEMU8.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, but MORECORE_CONTIGUOUS isn't enough, I think, since someone else can call it in between your calls? You would allocate [A, B), they would allocate [B, C), and you'd then allocate [C, D), but your two allocations are not contiguous even though from sbrk's point of view they are. But probably I'm missing something...

Good point about emscripten_resize_heap, I think that's ok.

Copy link
Contributor Author

@miwelc miwelc Oct 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, between calls to sbrk some other code calling sbrk on its own can interfere. dlmalloc detects this in some cases (I can think of at least one case where it doesn't and seems like a bug to me with security implications) and sets non-contiguous mode from then on.

Dividing this in two calls increases the risk of this happenning and even if we fix dlmalloc to detect it in all cases, not being able to trim wastes space (considering we are talking about HALF_SIZE_T allocations makes it worse).

I'd rather keep changes as minimal as possible, so that it's easier to update dlmalloc to a new version in the future and to minimize the risk of introducing a bug. But it's your call.

Copy link
Collaborator

@sbc100 sbc100 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

I wonder if this actually saves a few bytes on codesize too?

Can you run ./test/runner other.*code_size* other.*metadce* --rebase to see if anything changes? (To know for sure you really need to run that command twice, once before you change to get a new baseline, and then once after you change).

system/lib/dlmalloc.c Outdated Show resolved Hide resolved
@sbc100
Copy link
Collaborator

sbc100 commented Oct 13, 2022

Thanks for working on this, excited to see if we save some codesize too .. I think maybe the whole of sys_trim might be dead code for most programs after this change!

@miwelc
Copy link
Contributor Author

miwelc commented Oct 14, 2022

Nice!

I wonder if this actually saves a few bytes on codesize too?

Can you run ./test/runner other.*code_size* other.*metadce* --rebase to see if anything changes? (To know for sure you really need to run that command twice, once before you change to get a new baseline, and then once after you change).

I've run it but I don't see any changes at all. I've also had to skip one test with EMTEST_SKIP_NODE I don't know why, as far I know NODE_JS and JS_ENGINES are correctly set.

@sbc100
Copy link
Collaborator

sbc100 commented Oct 14, 2022

Nice!
I wonder if this actually saves a few bytes on codesize too?
Can you run ./test/runner other.*code_size* other.*metadce* --rebase to see if anything changes? (To know for sure you really need to run that command twice, once before you change to get a new baseline, and then once after you change).

I've run it but I don't see any changes at all. I've also had to skip one test with EMTEST_SKIP_NODE I don't know why, as far I know NODE_JS and JS_ENGINES are correctly set.

Thats is very odd. Can you attach you .emscripten file? You will need node configured in order to run those tests.

@brendandahl
Copy link
Collaborator

Worth noting that the tests expect a specific version of node 14.18.2_64bit. If you install emsdk I'd recommend pointing it at the version it installs in emsdk/node/14.18.2_64bit/bin/node.

@miwelc
Copy link
Contributor Author

miwelc commented Oct 14, 2022

Thats is very odd. Can you attach you .emscripten file? You will need node configured in order to run those tests.

Here is my .emscripten file:

import os
emsdk_path = "/Users/miwelc/Development/emsdk"
NODE_JS = emsdk_path + '/node/14.18.2_64bit/bin/node'
PYTHON = emsdk_path + '/python/3.9.2_64bit/bin/python3'
LLVM_ROOT = emsdk_path + '/upstream/bin'
BINARYEN_ROOT = emsdk_path + '/upstream'
EMSCRIPTEN_ROOT = "/Users/miwelc/emscripten"
COMPILER_ENGINE = NODE_JS
JS_ENGINES = [NODE_JS]

It's the same config I use when using Emscripten to compile my projects, but with EMSCRIPTEN_ROOT changed. Am I doing something wrong?

@sbc100
Copy link
Collaborator

sbc100 commented Oct 14, 2022

Thats is very odd. Can you attach you .emscripten file? You will need node configured in order to run those tests.

Here is my .emscripten file:

import os
emsdk_path = "/Users/miwelc/Development/emsdk"
NODE_JS = emsdk_path + '/node/14.18.2_64bit/bin/node'
PYTHON = emsdk_path + '/python/3.9.2_64bit/bin/python3'
LLVM_ROOT = emsdk_path + '/upstream/bin'
BINARYEN_ROOT = emsdk_path + '/upstream'
EMSCRIPTEN_ROOT = "/Users/miwelc/emscripten"
COMPILER_ENGINE = NODE_JS
JS_ENGINES = [NODE_JS]

It's the same config I use when using Emscripten to compile my projects, but with EMSCRIPTEN_ROOT changed. Am I doing something wrong?

I'm pretty sure EMSCRIPTEN_ROOT doesn't do anything anymore.. you can remove that line.

Is this .emscripten inside of the your emscripten directory? (if not where is it?). I recommend putting it inside the emscripten checkout, or alternatively you can just set EM_CONFIG to point to the one in the emsdk directory.

@miwelc
Copy link
Contributor Author

miwelc commented Oct 14, 2022

Thats is very odd. Can you attach you .emscripten file? You will need node configured in order to run those tests.

Here is my .emscripten file:

import os
emsdk_path = "/Users/miwelc/Development/emsdk"
NODE_JS = emsdk_path + '/node/14.18.2_64bit/bin/node'
PYTHON = emsdk_path + '/python/3.9.2_64bit/bin/python3'
LLVM_ROOT = emsdk_path + '/upstream/bin'
BINARYEN_ROOT = emsdk_path + '/upstream'
EMSCRIPTEN_ROOT = "/Users/miwelc/emscripten"
COMPILER_ENGINE = NODE_JS
JS_ENGINES = [NODE_JS]

It's the same config I use when using Emscripten to compile my projects, but with EMSCRIPTEN_ROOT changed. Am I doing something wrong?

I'm pretty sure EMSCRIPTEN_ROOT doesn't do anything anymore.. you can remove that line.

Is this .emscripten inside of the your emscripten directory? (if not where is it?). I recommend putting it inside the emscripten checkout, or alternatively you can just set EM_CONFIG to point to the one in the emsdk directory.

Yes, it's inside my emscripten directory. I've tried both export EM_CONFIG='/Users/miwelc/emscripten/.emscripten' and export EM_CONFIG='/Users/miwelc/Development/emsdk/.emscripten' with no luck. It does find the file, because if I set a wrong path it errors on that immediately.

On a happier note, if I remove emscripten/cache between runs now I get consistent size reduction of about 70 bytes in the tests.

Copy link
Collaborator

@sbc100 sbc100 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm! But can we write a test for this?

@kripken
Copy link
Member

kripken commented Oct 17, 2022

For a test, the best place is likely the browser test suite. It's safest there since the tests run sequentially, so there is no risk of OOMing the entire test runner. We prefix those tests with zzz_zzz to ensure they run at the very end (to minimize the risk of OOM). See this example test which you can use as starting point:

# Tests that emmalloc supports up to 4GB Wasm heaps.
@no_firefox('no 4GB support yet')
def test_zzz_zzz_emmalloc_4gb(self):
self.btest(test_file('mem_growth.cpp'),
expected='-65536', # == 4*1024*1024*1024 - 65536 casted to signed
args=['-sMALLOC=emmalloc', '-sABORTING_MALLOC=0', '-sALLOW_MEMORY_GROWTH=1', '-sMAXIMUM_MEMORY=4GB'])

system/lib/dlmalloc.c Outdated Show resolved Hide resolved
@miwelc
Copy link
Contributor Author

miwelc commented Oct 18, 2022

Sure, I'll write a test. Thanks for the example test!

Also, out of curiosity, will it be possible in the future to shrink the wasm module's ArrayBuffer? This PR is built on the assumption that Emscripten's sbrk receives an unsigned parameter. In the future, if we want to support shrinking wasm memory it will be easily supported in wasm64 by setting UNDEFINED_MORECORE to 0 but for wasm32 it will need a different approach or making that option mutually exclusive with allocating chunks bigger than 2GB.

We've discussed requesting memory in a loop when >2GB and I don't think the extra binary size and complexity of the required error handling is worth it now but in the scenario of wanting to support this use case at the same time as supporting giving memory back to the browser for wasm32... this should be revisited.

@miwelc
Copy link
Contributor Author

miwelc commented Oct 20, 2022

The new test seems to be passing but CI fails with too many unresponsive tests. I've reused an existing test for emmalloc testing 3GB allocations.

Copy link
Member

@kripken kripken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! And good idea about the test, it is best to reuse that existing one.

One question I'm still thinking about is what our plan is for upgrades to dlmalloc in the future. Perhaps having a section with a list of our changes to it would be useful. Or we could mark them using // XXX EMSCRIPTEN as we've done in other parts of the system library. But the git history might be good enough, and I don't think that needs to block - we can improve that later if we want - so let's land this.

Thanks for working on this @miwelc !

@kripken kripken merged commit 00daf40 into emscripten-core:main Oct 21, 2022
@sbc100
Copy link
Collaborator

sbc100 commented Oct 24, 2022

Great! And good idea about the test, it is best to reuse that existing one.

One question I'm still thinking about is what our plan is for upgrades to dlmalloc in the future.

I don't think dlmalloc is being actively worked on so I think that is quite unlikely.

@kleisauke
Copy link
Collaborator

I don't think dlmalloc is being actively worked on so I think that is quite unlikely.

FWIW, musl v1.2.1 replaced their original dlmalloc-like allocator with a new "mallocng" malloc implementation, since the former suffered from fundamental design problems. See the v1.2.1 release notes.

Unfortunately, as mentioned in #12833 (comment), it depends on the availability of an OS-level mmap API. Perhaps the memory-control proposal could help with that in the future.

@kripken
Copy link
Member

kripken commented Oct 24, 2022

I see, thanks @sbc100 @kleisauke

@tiran
Copy link
Contributor

tiran commented Oct 26, 2022

I believe this change set broke over-allocation on WASM64. One of my Python test cases now fails with an assertion error assert(requestedSize > oldSize); in emscripten_resize_heap. The test case attempts to allocate 0x7fffffffffffffff and assumes an allocation error.

test_constructor (test.test_io.CBufferedReaderTest.test_constructor) ... 
Aborted(Assertion failed)
exiting due to exception: RuntimeError: Aborted(Assertion failed),RuntimeError: Aborted(Assertion failed)
    at abort (/cpython/builddir/wasm64-emscripten-node-debug/python.js:1010:11)
    at assert (/cpython/builddir/wasm64-emscripten-node-debug/python.js:485:5)
    at _emscripten_resize_heap (/cpython/builddir/wasm64-emscripten-node-debug/python.js:7125:7)
    at sbrk (wasm://wasm/02557b1e:wasm-function[10612]:0x52c2d0)
    at dlmalloc (wasm://wasm/02557b1e:wasm-function[10603]:0x529d0e)
    at _PyMem_RawMalloc (wasm://wasm/02557b1e:wasm-function[2018]:0xbc92a)
    at PyMem_Malloc (wasm://wasm/02557b1e:wasm-function[2042]:0xbe3ea)
    at _buffered_init (wasm://wasm/02557b1e:wasm-function[7035]:0x315dae)
    at _io_BufferedReader___init__ (wasm://wasm/02557b1e:wasm-function[7034]:0x315cf6)
    at wrap_init (wasm://wasm/02557b1e:wasm-function[2351]:0xd9028)

The values are

requestedSize 20324352 oldSize 20971520

This reproducer passes on 3.1.24 and aborts on tot-upstream (3.1.25-git):

// emcc -sMEMORY64 -sASSERTIONS -sALLOW_MEMORY_GROWTH t.c -o t.js
#include <stdlib.h>

int main(void) {
    malloc(0x7fffffffffffffff);
    malloc(0x7fffffffffffffff);
}

@miwelc
Copy link
Contributor Author

miwelc commented Oct 26, 2022

These changes simply remove the check that ensured sbrk was only called with values smaller than half size_t (e.g. (requested_size + other_internal_data_and_padding) < HALF_MAX_SIZE_T). If I'm not mistaken in your test you are requesting exactly half size_t which previously would silently fail returning 0 and no allocation happened (and neither sbrk or emscripten_resize_heap were actually called). In your test, your first malloc was already failing. If you change it so that you request the same total memory but in 3 smaller chunks it should fail with a previous Emscripten version.

I'm not sure but I think the issue is due to sbrk only checking for 32-bit overflow in

if (increment > 0 && (uint32_t)new_brk <= (uint32_t)old_brk) {

All of this is just theory, I'll look into it and get back to you.

@sbc100
Copy link
Collaborator

sbc100 commented Oct 26, 2022

It looks like that code in sbrk.c should be wrapped in #ifdef __wasm32__. Im not sure if that will fix this issue but it looks like it would be more correct.

@miwelc
Copy link
Contributor Author

miwelc commented Oct 26, 2022

After some testing, it turns out it's just that emscripten_resize_heap() is receiving a correct size but bigger than what a JS Number can store as integer. If I call it with 2^53-1 it doesn't abort (but it correctly complains that the requested size is bigger than the limit). If I call it with 2^53 it aborts.

In summary:

  • The bug fixed by this PR prevented the test from actually testing what intended.
  • Now valid values reach sbrk and emscripten_resize_heap but the latter is not able to load correctly the value in JS.

Also, I've been testing the overflow check mentioned before and it seems there is actually a bug in that code although a different one:

// wasm64
uintptr_t old_brk = (uintptr_t)0xFFFFFFFF; // 2^31
uintptr_t new_brk = old_brk + 10;
if((uint32_t)new_brk <= (uint32_t)old_brk) {
    // Condition IS met even if new_brk is valid
}

uintptr_t old_brk = (uintptr_t)0xFFFFFFFF + 1; // 2^31 + 1
uintptr_t new_brk = old_brk + 10;
if((uint32_t)new_brk <= (uint32_t)old_brk) {
    // Condition is NOT met
}

uintptr_t old_brk = (uintptr_t)0xFFFFFFFF; // 2^31
uintptr_t new_brk = old_brk + 10;
if(new_brk <= old_brk) {
    // Condition is NOT met
}

So it turns out that if old_brk is some multiple of 2^31, it detects a false overflow.
I recommend just removing those (uint32_t) casts as these variables are declared as uintptr_t which is uint32_t under wasm32 and uint64_t under wasm64 and that code would automatically work for both environments.

@miwelc
Copy link
Contributor Author

miwelc commented Oct 31, 2022

So it turns out that if old_brk is some multiple of 2^31, it detects a false overflow. I recommend just removing those (uint32_t) casts as these variables are declared as uintptr_t which is uint32_t under wasm32 and uint64_t under wasm64 and that code would automatically work for both environments.

@sbc100 Should I open a PR fixing this bug?

@sbc100
Copy link
Collaborator

sbc100 commented Oct 31, 2022

Yes please!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants