Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allocating a >2GB pointer only seems to work when MALLOC='emmalloc' flag is set. Why isn't this documented anywhere? #17747

Closed
rotemdan opened this issue Aug 27, 2022 · 12 comments

Comments

@rotemdan
Copy link

rotemdan commented Aug 27, 2022

I'm using latest emscripten (3.1.20), node x64 16.17.0 on Win10 (x64).
emcc compilation is executed within WSL2 (Ubuntu 22.04.1 LTS).

I've spent some time trying to get my code (written in C) to allocate a single large matrix that may be several gigabytes in size (nothing else of significant size is allocated prior to this).

Using flags:

-s WASM=1 -s ALLOW_MEMORY_GROWTH=1 -s MAXIMUM_MEMORY=4GB

Didn't work when the allocated size was greater than 2GB.

After some trial and error. Going as deep as inspecting the source files, I found an option that changes the allocator from the default dlmalloc to emmaloc. Once I changed the command line to:

-s WASM=1 -s ALLOW_MEMORY_GROWTH=1 -s MAXIMUM_MEMORY=4GB -s MALLOC='emmalloc'

it worked.

Is this intended behavior? Why isn't this documented anywhere?

The most prominent Google result is a post on the V8 blog that mentions the MAXIMUM_MEMORY flag, but doesn't say anything about MALLOC?

@rotemdan rotemdan changed the title Allocating a >2GB pointer only seems works when MALLOC='emmalloc' flag is set. Why isn't this documented anywhere? Allocating a >2GB pointer only seems to works when MALLOC='emmalloc' flag is set. Why isn't this documented anywhere? Aug 27, 2022
@rotemdan rotemdan changed the title Allocating a >2GB pointer only seems to works when MALLOC='emmalloc' flag is set. Why isn't this documented anywhere? Allocating a >2GB pointer only seems to work when MALLOC='emmalloc' flag is set. Why isn't this documented anywhere? Aug 27, 2022
@juj
Copy link
Collaborator

juj commented Aug 28, 2022

This certainly reads like a bug. Both emmalloc and the default dlmalloc allocators should work with >2GB allocations to my understanding. Not sure why dlmalloc would fail on this, since it is a very "old" allocator used in lots of projects, and haven't ever heard that it would have a limitation like this by design. Maybe there is something off with how Emscripten interacts with dlmalloc that does not work quite right. Minimal repros welcome (does it fail on a straightforward int main() { malloc(3*1024*1024*1024); } call?)

Furthermore, In WASM64 mode they should both also work with >4GB allocations (maybe not yet today since Wasm64 is still in development, but modulo any current bugs or missing items there)

@rotemdan
Copy link
Author

rotemdan commented Aug 28, 2022

Yes, I've tried to simplify the reproduction, but only using JavaScript (no calls at all to WASM methods). I've exported _malloc by adding the compiler flag: EXPORTED_FUNCTIONS="['_malloc']" and called it directly from JavaScript:

const m = await initializeWasmModule() // code to initialize the module is omitted

const ptr = m._malloc(2147473648)
console.log(ptr)

ptr is nonzero with the MALLOC='emmalloc' flag and zero without.

Reducing the size allocated from 2147483648 (2^31) to 2147473648 (2^31 - 10000) now succeeds with both dlmalloc and emmalloc.

Since I'm running on Windows there's a possibility this is somehow platform related, but I'm not sure. I haven't yet tried on other platforms.

Edit: with emmalloc I was able to allocate up to 4294957296 (2^32 - 10000) bytes.

@rotemdan
Copy link
Author

rotemdan commented Aug 28, 2022

I was able to simplify and reproduce this in WSL2 Linux (Ubuntu 22.04.1 LTS).

Here's an (almost) minimal reproduction case:

Makefile:

PROJ=wasm-allocation-test

EMCCFLAGS=-s WASM=1 -s ALLOW_MEMORY_GROWTH=1 -s MAXIMUM_MEMORY=4GB -s MALLOC='dlmalloc' -s EXPORTED_FUNCTIONS="['_malloc']" -s MODULARIZE=1 -s EXPORT_NAME="TestModule"

all: empty.c
	emcc $(EMCCFLAGS) -O0 -o $(PROJ).js $^

.PHONY: clean
clean:
	rm -f $(PROJ).js $(PROJ).wasm

test.js:

async function testLargeAllocation() {
    const instance = (await import("./wasm-allocation-test.js")).default;
    const m = await instance()

    const ptr = m._malloc(2147483648 + 1000000);
	
    console.log(ptr);
}

testLargeAllocation()

empty.c (empty source file):

It exhibits the same behavoir (fails with dlmalloc, succeeds with emmalloc)

@miwelc
Copy link
Contributor

miwelc commented Oct 5, 2022

I also have this issue. I'm able to allocate more than 2GB if I do it by requesting several <=2GB chunks but I'm not able to allocate a single >2GB chunk. Everything works fine if I use emmalloc instead of dlmalloc. I'm using 3.1.23.

@miwelc
Copy link
Contributor

miwelc commented Oct 7, 2022

After some more testing, it seems like this issue only happens if the heap needs to be resized during the malloc call. If it's already larger than 2GB (e.g. by allocating and then freeing several < 2GB chunks), then I'm able to call malloc with > 2GB successfully.

@kripken
Copy link
Member

kripken commented Oct 7, 2022

@miwelc Interesting. Could be a bug in dlmalloc then in a code path that grows memory using sbrk. It won't do that if you've already allocated over 2GB.

Debugging into dlmalloc is one option (maybe for code paths around sbrk). Another might be to see if upgrading dlmalloc fixes this - it could be a fixed bug.

@miwelc
Copy link
Contributor

miwelc commented Oct 10, 2022

After some digging in dlmalloc source code, I've seen that all allocations that rely on sbrk have this guard: ssize < HALF_MAX_SIZE_T. That macro is defined as:

#define MAX_SIZE_T           (~(size_t)0)
#define HALF_MAX_SIZE_T     (MAX_SIZE_T / 2U)

If I'm not mistaken, sizeof(size_t) is 4 bytes so HALF_MAX_SIZE_T is 2GB.
I don't know the reasoning behind this check but I think this is the origin of the issue.

@miwelc
Copy link
Contributor

miwelc commented Oct 13, 2022

I've actually tested this and confirmed that by changing HALF_MAX_SIZE_T definition to MAX_SIZE_T everything works as expected. I've also found why dlmalloc does that, it's described in the source code:

The type of the argument to sbrk/MORECORE varies across
 systems.  It cannot be size_t, because it supports negative
 arguments, so it is normally the signed type of the same width as
 size_t (sometimes declared as "intptr_t").  It doesn't much matter
 though. Internally, we only call it with arguments less than half
 the max value of a size_t, which should work across all reasonable
 possibilities, although sometimes generating compiler warnings.

If Emscripten's sbrk supports values greater than 2GB (unsigned) as it's the case, it should be safe to just change HALF_MAX_SIZE_T definition to fix this issue.

@kripken
Copy link
Member

kripken commented Oct 13, 2022

Good find @miwelc !

Yes, emscripten's sbrk() supports values greater than 2GB, so we should enable that flag. A PR would be very welcome!

@miwelc
Copy link
Contributor

miwelc commented Oct 13, 2022

Sure, I'll open a PR.

@sbc100
Copy link
Collaborator

sbc100 commented Oct 13, 2022

The argument to sbrk is currently signed, but we immediately cast it to unsigned:

uintptr_t increment = (uintptr_t)increment_;

So yes it should be safe to use the full 32bits of range for the increment.

I'm not sure I understand the It cannot be size_t, because it supports negative arguments, so it is normally the signed type of the same width as size_t (sometimes declared as "intptr_t") argument.. since as far as I can tell from all the docs sbrk cannot be used to lower the brk pointer.

Regardless, shouldn't dlmalloc just call sbrk() multiple times in this case? It shouldn't matter if dlmalloc internally things it needs to do multiple sbrk calls should it?

@miwelc
Copy link
Contributor

miwelc commented Oct 13, 2022

We've moved the conversation to the PR (#18055) but in order to keep last question answered here:
As far as I can see, dlmalloc doesn't do multiple calls when ssize < HALF_MAX_SIZE_T. The problem is not that it doesn't get enough continuous memory, is that it doesn't even call sbrk in the first place.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants