Fix dlmalloc for allocations bigger than 2GB #18055
I've introduced unwanted changes due to whitespace removal. I'll fix it.
system/lib/dlmalloc.c
@@ -1678,7 +1678,12 @@ extern size_t getpagesize();
#define TWO_SIZE_T_SIZES (SIZE_T_SIZE<<1)
#define FOUR_SIZE_T_SIZES (SIZE_T_SIZE<<2)
#define SIX_SIZE_T_SIZES (FOUR_SIZE_T_SIZES+TWO_SIZE_T_SIZES)
#if __EMSCRIPTEN__
/* Emscripten's sbrk can interpret unsigned values greater than (MAX_SIZE_T / 2U) (2GB) correctly */
#define HALF_MAX_SIZE_T (MAX_SIZE_T)
But clearly the value of this macro is now incorrect, and dlmalloc was not designed to have HALF_MAX_SIZE_T == MAX_SIZE_T on any system.
How does dlmalloc deal with requests for memory that are larger than HALF_MAX_SIZE_T on other systems? Shouldn't it just call sbrk multiple times in this case to get enough contiguous memory?
As far as I can see, dlmalloc doesn't fall back to multiple calls when the ssize < HALF_MAX_SIZE_T check fails. The problem is not that it doesn't get enough contiguous memory; it's that it doesn't even call sbrk in the first place.
You are right, HALF_MAX_SIZE_T is not MAX_SIZE_T. But that macro is only used to check the parameter passed to sbrk(). It is the minimal change required to fix this issue, and that's why I opted for it.
I could instead change all the conditionals to ssize < MAX_SIZE_T, or remove them, since they would always be true.
So this is a fundamental limit of dlmalloc? It cannot allocate anything larger than half of size_t? At least not in MORECORE mode.
Yes. As the comment in their source code says, they impose this limit because the sbrk parameter is signed. I don't know if that is documented anywhere, but I've checked, and on macOS, for example, sbrk is declared as void *sbrk(int); in unistd.h, at least under some conditions (it seems they later changed it to intptr_t). As you've said, this only affects MORECORE mode. When using mmap, this works as expected.
But Emscripten's sbrk doesn't have that interface, or at least it interprets unsigned values greater than HALF_MAX_SIZE_T correctly.
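To make the signed-parameter limit concrete: on a 32-bit target, HALF_MAX_SIZE_T (MAX_SIZE_T / 2U) is exactly the largest value a signed 32-bit increment can hold, so a 2GB request (0x80000000) would be read as a negative, shrinking request by an sbrk(int). The sketch below models a wasm32-like target with fixed-width types so it runs on any host; HALF_MAX_SIZE_T32 and fits_signed_sbrk32 are hypothetical names for illustration, not dlmalloc code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model a 32-bit target (e.g. wasm32) with fixed-width types. */
#define MAX_SIZE_T32 UINT32_MAX
#define HALF_MAX_SIZE_T32 (MAX_SIZE_T32 / 2U) /* 0x7FFFFFFF */

/* Hypothetical helper: true if `size` can be passed to an sbrk whose
   increment is a signed 32-bit integer without becoming negative. */
static bool fits_signed_sbrk32(uint32_t size) {
  return size <= (uint32_t)INT32_MAX;
}
```

Note that dlmalloc's own guard is a strict ssize < HALF_MAX_SIZE_T comparison, so it is one byte more conservative than this helper.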
If we are going to patch dlmalloc like this, perhaps we want to introduce another setting that completely ignores HALF_MAX_SIZE_T? i.e., in this mode we could leave HALF_MAX_SIZE_T undefined and #ifdef out all the places it is checked.
How about UNLIMITED_MORECORE?
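A minimal sketch of what such a setting could look like, assuming the proposed name UNLIMITED_MORECORE (it is not an existing dlmalloc option) and a hypothetical helper morecore_size_ok standing in for the places where dlmalloc compares the MORECORE request size against HALF_MAX_SIZE_T. MAX_SIZE_T is defined the way dlmalloc defines it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SIZE_T (~(size_t)0)

/* UNLIMITED_MORECORE is the setting proposed in this thread (hypothetical).
   When it is defined, HALF_MAX_SIZE_T is never defined at all. */
#ifndef UNLIMITED_MORECORE
#define HALF_MAX_SIZE_T (MAX_SIZE_T / 2U)
#endif

/* Hypothetical stand-in for dlmalloc's size checks before calling MORECORE. */
static bool morecore_size_ok(size_t ssize) {
#ifdef UNLIMITED_MORECORE
  (void)ssize; /* any size may be passed to sbrk */
  return true;
#else
  return ssize < HALF_MAX_SIZE_T;
#endif
}
```

Leaving the macro undefined in this mode means any check that was not #ifdef'd out fails to compile, instead of silently comparing against a misleading value.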
Interesting idea. I think that would work. I assume dlmalloc is being careful here and not assuming it can call sbrk twice without something else making changes in between, but for us that is a safe assumption: only sbrk modifies the memory size, and we lock around dlmalloc, so even pthreads builds should be OK. The only risk I can think of is someone calling emscripten_resize_heap in between on another thread, but I think even that might work (since that function takes the final requested size, not a delta).
Perhaps the safe thing is to call it in a loop but assert on later iterations that nothing changed in between.
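The loop-with-assert idea can be sketched as follows. To keep it self-contained and testable, fake_sbrk is a hypothetical stand-in for sbrk that bumps a break offset inside a fixed arena; grow_in_chunks is likewise a hypothetical name. Under Emscripten the real call would be sbrk() itself, and this is not how dlmalloc currently grows its heap.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for sbrk(): bumps a break offset inside a fixed arena
   and returns the old break, or (void *)-1 when the arena is exhausted. */
static unsigned char arena[1 << 16];
static size_t brk_offset = 0;

static void *fake_sbrk(size_t increment) {
  if (increment > sizeof(arena) - brk_offset) return (void *)-1;
  void *old = arena + brk_offset;
  brk_offset += increment;
  return old;
}

/* Grow the break by `total` bytes using calls of at most `max_chunk` bytes,
   asserting that each later call starts exactly where the previous one ended. */
static void *grow_in_chunks(size_t total, size_t max_chunk) {
  assert(max_chunk > 0);
  unsigned char *start = NULL;
  unsigned char *expected_next = NULL;
  while (total > 0) {
    size_t chunk = total < max_chunk ? total : max_chunk;
    void *p = fake_sbrk(chunk);
    if (p == (void *)-1) return NULL; /* chunks already obtained are not given back */
    if (start == NULL) {
      start = p;
    } else {
      /* Fires if something else moved the break between our calls. */
      assert((unsigned char *)p == expected_next);
    }
    expected_next = (unsigned char *)p + chunk;
    total -= chunk;
  }
  return start;
}
```

The failure path shows part of the cost discussed later in the thread: if a later chunk fails, the earlier chunks stay allocated unless extra error handling shrinks the break again.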
I believe dlmalloc is already built with that assumption:
MORECORE_CONTIGUOUS default: 1 (true) if HAVE_MORECORE
If true, take advantage of fact that consecutive calls to MORECORE
with positive arguments always return contiguous increasing
addresses. This is true of unix sbrk. It does not hurt too much to
set it true anyway, since malloc copes with non-contiguities.
Setting it false when definitely non-contiguous saves time
and possibly wasted space it would take to discover this though.
Also, I don't think emscripten_resize_heap changes the sbrk pointer... only sbrk can do that. emscripten_resize_heap just resizes the memory. It should really be called emscripten_resize_memory, just like HEAPU8 and friends should really have been called MEMU8.
Interesting, but MORECORE_CONTIGUOUS isn't enough, I think, since someone else can call sbrk in between your calls? You would allocate [A, B), they would allocate [B, C), and you'd then allocate [C, D), but your two allocations are not contiguous even though from sbrk's point of view they are. But probably I'm missing something...
Good point about emscripten_resize_heap, I think that's OK.
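The [A, B), [B, C), [C, D) scenario can be modeled with a plain address counter. In this sketch, brk_addr, model_sbrk and second_region_contiguous are hypothetical names; the final comparison is the kind of "does the new region start where the old one ended" check an allocator needs before merging two sbrk'd regions, not dlmalloc's exact code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a program break: just an address counter. */
static uintptr_t brk_addr = 0x1000;

static uintptr_t model_sbrk(uintptr_t increment) {
  uintptr_t old = brk_addr;
  brk_addr += increment;
  return old;
}

/* Returns true if our second region starts exactly where our first one ended. */
static bool second_region_contiguous(bool interleaved) {
  uintptr_t a = model_sbrk(0x100);   /* we get [A, B) */
  uintptr_t b = a + 0x100;
  if (interleaved) {
    model_sbrk(0x40);                /* someone else gets [B, C) */
  }
  uintptr_t c = model_sbrk(0x100);   /* we get [C, D) */
  return c == b;
}
```

Each individual call still returns monotonically increasing addresses, which is all MORECORE_CONTIGUOUS promises; only the caller's own check reveals that its two regions are not adjacent.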
Yes, between calls to sbrk, some other code calling sbrk on its own can interfere. dlmalloc detects this in some cases (I can think of at least one case where it doesn't, which seems to me like a bug with security implications) and switches to non-contiguous mode from then on.
Splitting the request into two calls increases the risk of this happening, and even if we fix dlmalloc to detect it in all cases, not being able to trim wastes space, which is worse given that we are talking about allocations larger than HALF_MAX_SIZE_T.
I'd rather keep changes as minimal as possible, so that it's easier to update dlmalloc to a new version in the future and to minimize the risk of introducing a bug. But it's your call.
Nice!
I wonder if this actually saves a few bytes of code size too?
Can you run ./test/runner other.*code_size* other.*metadce* --rebase to see if anything changes? (To know for sure, you really need to run that command twice: once before your change to get a new baseline, and then once after it.)
Thanks for working on this, excited to see if we save some code size too... I think maybe the whole of …
… even more memory by error
I've run it but I don't see any changes at all. I've also had to skip one test with …
That's very odd. Can you attach your …
Worth noting that the tests expect a specific version of node …
Here is my config:
import os
emsdk_path = "/Users/miwelc/Development/emsdk"
NODE_JS = emsdk_path + '/node/14.18.2_64bit/bin/node'
PYTHON = emsdk_path + '/python/3.9.2_64bit/bin/python3'
LLVM_ROOT = emsdk_path + '/upstream/bin'
BINARYEN_ROOT = emsdk_path + '/upstream'
EMSCRIPTEN_ROOT = "/Users/miwelc/emscripten"
COMPILER_ENGINE = NODE_JS
JS_ENGINES = [NODE_JS]
It's the same config I use when compiling my projects with Emscripten, but with …
I'm pretty sure … Is this …
Yes, it's inside my emscripten directory. I've tried both … On a happier note, if I remove …
lgtm! But can we write a test for this?
For a test, the best place is likely the browser test suite. It's safest there since the tests run sequentially, so there is no risk of OOMing the entire test runner. We prefix those tests with … (see emscripten/test/test_browser.py, lines 5296 to 5301 in 03ccf3a).
Sure, I'll write a test. Thanks for the example test! Also, out of curiosity, will it be possible in the future to shrink the wasm module's ArrayBuffer? This PR is built on the assumption that Emscripten's …
We've discussed requesting memory in a loop for requests over 2GB, and I don't think the extra binary size and the complexity of the required error handling are worth it now. But if we ever want to support this use case while also giving memory back to the browser on wasm32, this should be revisited.
The new test seems to be passing, but CI fails with …
Great! And good idea about the test, it is best to reuse that existing one.
One question I'm still thinking about is what our plan is for future upgrades of dlmalloc. Perhaps a section listing our changes to it would be useful. Or we could mark them with // XXX EMSCRIPTEN, as we've done in other parts of the system library. But the git history might be good enough, and I don't think this needs to block landing; we can improve it later if we want, so let's land this.
Thanks for working on this, @miwelc!
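As an illustration of the // XXX EMSCRIPTEN marking idea, the patched definition could carry the marker directly on the Emscripten-specific branch. This is a sketch: the #else branch uses upstream dlmalloc's (MAX_SIZE_T / 2U) definition, while the exact marker placement is an assumption, not what was landed.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SIZE_T (~(size_t)0)

#if defined(__EMSCRIPTEN__) // XXX EMSCRIPTEN: sbrk accepts unsigned increments above 2GB
#define HALF_MAX_SIZE_T (MAX_SIZE_T)
#else
#define HALF_MAX_SIZE_T (MAX_SIZE_T / 2U) /* upstream dlmalloc definition */
#endif
```

A grep for "XXX EMSCRIPTEN" then lists every local deviation when merging a newer dlmalloc.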
I don't think dlmalloc is being actively worked on, so I think that is quite unlikely.
FWIW, musl v1.2.1 replaced their original dlmalloc-like allocator with a new "mallocng" malloc implementation, since the former suffered from fundamental design problems. See the v1.2.1 release notes. Unfortunately, as mentioned in #12833 (comment), it depends on the availability of an OS-level …
I see, thanks @sbc100 @kleisauke!
I believe this change set broke over-allocation on WASM64. One of my Python test cases now fails with an assertion error.
The values are …
This reproducer passes on 3.1.24 and aborts on tot-upstream (3.1.25-git):
These changes simply remove the check that ensured … I'm not sure, but I think the issue is due to Line 74 in 6153cb3.
All of this is just theory; I'll look into it and get back to you.
It looks like that code in sbrk.c should be wrapped in …
After some testing, it turns out it's just that … In summary:
Also, I've been testing the overflow check mentioned before, and it seems there is actually a bug in that code, although a different one:
// wasm64: uintptr_t is 64 bits wide
uintptr_t old_brk = (uintptr_t)0xFFFFFFFF; // 2^32 - 1
uintptr_t new_brk = old_brk + 10;
if ((uint32_t)new_brk <= (uint32_t)old_brk) {
  // Condition IS met even though new_brk is valid (the low 32 bits wrapped)
}
old_brk = (uintptr_t)0xFFFFFFFF + 1; // 2^32
new_brk = old_brk + 10;
if ((uint32_t)new_brk <= (uint32_t)old_brk) {
  // Condition is NOT met
}
old_brk = (uintptr_t)0xFFFFFFFF; // 2^32 - 1
new_brk = old_brk + 10;
if (new_brk <= old_brk) {
  // Condition is NOT met (full-width comparison)
}
So it turns out that whenever adding the increment carries past a multiple of 2^32 (i.e., the low 32 bits of new_brk wrap around), the truncated check detects a false overflow.
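The difference between the two checks can be captured in two small functions. overflow_truncated mirrors the 32-bit-truncating comparison from the experiment; overflow_full_width is one possible fix that tests for wrap-around of the full uintptr_t without performing the addition. Both names are hypothetical, the sketch assumes a positive increment (a real sbrk must also handle zero and negative increments), and the accompanying checks assume a 64-bit uintptr_t as on wasm64.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Buggy variant: compares only the low 32 bits, so a carry past a
   multiple of 2^32 looks like an overflow even on a 64-bit target. */
static bool overflow_truncated(uintptr_t old_brk, uintptr_t increment) {
  uintptr_t new_brk = old_brk + increment;
  return (uint32_t)new_brk <= (uint32_t)old_brk;
}

/* Full-width variant: true only if old_brk + increment would really
   exceed UINTPTR_MAX. */
static bool overflow_full_width(uintptr_t old_brk, uintptr_t increment) {
  return increment > UINTPTR_MAX - old_brk;
}
```

Writing the check as increment > UINTPTR_MAX - old_brk also avoids relying on unsigned wrap-around of the sum itself.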
@sbc100 Should I open a PR fixing this bug?
Yes please!
Background of this PR: #17747