Don't use sbrk(0) to determine the initial heap size #377
This commit changes the `try_init_allocator` function as part of dlmalloc to not use `sbrk(0)` to determine the initial heap size. The purpose of this function is to use the extra memory at the end of linear memory for the initial allocation heap before `memory.grow` is used to allocate more memory. To learn the extent of this region the code previously would use `sbrk(0)` to find the current size of linear memory. This does not work, however, when other systems have called `memory.grow` before this function is called. For example if another allocator is used or if another component of a wasm binary grows memory for its own purposes then that memory will be incorrectly claimed to be owned by dlmalloc. Instead this commit rounds up the `__heap_base` address to the nearest page size, since that must be allocatable. Otherwise anything above this rounded address is assumed to be used by something else, even if it's addressable.
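As a rough illustration of the change described above, here is a minimal sketch in Rust rather than the actual dlmalloc C code. The function names `old_initial_heap_end` and `new_initial_heap_end` and the example addresses are hypothetical; only the 64 KiB wasm page size and the idea of rounding `__heap_base` up to a page boundary come from the description.

```rust
/// WebAssembly linear memory pages are 64 KiB.
const PAGE_SIZE: usize = 64 * 1024;

/// Old approach (sketch): treat everything up to the current end of linear
/// memory (what `sbrk(0)` reports) as the initial heap.
fn old_initial_heap_end(_heap_base: usize, current_memory_end: usize) -> usize {
    current_memory_end
}

/// New approach (sketch): round `heap_base` up to the next page boundary and
/// claim nothing beyond it, regardless of how large memory currently is.
fn new_initial_heap_end(heap_base: usize) -> usize {
    (heap_base + PAGE_SIZE - 1) & !(PAGE_SIZE - 1)
}

fn main() {
    // Hypothetical layout: the linker placed `__heap_base` partway into a
    // page, and some other code already grew memory by one page.
    let heap_base = 0x10_2600;
    let memory_end_after_foreign_grow = 0x12_0000;

    let old_end = old_initial_heap_end(heap_base, memory_end_after_foreign_grow);
    let new_end = new_initial_heap_end(heap_base);

    // The old range swallows the page someone else grew; the new one stops
    // at the first page boundary above `heap_base`.
    println!("old heap: {:#x}..{:#x}", heap_base, old_end);
    println!("new heap: {:#x}..{:#x}", heap_base, new_end);
}
```

With these assumed numbers the old computation would hand dlmalloc the page that another component obtained from `memory.grow`, while the rounded value leaves it alone.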
it breaks the case where initial heap is more than 1 page, doesn't it? |
This doesn't break that use case, but it does make it less efficient. The default output of LLD doesn't ever have more than one page at the end of memory (or at least not that I'm aware of), and I'm not sure if such a setup is commonly used in the wild. If it is then I think it would perhaps make sense to add a new pseudo-symbol to LLD such as |
Note there's Also, re avoiding collision with userland |
So is |
Yes I've updated this to prioritize using Note @TerrorJack that using a ctor is also not going to work here since ctors are not run as part of the wasm |
IIUIC |
Yes that is distinct from the |
That's fair. Maybe a bit offtopic but I do wonder why the ctors aren't put into the module
Sounds like undefined behavior to call anything before |
Ah excellent suggestions @sbc100, updated to include those.
Personally I don't consider that a great stance for wasi-libc to take. That's a pretty onerous restriction to work around when relatively simple lazy initialization, such as what happens in dlmalloc right now, fixes the issue. |
There is a long discussion on this here: WebAssembly/design#1160 TLDR: Calling host functions normally requires exports to have been received by the host (e.g. the host can't do much until it has a handle to the memory export). Exports are not received by the host until after the start function runs, therefore it's not possible, in the general case, to call host functions during the start function. |
For what it's worth, the wasm-ld linker will use the start function for some things if it knows that code is only doing internal stuff (i.e. loading memory segments, applying relocations). It's only if user code is involved that we must wait until after start has run. |
The approach of using |
Update to wasi-libc a1c7c2c7a4b2813c6f67bd2ef6e0f430d31cebad - Don't use sbrk(0) to determine the initial heap size (WebAssembly/wasi-libc#377) - Fix more headers to avoid depending on `max_align_t` (WebAssembly/wasi-libc#375) - Use `ENOENT` rather than `ENOTCAPABLE` for missing preopens. (WebAssembly/wasi-libc#370) - Adjust Makefile for LLVM trunk (16) as of 2022-11-08 (WebAssembly/wasi-libc#344)
Shouldn't the same thing be done for |
Avoid using sbrk(0) in emmalloc too.
i feel you have a different definition of
while it might not be too common setup, i occasionally use such a setup. |
LLVM has added a Does wasm-ld have some option which enables extra patches to be allocated after |
I'll reiterate that these are just my own personal thoughts, and I'm not really a core maintainer here, so I don't think my thoughts should hold all that much weight for maintaining wasi-libc. I don't think that the old behavior is correct; to me it's broken. To me that's enough reason not to preserve the old behavior, because I don't think that a bug should be preserved just because it's been there for a while. Looking at the history this behavior was specifically added in #114 so while it's pretty old at this point it may not be so long as "wasi-libc never allowed this". Regardless though, in my opinion, I don't think how old it is should have a bearing on fixing the underlying bug. I again don't want to appear like I'm diminishing or brushing away use cases here. My point is that the use case of someone else calling Again though I don't maintain wasi-libc, I just contribute on occasion. If this patch is reverted I'll find other ways to solve my problems, so I don't think it necessarily needs my own personal approval. |
my understanding of the situation is same. my preference is the opposite though. |
This is somewhat of a new use case to me. I had always imagined that |
The symbol was introduced in LLD 15.0.7, as a way to know how much memory can be allocated: llvm/llvm-project@1095870 WebAssembly/wasi-libc#377
LLVM 15.0.7 is now released, with the fix. |
I believe the current status of wasi-libc is that:
I personally think wasi-libc's malloc should not assume aggressive ownership of the entire address space. There might be multiple allocators in play or other custom embedding logic which uses memory and allocates via |
I must be misunderstanding; I thought the point of this PR was to avoid exactly this case. After this PR doesn't wasi-libc avoid using this region?
Are you sure about this? Looking at dlmalloc.c it seems that we currently build with Unless I'm misunderstanding what |
Oh, maybe
Sorry, I should clarify I'm describing the pre-this-PR behavior since that's also what I think you're talking about. As for libc's behavior, I determined this empirically:

    use std::alloc;
    use std::arch::wasm32::memory_grow;

    const PAGESIZE: usize = 64 * 1024;

    #[no_mangle]
    extern "C" fn entry() {
        // 0x110000 - before first malloc
        println!("{:#x}", memory_grow::<0>(1) * PAGESIZE);
        // 0x102600 - note that this extends into the grown region
        println!("{:p}", allocate(PAGESIZE));
        // 0x120000 - grow after first malloc
        println!("{:#x}", memory_grow::<0>(1) * PAGESIZE);
        // 0x130010 - note that this does not conflict with second `memory.grow`
        println!("{:p}", allocate(PAGESIZE));
    }

    fn allocate(amt: usize) -> *mut u8 {
        unsafe { alloc::alloc(alloc::Layout::from_size_align(amt, 8).unwrap()) }
    }

This prints:
Here the return value of I also saw |
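The overlap in the experiment above can be checked with plain interval arithmetic. The sketch below is not wasi-libc code and does not run on a wasm target. It only replays the addresses quoted in the comments of the snippet. The helper name `ranges_overlap` is hypothetical.

```rust
const PAGESIZE: usize = 64 * 1024;

/// Returns true if the half-open ranges `a` and `b` share at least one byte.
fn ranges_overlap(a: (usize, usize), b: (usize, usize)) -> bool {
    a.0 < b.1 && b.0 < a.1
}

fn main() {
    // Page handed out by the first `memory.grow`, starting at 0x110000.
    let first_grow = (0x11_0000, 0x11_0000 + PAGESIZE);
    // First allocation: PAGESIZE bytes at 0x102600.
    let first_alloc = (0x10_2600, 0x10_2600 + PAGESIZE);
    // Page handed out by the second `memory.grow`, starting at 0x120000.
    let second_grow = (0x12_0000, 0x12_0000 + PAGESIZE);
    // Second allocation: PAGESIZE bytes at 0x130010.
    let second_alloc = (0x13_0010, 0x13_0010 + PAGESIZE);

    println!(
        "first alloc vs first grow:   {}",
        ranges_overlap(first_alloc, first_grow)
    );
    println!(
        "second alloc vs second grow: {}",
        ranges_overlap(second_alloc, second_grow)
    );
}
```

The first pair overlaps because the allocation ends at 0x112600, past the start of the page that was grown before the allocator initialized. The second pair does not, since that allocation starts above the end of the second grown page.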
we are all fine with |
* Update llvm-project to the latest release/15.x This pulls in the `__heap_end` symbol, which fixes the issue discussed in WebAssembly/wasi-libc#377. * Update to the official 15.0.7 release.
This commit updates the wasi-libc revision used to build with the wasm32-wasi target. This notably pulls in WebAssembly/wasi-libc#377 which is needed to fix a use case I've been working on recently. This should be a relatively small update hopefully and is not expected to have any user impact.
…, r=cuviper Update the wasi-libc used for the wasm32-wasi target This commit updates the wasi-libc revision used to build with the wasm32-wasi target. This notably pulls in WebAssembly/wasi-libc#377 which is needed to fix a use case I've been working on recently. This should be a relatively small update hopefully and is not expected to have any user impact.
i submitted a partial revert #386 |
This commit effectively drops the support of older wasm-ld. (LLVM <15) We have two relevant use cases: * `memory.grow` use outside of malloc (eg. used by polyfill preview1 binaries) * `--init-memory` to somehow preallocate heap (eg. avoid dynamic allocations, especially on small environments) While WebAssembly#377 fixed the former, it broke the latter if you are using an older LLVM, which doesn't provide the `__heap_end` symbol, to link your module. As we couldn't come up with a solution which satisfies all parties, this commit simply makes it require new enough LLVM which provides `__heap_end`. After all, a link-time failure is more friendly to users than failing later in a subtle way.
This commit effectively drops the support of older wasm-ld. (LLVM <15.0.7) We have two relevant use cases: * `memory.grow` use outside of malloc (eg. used by polyfill preview1 binaries) * `--init-memory` to somehow preallocate heap (eg. avoid dynamic allocations, especially on small environments) While WebAssembly#377 fixed the former, it broke the latter if you are using an older LLVM, which doesn't provide the `__heap_end` symbol, to link your module. As we couldn't come up with a solution which satisfies all parties, this commit simply makes it require new enough LLVM which provides `__heap_end`. After all, a link-time failure is more friendly to users than failing later in a subtle way.
* Don't use sbrk(0) to determine the initial heap size This commit changes the `try_init_allocator` function as part of dlmalloc to not use `sbrk(0)` to determine the initial heap size. The purpose of this function is to use the extra memory at the end of linear memory for the initial allocation heap before `memory.grow` is used to allocate more memory. To learn the extent of this region the code previously would use `sbrk(0)` to find the current size of linear memory. This does not work, however, when other systems have called `memory.grow` before this function is called. For example if another allocator is used or if another component of a wasm binary grows memory for its own purposes then that memory will be incorrectly claimed to be owned by dlmalloc. Instead this commit rounds up the `__heap_base` address to the nearest page size, since that must be allocatable. Otherwise anything above this rounded address is assumed to be used by something else, even if it's addressable. * Use `__heap_end` if defined * Move mstate initialization earlier
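The "Use `__heap_end` if defined" step can be pictured as a simple preference order. This is a hedged sketch, not the code from the commit: modeling the optional linker symbol as an `Option<usize>` is an assumption made for illustration, and the name `choose_initial_heap_end` is hypothetical. How the real C code detects whether the symbol exists is not shown here.

```rust
/// WebAssembly linear memory pages are 64 KiB.
const PAGE_SIZE: usize = 64 * 1024;

/// Picks the end of the initial heap region.
///
/// `heap_end` stands in for the linker-provided `__heap_end` symbol:
/// `Some(addr)` when the linker defines it, `None` for older linkers.
fn choose_initial_heap_end(heap_base: usize, heap_end: Option<usize>) -> usize {
    match heap_end {
        // Trust the linker when it says where the heap region stops.
        Some(end) => end,
        // Otherwise fall back to rounding `heap_base` up to a page boundary.
        None => (heap_base + PAGE_SIZE - 1) & !(PAGE_SIZE - 1),
    }
}

fn main() {
    // Hypothetical addresses, chosen only to exercise both branches.
    let heap_base = 0x10_2600;
    println!("{:#x}", choose_initial_heap_end(heap_base, Some(0x20_0000)));
    println!("{:#x}", choose_initial_heap_end(heap_base, None));
}
```

With a linker that provides the symbol, a larger preallocated region (for example several pages of initial memory) is still usable by malloc. Without it, only the remainder of the page containing `heap_base` is claimed.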