
Sub-proposal: memory.discard #6

Open
bvisness opened this issue Feb 15, 2023 · 7 comments

Comments

@bvisness
Collaborator

bvisness commented Feb 15, 2023

This issue tracks discussion of the memory.discard sub-proposal. This proposal is one possible component of the eventual Memory Control MVP. Please make sure to read the full sub-proposal document.

This issue previously tracked discussion of an older sketch of memory.discard from the SpiderMonkey team. To avoid losing context, this issue has been repurposed. The original discussion starts below.


We on the SpiderMonkey team have created a prototype of memory.discard from this proposal. We've also created a document exploring the design space and the results of our research.

https://docs.google.com/document/d/131BCKkMSdKCeU51XJ52mIy5O4i60CIGMopSh8aWNQG8/edit?usp=sharing

Our prototype is available behind a flag in Firefox Nightly - you can try it here. It does not yet support shared memories, as we’re still researching how to handle this on Windows.

I look forward to everyone's comments and suggestions! This instruction feels good to us so far.

@dtig
Member

dtig commented Feb 15, 2023

Thanks @bvisness for filing this!

The V8 prototype currently has an API-only function memory.discard(numPages), with a signature similar to memory.grow(numPages), which only decommits that number of pages from the end of committed memory. This is definitely less flexible than the approach outlined in the doc, so it's great to have a couple of approaches to experiment with. One of the motivations for this was to keep it in line with JS RABs/GSABs, where ResizableArrayBuffers are allowed to shrink but SharedArrayBuffers are only allowed to grow.

Explicit cc for @dschuff because we had previously discussed having a couple of different toolchain flags that support both the approaches so this can be experimented with.

There are a few questions in this design space that might be interesting to discuss:

  • What happens when the memory that's discarded is accessed again? In the case where the length remains constant I assume this is equivalent to reading zero-filled memory, and non-trapping?
  • I have a strong preference for these instructions to be aligned to Wasm memory pages, simply because relaxing that is somewhat hairy for our bookkeeping. For the traps currently implemented, how are they surfaced to JS?

@titzer
Contributor

titzer commented Feb 15, 2023

+1 to page-alignment for these operations. I don't think we want to introduce a new instruction that is redundant with memory.fill [1]; rather, we want an instruction specifically for pages, which gives applications more explicit control.

[1] AFAICT, an engine could transparently drop pages for very large memory.fill(0) requests.
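
For illustration, here is a hedged C sketch of the optimization that footnote describes, assuming a mmap-based linear memory on Linux; the threshold and helper name are made up, not from any engine:

```c
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

#define DROP_THRESHOLD (1u << 20) /* hypothetical cutoff: 1 MiB */
#define OS_PAGE 4096u

/* memory.fill(dst, value, n): for a huge, page-aligned zero fill,
 * decommit the pages instead of writing zeroes. The pages still
 * read back as zero afterwards, but physical memory is released. */
void memory_fill(uint8_t *base, size_t dst, uint8_t value, size_t n) {
    if (value == 0 && n >= DROP_THRESHOLD &&
        (uintptr_t)(base + dst) % OS_PAGE == 0 && n % OS_PAGE == 0 &&
        madvise(base + dst, n, MADV_DONTNEED) == 0) {
        return; /* pages dropped transparently */
    }
    memset(base + dst, value, n); /* ordinary fill path */
}
```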

@eqrion
Contributor

eqrion commented Feb 15, 2023

There are a few questions in this design space that might be interesting to discuss:

  • What happens when the memory that's discarded is accessed again? In the case where the length remains constant I assume this is equivalent to reading zero-filled memory, and non-trapping?

Yes, it's completely valid to implement this as filling the whole region with zero bytes. This proposed design does not change the length of the memory at all, or introduce regions that will trap when accessed.

Behind the scenes, we're using OS-specific APIs which essentially replace the OS pages with lazily-allocated zero pages. This reduces memory pressure on the system until the pages are accessed again. On some platforms the pages are re-allocated on read, on others on write. This is pretty platform-specific; most of the details are in the linked doc.
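
To make "lazily-allocated zero pages" concrete, here is a rough sketch of how an engine might implement the discard itself, assuming a mmap-based linear memory; the function name is illustrative, and a real engine would add bounds checks and locking:

```c
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

/* Discard [offset, offset+len) of a linear memory, preserving
 * read-as-zero semantics on the next access. */
int discard_region(uint8_t *base, size_t offset, size_t len) {
#if defined(__linux__)
    /* Swap the pages for lazily-allocated zero pages; they fault
     * back in on the next access. */
    return madvise(base + offset, len, MADV_DONTNEED);
#elif defined(__APPLE__)
    /* Let the OS reclaim the pages under memory pressure. */
    return madvise(base + offset, len, MADV_FREE_REUSABLE);
#else
    /* Portable fallback: same observable behavior, but no
     * memory-pressure benefit. */
    memset(base + offset, 0, len);
    return 0;
#endif
}
```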

  • I have a strong preference for these instructions to be aligned to Wasm memory pages, simply because relaxing that is somewhat hairy for our bookkeeping.

I agree that keeping the base/length byte offsets aligned to wasm pages is simplest. The only argument I could see for relaxing this is if wasm pages are so big that users rarely have that much address space unused and ready to free, especially with heap fragmentation. In that case, it would be beneficial to have some lower granularity closer to the OS page size. But this has other issues, and we shouldn't consider it unless we hear from users that it would help.

For the traps currently implemented, how are they surfaced to JS?

If the base/length are not aligned to wasm page size, then we emit a WebAssembly.RuntimeError. We do this for both the instruction and JS-API entry points.

@bvisness
Collaborator Author

bvisness commented Mar 28, 2023

We have now implemented memory.discard for shared memories in SpiderMonkey. The document has been updated with our findings and recommendations, and our demo now supports shared memories as well.

Windows makes this difficult, since none of the available virtual memory APIs directly allow us to replace working-set pages with demand-zero pages in a single call. However, a combination of memset(0) and VirtualUnlock() achieves the desired result.
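
For concreteness, a minimal sketch of that Windows workaround, assuming addr and len are OS-page-aligned; error handling is omitted:

```c
#include <windows.h>
#include <string.h>

/* Zero the pages, then evict them from the working set. Calling
 * VirtualUnlock on pages that were never locked "fails" with
 * ERROR_NOT_LOCKED but still removes them from the working set,
 * which is exactly the effect wanted here. */
void discard_region_windows(void *addr, size_t len) {
    memset(addr, 0, len);     /* preserve read-as-zero semantics */
    VirtualUnlock(addr, len); /* expected to return FALSE; ignored */
}
```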

At this point, we'll try out our memory.discard instruction in some real programs and see what kinds of improvements we get.

@bvisness
Collaborator Author

bvisness commented Jan 10, 2025

To address a comment by @BlobTheKat on WebAssembly/design#1543:

To make full use of this, we could create new memory with 65536 pages (4GB, the full address space) and physical pages would only be allocated as accessed, and freed on clearPages

The question with such a strategy is whether the memory would initially be committed or not. (This applies primarily to Windows, but also to systems with overcommit disabled.) By "committed" I generally mean "mapped as read-write", such that accessing the memory will not cause a signal or exception.

If the memory is committed by default, then physical memory use stays low (RSS), but the system may still fail the memory allocation due to a system commit limit. This limit can actually be quite low; on Windows I believe it defaults to three times the size of physical memory. That means that committing 4GB on a system with 4GB of RAM would require a full third of the system-wide commit limit, and is likely to fail. On mobile in particular these limits can be drastically lower still.

On the other hand, if memory is not committed by default, the 4GB reservation is likely to succeed, but accessing these pages now triggers an exception. If you wish to preserve wasm's current semantics and get zeroes instead of trapping, then you need to implement "software page faults", committing pages inside a signal handler and resuming execution of your wasm program manually. This is not impossible, but certainly is clumsy, and likely considerably slower than letting the operating system handle the page fault.
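
For concreteness, a hedged POSIX sketch of such a software page fault scheme, assuming the engine reserved the whole region PROT_NONE up front; the names and page size are assumptions, and a real engine must also distinguish genuine out-of-bounds accesses:

```c
#include <signal.h>
#include <stdint.h>
#include <sys/mman.h>

#define OS_PAGE 4096u

static uint8_t *mem_base; /* start of the big PROT_NONE reservation */
static size_t mem_size;

static void on_segv(int sig, siginfo_t *info, void *ctx) {
    (void)sig; (void)ctx;
    uint8_t *addr = (uint8_t *)info->si_addr;
    if (addr >= mem_base && addr < mem_base + mem_size) {
        /* Commit the faulting page, then return so the faulting
         * instruction retries and now sees a zero page. */
        uint8_t *page =
            (uint8_t *)((uintptr_t)addr & ~(uintptr_t)(OS_PAGE - 1));
        mprotect(page, OS_PAGE, PROT_READ | PROT_WRITE);
        return;
    }
    signal(SIGSEGV, SIG_DFL); /* not our memory: crash normally */
}

void install_soft_faults(size_t size) {
    mem_base = mmap(NULL, size, PROT_NONE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    mem_size = size;
    struct sigaction sa = {0};
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, NULL);
}
```

Every first touch of a page takes this user-space detour through the handler, which is the overhead debated in the following comments.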

@lukewagner has suggested that you could pair memory.discard with a memory.warmup instruction, or some such, which would pre-commit pages that you expect to access, avoiding the software page faults. This also could work, but honestly, how much of the operating system's memory manager do we want to rebuild inside of wasm engines? Will we expect engines to also implement a memory prefetcher that commits pages that are accessed sequentially? Will we need a memory.advise like Linux's madvise that can change the mode of the prefetcher (e.g. memory.advise random, memory.advise sequential)? To me it seems far more prudent to build abstractions that can take advantage of the operating system, not recreate it.

I think that using the full 4GB address space (or larger address spaces with memory64) would be best done with an explicit virtual mode for memory, as described here. memory.discard on its own still seems like a very useful and portable addition to plain linear memory, though, and it would be wonderful to see some experiments testing its effectiveness in real programs.

@BlobTheKat

This is not impossible, but certainly is clumsy, and likely considerably slower than letting the operating system handle the page fault.

I do see what you mean by clumsy. The performance cost, on the other hand, is not significant: exceptions do not happen frequently, and the handler simply hands back to the OS to add the page. The only overhead over page allocation the traditional way is the intermediate user-space step that immediately hands control back to the OS.

@bvisness
Collaborator Author

Exceptions would happen frequently in this scheme: every time a page is touched for the first time. That said, at this point we do still need to measure the overhead.
