Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use wasm-sync to allow usage on the main browser thread #1110

Merged
merged 12 commits into from
Jan 17, 2024

Conversation

RReverser
Copy link
Contributor

@RReverser RReverser commented Dec 25, 2023

One of the most common complaints I've been receiving in wasm-bindgen-rayon that prevents people from using Rayon on the Web is the complexity of manually splitting up the code that uses Rayon into a Web Worker from code that drives the UI.

It requires custom message passing for proxying between two threads (Workers), which, admittedly, feels particularly silly when using a tool that is meant to simplify working with threads for you.

This all stems from a Wasm limitation that disallows atomic.wait on the main browser thread. In theory, it's a reasonable limitation, since blocking main thread on the web is more problematic than on other platforms as it blocks web app's UI from being responsive altogether, and because there is no limit on how long atomic wait can block. In practice, however, it causes enough issues for users that various toolchains - even Emscripten - work around this issue by spin-locking when on the main thread.

Rust / wasm-bindgen decided not to adopt the same workaround, following the core Wasm decision, which is also a fair stance for general implementation of Mutex and other blocking primitives, but I believe Rayon usecase is sufficiently special. Code using parallel iterators is almost always guaranteed to run for less or, worst-case, ~same time as code using regular iterators, so it doesn't make sense to "punish" Rayon users and prevent them from being able to use parallel iterators on the main thread when it will lead to a more responsive UI than using regular iterators.

This PR adds a cfg-conditional dependency on wasm_sync that automatically switches to the allowed spin-based Mutex and Condvar when it detects it's running on the main thread, and to regular std::sync based implementation otherwise, thus avoiding the atomics.wait error. This dependency will only be added when building for wasm32-unknown-unknown - that is, not affecting WASI and Emscripten users - and only when building with -C target-feature=+atomics, so not affecting users who rely on Rayon's single-threaded fallback mode either. I hope this kind of very limited override will be acceptable as it makes it much easier to use Rayon on the web.

When this is merged, I'll be able to leverage it in wasm-bindgen-rayon and significantly simplify demos, tests and docs by avoiding that extra Worker machinery (e.g. see demo/wasm-worker.js and demo/index.js merged into single simple JS file in the linked diff).

@RReverser RReverser changed the title Use wasm-sync to allow blocking on the main thread with wasm-bindgen-rayon Use wasm-sync to allow usage on the main Web thread Dec 25, 2023
@RReverser RReverser changed the title Use wasm-sync to allow usage on the main Web thread Use wasm-sync to allow usage on the main browser thread Dec 25, 2023
Copy link
Member

@cuviper cuviper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the current failure mode when it tries to block on the main web thread? Some kind of panic/abort of the wasm module?

rayon-core/src/lib.rs Outdated Show resolved Hide resolved
src/result.rs Outdated Show resolved Hide resolved
@RReverser
Copy link
Contributor Author

RReverser commented Dec 30, 2023

Some kind of panic/abort of the wasm module?

Yeah. More specifically, Wasm engine itself throws an error on the instruction, so it's not even something one can catch & handle from Wasm.

One of the most common complaints I've been receiving in [wasm-bindgen-rayon](https://github.com/RReverser/wasm-bindgen-rayon) that prevents people from using Rayon on the Web is the complexity of manually splitting up the code that uses Rayon into a Web Worker from code that drives the UI.

It requires custom message passing for proxying between two threads (Workers), which, admittedly, feels particularly silly when using a tool that is meant to simplify working with threads for you.

This all stems from a [Wasm limitation](WebAssembly/threads#177) that disallows `atomic.wait` on the main browser thread. In theory, it's a reasonable limitation, since blocking main thread on the web is more problematic than on other platforms as it blocks web app's UI from being responsive altogether, and because there is no limit on how long atomic wait can block. In practice, however, it causes enough issues for users that various toolchains - even Emscripten - work around this issue by spin-locking when on the main thread.

Rust / wasm-bindgen decided not to adopt the same workaround, following general Wasm limitation, which is also a fair stance for general implementation of `Mutex` and other blocking primitives, but I believe Rayon usecase is quite different. Code using parallel iterators is almost always guaranteed to run for less or, worst-case, ~same time as code using regular iterators, so it doesn't make sense to "punish" Rayon users and prevent them from being able to use parallel iterators on the main thread when it will lead to a _more_ responsive UI than using regular iterators.

This PR adds a `cfg`-conditional dependency on [wasm_sync](https://docs.rs/wasm_sync/latest/wasm_sync/) that automatically switches to allowed spin-based `Mutex` and `Condvar` when it detects it's running on the main thread, and to regular `std::sync` based implementation otherwise, thus avoiding the `atomics.wait` error. This dependency will only be added when building for `wasm32-unknown-unknown` - that is, not affecting WASI and Emscripten users - and only when building with `-C target-feature=+atomics`, so not affecting users who rely on Rayon's single-threaded fallback mode either. I hope this kind of very limited override will be acceptable as it makes it much easier to use Rayon on the web.

When this is merged, I'll be able to leverage it in wasm-bindgen-rayon and [significantly simplify](https://github.com/RReverser/wasm-bindgen-rayon/compare/main...RReverser:wasm-bindgen-rayon:wasm-sync?expand=1) demos, tests and docs by avoiding that extra Worker machinery (e.g. see `demo/wasm-worker.js` and `demo/index.js` merged into single simple JS file in the linked diff).
@cuviper
Copy link
Member

cuviper commented Jan 10, 2024

Would it be possible to test this in CI?

@RReverser
Copy link
Contributor Author

Would it be possible to test this in CI?

I am testing this on CI on the wasm-bindgen-rayon side, but I was also wondering if we should move / add CI covering usage with wasm-bindgen-rayon for rayon too, given that you have a more comprehensive test suite...

Running tests requires some plumbing - as in, regular #[test] doesn't "just work" with wasm-bindgen, and I wasn't sure if you'd be open to having to maintain that in the main rayon repo, but if it's something you're interested in, I can try and add such plumbing in a follow-up PR.

@cuviper
Copy link
Member

cuviper commented Jan 11, 2024

If runtime in CI is complicated, then maybe just a build test is enough. I suppose that needs -Zbuild-std for atomics?

@RReverser
Copy link
Contributor Author

I suppose that needs -Zbuild-std for atomics?

Just for a build test - I think not necessarily, enabling target-feature for Rayon itself (via -C target-features) should be enough. Then the standard library won't work properly in multithreaded environment (e.g. things like allocator won't be thread-safe), but as long as you only build and don't try to run it, it should be enough.

@RReverser
Copy link
Contributor Author

I'll try to add such basic check.

@RReverser
Copy link
Contributor Author

Heh made a mistake of trying to add wasm32-wasi-preview1-threads to CI too while at it. Big mistake - apparently that target is not quite there yet, as failing with random failures like unaligned atomic access or accessing memory out of Wasm bounds.

Will stick to the original scope instead.

@RReverser
Copy link
Contributor Author

@cuviper Done. I split out WASI CI from Wasm CI since they do different things, and because I need nightly only for wasm32-unknown-unknown (otherwise target-features is ignored).

Btw, I'm curious - why are Wasm sections set to only run on merge_group workflow, but not on PR workflows? Seems like there's a risk that a PR could break them w/o noticing? For example, those workflows don't run on this PR - I had to create a separate branch to trigger CI on my own fork to verify that they actually work (https://github.com/RReverser/rayon/actions/runs/7504219725).

@cuviper
Copy link
Member

cuviper commented Jan 12, 2024

Btw, I'm curious - why are Wasm sections set to only run on merge_group workflow, but not on PR workflows? Seems like there's a risk that a PR could break them w/o noticing?

The idea was to keep the PR CI fast like rust-lang/rust does, since most changes are not target specific, and the macos/windows builders are noticeably slower. The merge queue is supposed to make sure that we do get a full run so we'll notice before any target-specific break actually reaches the repo.

But these days, maybe those slower builders are still fast enough that we don't need to bother splitting...

.github/workflows/ci.yaml Show resolved Hide resolved
.github/workflows/ci.yaml Outdated Show resolved Hide resolved
@RReverser
Copy link
Contributor Author

Hm one more thought: I wonder if it would be better to require an explicit Cargo feature for this, rather than assuming that all wasm32-unknown-unknown users will use it with wasm-bindgen. For example, getrandom does this with js feature and some other crates use wasm-bindgen feature.

It would be a bit more verbose, but perhaps explicit opt-in is better.

@RReverser
Copy link
Contributor Author

For example, getrandom does this with js feature and some other crates use wasm-bindgen feature.

It would be a bit more verbose, but perhaps explicit opt-in is better.

OTOH that doesn't really enable JS support like in other crates - you still need wasm-bindgen-rayon as well. I guess it should be called browser_lock or something to clarify that it only takes care of locking and only useful in browsers.

@cuviper
Copy link
Member

cuviper commented Jan 15, 2024

Maybe just leave it directly as wasm-sync like regular optional dependencies? Or do you think there will be other related functionality that may want to piggy-back on this?

@RReverser
Copy link
Contributor Author

RReverser commented Jan 15, 2024

Or do you think there will be other related functionality that may want to piggy-back on this?

Not sure about this one yet, but:

  1. I wondered if we might need to switch from wasm_sync to another implementation in the future, in which case using its name directly might not be future-proof.
  2. More importantly, rayon's wasm_sync would need to enable rayon-core's wasm_sync. This is possible with regular Cargo features, but AFAIK it's not possible to do so from a feature auto-generated from an optional dependency - as in, you can't tell Cargo that it's both an optional dependency and it needs to enable extra features.

@RReverser
Copy link
Contributor Author

I added an explicit web_spin_lock feature instead. It should be more future-proof and makes cfg(...) conditions cleaner too. Let me know what you think.

@RReverser
Copy link
Contributor Author

RReverser commented Jan 15, 2024

  1. I wondered if we might need to switch from wasm_sync to another implementation in the future, in which case using its name directly might not be future-proof.

In fact, I already wonder if I should use spin instead, as it's more generic (any no_std target), has more lock datastructures (e.g. Once which wasm_sync doesn't have), and seems much more popular / better maintained... Although, unlike wasm_sync, it's incompatible with std primitives as it follows lock_api instead - e.g. mutex.lock() doesn't return a Result but a guard directly.

I can investigate and switch in a follow-up PR though, for now this PR is already a big improvement over previous state of art and would rather not block it on further investigations.

Cargo.toml Outdated Show resolved Hide resolved
rayon-core/Cargo.toml Outdated Show resolved Hide resolved
@cuviper
Copy link
Member

cuviper commented Jan 15, 2024

I already wonder if I should use spin instead,

It's significant that wasm-sync only spins on the main thread though -- I don't think we want the whole thread pool spinning. Even the basic idle sleep uses a condvar to wait for work.

@RReverser
Copy link
Contributor Author

It's significant that wasm-sync only spins on the main thread though -- I don't think we want the whole thread pool spinning. Even the basic idle sleep uses a condvar to wait for work.

Yeah that's true. I raised an issue on wasm-sync suggesting they could switch to spin under the hood - this should preserve this benefit while making it easier to add more primitives like e.g. Once and OnceLock that are currently missing.

@RReverser
Copy link
Contributor Author

FWIW this PR should be good to go.

@cuviper
Copy link
Member

cuviper commented Jan 17, 2024

Thanks!

@cuviper cuviper added this pull request to the merge queue Jan 17, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to no response for status checks Jan 17, 2024
@RReverser
Copy link
Contributor Author

I guess macOS CI just hung up?

@cuviper
Copy link
Member

cuviper commented Jan 17, 2024

Hmm, let's try again.

@cuviper cuviper added this pull request to the merge queue Jan 17, 2024
Merged via the queue into rayon-rs:master with commit b03b68c Jan 17, 2024
4 checks passed
@RReverser RReverser deleted the wasm-sync branch January 17, 2024 17:27
RReverser added a commit to RReverser/wasm-bindgen-rayon that referenced this pull request Jan 17, 2024
Starting with rayon 1.8.1 / rayon-core 1.12.1, Rayon has a web_spin_lock feature powered by wasm-sync that allows blocking on the main thread via spinning - same workaround for forbidden `atomics.wait` as used in e.g. Emscripten. rayon-rs/rayon#1110

We can leverage it and simplify instructions, tests and the demo to avoid an extra worker.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants