Speed up green task spawning #12172
Merged
Conversation
I also think that there are only 4 allocations remaining:

Very promising results.

🤘 I'd love to benchmark this against raw pthread stack spawning.
alexcrichton added a commit to alexcrichton/rust that referenced this pull request on Feb 12, 2014
Currently, a scheduler will hit epoll() or kqueue() at the end of *every task*. The reason is that the scheduler will context switch back to the scheduler task, terminate the previous task, and then return from run_sched_once. In doing so, the scheduler will poll for any active I/O. This shows up painfully in benchmarks that have no I/O at all. For example, this benchmark:

`for _ in range(0, 1000000) { spawn(proc() {}); }`

In this benchmark, the scheduler is currently wasting a good chunk of its time hitting epoll() when there's always active work to be done (run with RUST_THREADS=1).

This patch uses the previous two commits to alter the scheduler's behavior to only return from run_sched_once if no work could be found after trying really, really hard. If there is active I/O, this commit will perform the same as before, falling back to epoll() to check for I/O completion (so as not to starve I/O tasks).

In the benchmark above, I got the following numbers:

* 12.554s on today's master
* 3.861s with rust-lang#12172 applied
* 2.261s with both this and rust-lang#12172 applied

cc rust-lang#8341
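As an aside, a minimal sketch of the policy this commit describes is shown below, in modern Rust with invented names (`Scheduler`, `run_until_starved`); it is not the actual libgreen scheduler, just the shape of "drain all runnable work before falling back to the I/O poll".

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for a scheduler's local run queue.
struct Scheduler {
    run_queue: VecDeque<Box<dyn FnOnce()>>,
}

impl Scheduler {
    /// Keep running tasks as long as any can be found; only return once the
    /// queue is truly empty. At that point the caller would fall back to
    /// epoll()/kqueue(), so I/O-bound tasks are still woken up and not starved.
    fn run_until_starved(&mut self) {
        while let Some(task) = self.run_queue.pop_front() {
            task();
        }
    }
}

fn main() {
    let mut sched = Scheduler { run_queue: VecDeque::new() };
    for i in 0..3 {
        sched.run_queue.push_back(Box::new(move || println!("task {i}")));
    }
    sched.run_until_starved();
    // Only now would the scheduler block in the event loop waiting for I/O.
}
```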
The condition pointed in the wrong direction and it also didn't take equality into account. Tests were added for both cases. For the small benchmark of `task::try(proc() {}).unwrap()`, this takes the iteration time on OSX from 15119 ns/iter to 6179 ns/iter (timed with RUST_THREADS=1). cc rust-lang#11389
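The commit message doesn't show the check itself, but the class of bug is easy to illustrate. Below is a hedged sketch with invented `Stack`/`StackPool` shapes (not the real libgreen types) of a size check pointing the right way and counting equality as "big enough".

```rust
/// Hypothetical cached stack: only the size matters for this sketch.
struct Stack {
    size: usize,
}

struct StackPool {
    cached: Vec<Stack>,
}

impl StackPool {
    /// Reuse a cached stack only if it is at least as large as the request
    /// (`>=`, not `<`); otherwise allocate a fresh one. Getting the direction
    /// or the equality wrong means cached stacks are never (or wrongly) reused.
    fn take(&mut self, min_size: usize) -> Stack {
        match self.cached.iter().position(|s| s.size >= min_size) {
            Some(i) => self.cached.swap_remove(i),
            None => Stack { size: min_size },
        }
    }

    fn give(&mut self, stack: Stack) {
        self.cached.push(stack);
    }
}

#[test]
fn reuses_equal_and_larger_stacks() {
    let mut pool = StackPool { cached: vec![Stack { size: 4096 }] };
    let s = pool.take(4096); // equality must count as "big enough"
    assert_eq!(s.size, 4096);
    pool.give(s);
    let s = pool.take(1024); // a larger cached stack is also reusable
    assert_eq!(s.size, 4096);
}
```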
One of these is allocated for every task, so this is an attempt to cut down on allocations. cc rust-lang#11389
Instead, use an enum that allows either running a procedure or sending the task result over a channel. I expect the common case to be sending on a channel (e.g. task::try), so don't require an extra allocation in the common case. cc rust-lang#11389
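A hedged, modern-Rust illustration of that enum (the original used `proc()` and the old channel types; all names here are made up): the channel variant carries only a `Sender`, so the common `task::try`-style path needs no extra boxed closure.

```rust
use std::any::Any;
use std::sync::mpsc::{channel, Sender};

/// Stand-in for the runtime's task result type.
type TaskResult = Result<(), Box<dyn Any + Send>>;

/// What to do when a task finishes.
enum OnExit {
    /// Uncommon case: run an arbitrary callback (this one still needs a box).
    Run(Box<dyn FnOnce(TaskResult) + Send>),
    /// Common case (e.g. `task::try`): just send the result, no extra allocation.
    SendResult(Sender<TaskResult>),
}

fn complete(on_exit: OnExit, result: TaskResult) {
    match on_exit {
        OnExit::Run(f) => f(result),
        OnExit::SendResult(tx) => {
            let _ = tx.send(result);
        }
    }
}

fn main() {
    let (tx, rx) = channel();
    complete(OnExit::SendResult(tx), Ok(()));
    assert!(rx.recv().unwrap().is_ok());
}
```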
Two unfortunate allocations were wrapping a proc() in a proc() with GreenTask::build_start_wrapper, and then boxing this proc in a ~proc() inside of Context::new(). Both of these allocations were a direct result of two conditions:

1. The Context::new() function has a nice API of taking a procedure argument to start up a new context with. This inherently required an allocation by build_start_wrapper because extra code needed to be run around the edges of a user-provided proc() for a new task.
2. The initial bootstrap code only understood how to pass one argument to the next function.

By modifying the assembly and entry points to understand more than one argument, more information is passed through in registers instead of allocating a pointer-sized context. This is sadly where I end up throwing MIPS under a bus, because I have no idea what's going on in the MIPS context-switching code and don't know how to modify it.

Closes rust-lang#7767
cc rust-lang#11389
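The second point is assembly-level, but the allocation being removed is easy to sketch in plain Rust. Both function shapes below are hypothetical analogies, not the real `GreenTask::build_start_wrapper` or bootstrap code.

```rust
/// Before (shape of the problem): wrapping the user's boxed closure in another
/// boxed closure means a second heap allocation per spawn, just to run
/// setup/teardown around the user's code.
fn build_start_wrapper(user: Box<dyn FnOnce() + Send>) -> Box<dyn FnOnce() + Send> {
    Box::new(move || {
        // ... per-task setup ...
        user();
        // ... per-task teardown ...
    })
}

/// After (shape of the fix): if the new context's entry point accepts more
/// than one argument, the task pointer and its start function travel in
/// registers, and nothing extra has to be boxed to smuggle them across the
/// context switch.
fn bootstrap(task_ptr: *mut u8, start: fn(*mut u8)) {
    start(task_ptr);
}

fn main() {
    let wrapped = build_start_wrapper(Box::new(|| println!("user proc")));
    wrapped();

    fn start(_task: *mut u8) {
        println!("started with no extra boxing");
    }
    bootstrap(std::ptr::null_mut(), start);
}
```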
bors added a commit that referenced this pull request on Feb 14, 2014
These commits pick off some low-hanging fruit that was slowing down the spawning of green threads. The major speedup comes from fixing a bug in stack caching where we never used any cached stacks!

The program I used to benchmark is at the end. It was compiled with `rustc --opt-level=3 bench.rs --test` and run as `RUST_THREADS=1 ./bench --bench`. I chose to use `RUST_THREADS=1` due to #11730, as the profiles I was getting interfered too much when all the schedulers were in play (and shouldn't be after #11730 is fixed). All of the units below are in ns/iter as reported by `--bench` (lower is better).

|              | green | native | raw    |
| ------------ | ----- | ------ | ------ |
| osx before   | 12699 | 24030  | 19734  |
| linux before | 10223 | 125983 | 122647 |
| osx after    | 3847  | 25771  | 20835  |
| linux after  | 2631  | 135398 | 122765 |

Note that this is *not* a benchmark of spawning green tasks vs native tasks. I put in the native numbers just to get a ballpark of where green tasks are. This benchmark is *clearly* benefiting from stack caching. Also, OSX is clearly not 5x faster than linux; I think my VM is just much slower.

All in all, this ended up being a nice 4x speedup for spawning a green task when you're using a cached stack.

```rust
extern mod extra;
extern mod native;

use std::rt::thread::Thread;

#[bench]
fn green(bh: &mut extra::test::BenchHarness) {
    let (p, c) = SharedChan::new();
    bh.iter(|| {
        let c = c.clone();
        spawn(proc() { c.send(()); });
        p.recv();
    });
}

#[bench]
fn native(bh: &mut extra::test::BenchHarness) {
    let (p, c) = SharedChan::new();
    bh.iter(|| {
        let c = c.clone();
        native::task::spawn(proc() { c.send(()); });
        p.recv();
    });
}

#[bench]
fn raw(bh: &mut extra::test::BenchHarness) {
    bh.iter(|| {
        Thread::start(proc() {}).join()
    });
}
```
flip1995 pushed a commit to flip1995/rust that referenced this pull request on Jan 25, 2024
no_effect_underscore_binding: `_`-prefixed variables can be used

Prefixing a variable with a `_` does not mean that it will not be used. If such a variable is used later, do not warn that its initialization has no side effect, as this is fine.

changelog: [`no_effect_underscore_binding`]: warn only if the variable is unused

Fix rust-lang#12166
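A small, hedged example of the distinction this change draws (not taken from Clippy's test suite):

```rust
fn example() -> i32 {
    // `_unused` is never read again, so its side-effect-free initialization
    // can still be flagged by `no_effect_underscore_binding`.
    let _unused = 1 + 1;

    // `_used` is read below despite the underscore prefix, so the lint should
    // no longer warn about its initialization.
    let _used = 2 + 2;
    _used
}

fn main() {
    assert_eq!(example(), 4);
}
```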