ThreadPool::install and allocation overhead #666
Comments
cc @stjepang -- is it expected that |
I just did some digging, this is what I found out so far:
Btw, I used this application:
    let pool = ThreadPoolBuilder::new().num_threads(4).build().unwrap();
    pool.install(|| {});
    pool.install(|| {});
    println!("Waiting 1 second");
    sleep(Duration::new(1, 0));
    for i in 0..10 {
        pool.install(|| {})
    }
which printed
for |
Oh, indeed -- |
Some more digging: First Allocation Source (LockLatch)
However, there is still another allocation source: Second Allocation Source (SegQueue::push)
The |
Yes, got it to work now with the original Do you plan to work on this yourself, or do you want me to clean up and submit a PR for the |
I was tinkering a little, but if you want to submit a proper PR, that would be welcome. Note that I think we don't want to affect |
Ok, I now managed to remove the remaining allocation source. I first looked into crossbeam's I then realized that I could just combine an However, if I understand it correctly the In any case, I created a PR #670 for you to review and discuss. If I combine #670 and #668 I'm not seeing any allocation anymore in my test case.
668: Reusing LockLatch to reduce allocation r=cuviper a=ralfbiedert
This addresses the first half of #666. I kept the `LockLatch` and added a `reset()` method because that had overall fewer lines of code. If you want I can still change it to a separate `ReusableLockLatch` type.
Co-authored-by: Ralf Biedert <[email protected]>
Co-authored-by: Josh Stone <[email protected]>
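For readers who want a picture of the "latch with a `reset()` method" idea, here is a minimal sketch. It is not rayon's actual code: the field names, the `run_rounds` helper, and the use of `std::thread::spawn` are assumptions made for illustration. It only shows how one latch can be set, waited on, and then cleared for the next round instead of being constructed afresh each time. The thread spawn in the demo allocates on its own, so this illustrates the reuse pattern, not an allocation-free program.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// A one-bit latch: one side calls `set`, the other blocks in `wait`.
/// (Illustrative only; field names are assumptions, not rayon's.)
struct LockLatch {
    flag: Mutex<bool>,
    cond: Condvar,
}

impl LockLatch {
    fn new() -> Self {
        LockLatch { flag: Mutex::new(false), cond: Condvar::new() }
    }

    /// Marks the latch as set and wakes all waiters.
    fn set(&self) {
        let mut guard = self.flag.lock().unwrap();
        *guard = true;
        self.cond.notify_all();
    }

    /// Blocks until the latch has been set.
    fn wait(&self) {
        let mut guard = self.flag.lock().unwrap();
        while !*guard {
            guard = self.cond.wait(guard).unwrap();
        }
    }

    /// Clears the flag so the same latch can serve the next round.
    fn reset(&self) {
        *self.flag.lock().unwrap() = false;
    }
}

/// Hypothetical helper: performs `rounds` hand-offs through one shared latch
/// and returns how many rounds completed.
fn run_rounds(rounds: usize) -> usize {
    let latch = Arc::new(LockLatch::new());
    let mut completed = 0;
    for _ in 0..rounds {
        let worker_latch = Arc::clone(&latch);
        let worker = thread::spawn(move || worker_latch.set());
        latch.wait(); // returns once the worker has called `set`
        worker.join().unwrap();
        latch.reset(); // reuse the same latch instead of building a new one
        completed += 1;
    }
    completed
}

fn main() {
    println!("completed {} rounds", run_rounds(3));
}
```

The trade-off mentioned in the PR text still applies: a `reset()` on the existing type is less code, while a separate reusable type would make the one-shot and reusable cases distinct at the type level.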
Background
We are using rayon within a library that is called from game engines in the main loop, at about 90 FPS, depending on the device. From the game engine we receive a number of objects which we transform with some math-heavy calculations.
We have used rayon for some time now, and from a performance / usability perspective it hits a sweet spot where we can simplify our code a lot and see a good speedup over a single-threaded approach.
However, as we are getting into more demanding environments we need to control for allocation, and want to be more-or-less allocation free after initialization.
Problem
I started analyzing our allocation pattern and noticed that ThreadPool::install does about 2 allocations per invocation, at about 10 - 20 bytes each (after the 1st round). (Source) - rayon 1.1.0
Ideally, ThreadPool::install should be allocation free eventually, and / or there should be an alternative API that allows for allocation free processing. In the other thread #614 a heapless(?) StackJob was mentioned, but I couldn't find anything in the docs.
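One way to reproduce this kind of measurement is a counting global allocator. The sketch below is an assumption about how such numbers could be gathered, not the tool used for the figures above. `CountingAlloc` and `count_allocations` are hypothetical names, and a `Box::new` stands in for the workload so the example needs only the standard library. With rayon available, the closure passed to `count_allocations` would instead contain the `pool.install(|| {})` call.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical wrapper around the system allocator that counts requests.
struct CountingAlloc;

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);
static BYTES: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // Record the request, then forward it unchanged.
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        BYTES.fetch_add(layout.size(), Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Runs `f` and returns (allocation count, bytes requested) seen meanwhile.
/// The counters are process-wide, so allocations made on other threads
/// during `f` (for example pool workers) are included.
fn count_allocations<F: FnOnce()>(f: F) -> (usize, usize) {
    let allocs_before = ALLOCATIONS.load(Ordering::Relaxed);
    let bytes_before = BYTES.load(Ordering::Relaxed);
    f();
    (
        ALLOCATIONS.load(Ordering::Relaxed) - allocs_before,
        BYTES.load(Ordering::Relaxed) - bytes_before,
    )
}

fn main() {
    // Stand-in workload; replace with the call under investigation.
    let (allocs, bytes) = count_allocations(|| {
        let boxed = Box::new(42u64);
        std::hint::black_box(boxed);
    });
    println!("{allocs} allocations, {bytes} bytes");
}
```

Because the counters are global, it helps to run a warm-up call first and measure only later rounds, which matches the "after the 1st round" qualifier above.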