[WIP] Background thread for automatic device polling #1891
Conversation
I've fixed the compile error in …
FWIW, as part of my upcoming changes, we'll be able to run `maintain` in multiple threads at once for the same queue, as well as run it independently per queue, which I think would conflict somewhat with your change. IMO something like this is not really a substitute for a well-integrated runtime: wgpu should ideally provide some sort of runtime integration rather than just spawning a background thread like this (plus, in many applications, spawning background threads can be quite detrimental to performance, especially if they make heavy use of thread locals, so I'm worried about integrating this too deeply into wgpu). For an example of what I mean, look at how flexible the rayon / crossbeam integrations are, where the user has total control over the thread pool, the workers, etc.
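For reference, the kind of user control being pointed to here can be illustrated with rayon's builder API (illustrative only, not wgpu code):

```rust
// Illustration of the flexibility referenced above: with rayon, the
// application owns and configures the worker pool itself.
fn main() {
    let pool = rayon::ThreadPoolBuilder::new()
        .num_threads(2)
        .thread_name(|i| format!("worker-{i}"))
        .build()
        .unwrap();

    pool.install(|| {
        // Work scheduled here runs on the caller-controlled pool.
        println!("running on a user-managed pool");
    });
}
```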
Can you link your WIP? Then I could have a look.
How would this work? The fundamental problem is that there's no way to have the GPU driver call a function when some operation completes, so the application itself needs to block on or poll a fence and resolve the futures. The best and most efficient way to do this is on a background thread.
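A minimal sketch of that idea, assuming a wgpu-0.11-era `Device::poll`; the PR's actual thread parks itself until a mapping is pending instead of looping:

```rust
use std::sync::Arc;

// Minimal sketch of "block on the device from a background thread".
// A real implementation parks until work is pending rather than looping,
// since Maintain::Wait returns immediately when the queue is idle.
fn spawn_poll_thread(device: Arc<wgpu::Device>) -> std::thread::JoinHandle<()> {
    std::thread::spawn(move || loop {
        // Blocks until submitted work finishes, then resolves any
        // outstanding map_async futures.
        device.poll(wgpu::Maintain::Wait);
    })
}
```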
Well, first, wgpu and its dependencies already spawn a lot of background threads before it even starts doing anything useful (~15 for the GL backend and ~25 for the Vulkan backend, on my system), so one more thread can't possibly hurt too much. Second, the background thread in this PR will probably never use more than its initial page of stack memory, and it is almost always idle, except at two points in time: when it is woken up and starts blocking on the device, and when the device finishes its work and the thread wakes the blocked futures. The resources consumed by this thread are absolutely negligible compared to anything a non-trivial wgpu application will use. Third, it's not deeply integrated into wgpu at all: the creation of the background thread is completely contained within the …
Applications which want and/or need total control over everything won't use wgpu anyway, but Vulkan or DirectX directly. And even if they do use wgpu, it's very easy to opt out of creating the background thread.
So, any updates on this? Yes/no/maybe?
Sorry about the delay! I was hoping that @grovesNL could make a call on this.
This seems like a great start for a basic runtime, and allows futures to work roughly the same way across web and native whenever auto-poll is enabled. I think it would be good to proceed with this.
As @xq-tec mentioned, it's easy to opt out of this to manually control polling in the same way it currently works. We could also add direct integration with runtimes later on if we'd like (e.g. reusing an existing thread pool provided by another crate), or more detailed thread control.
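For instance, an application that opts out can keep driving polling itself, roughly like this (a sketch, assuming the era's `Maintain::Poll` variant):

```rust
// Sketch of the manual alternative when opting out of the background
// thread: drive polling from the app's own loop, e.g. once per frame.
fn frame(device: &wgpu::Device) {
    // Non-blocking; processes completed GPU work and wakes any
    // mapping futures that are now ready.
    device.poll(wgpu::Maintain::Poll);
    // ...encode and submit this frame's work...
}
```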
A few high-level questions that I find critical:
- why is this limited to buffer mapping? There are other async operations, like `on_submitted_work_done` (see the sketch after this list).
- does this have to be integrated into wgpu-core at all? Can the solution live exclusively in wgpu-rs instead?
- in line with @pythonesque's reasoning, would this API be compatible with other runtimes, like winit's event loop?
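For context, this is roughly what hooking `on_submitted_work_done` looks like; the callback signature here is the one from later wgpu releases, so treat it as a hedged sketch rather than the API that existed at the time of this PR:

```rust
// Hedged sketch: on_submitted_work_done as in later wgpu releases; the
// exact API at the time of this PR may have differed.
fn submit_and_notify(queue: &wgpu::Queue, cmd: wgpu::CommandBuffer) {
    queue.submit(Some(cmd));
    queue.on_submitted_work_done(|| {
        // Runs once the GPU finishes the work submitted above; this is
        // another completion source an auto-polling thread would need to drive.
        println!("submitted work done");
    });
}
```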
Because I didn't think of it, to be honest. It should be easy to support `on_submitted_work_done` as well.
I think there is one way to do it without wgpu-core. It would involve adding an …
I guess I'm missing something conceptually. Why does there need to be a callback from wgpu-core at all? wgpu-core can't kick off events by itself; it only does so via …
Because … But I think there's a way around this: if we store the callback closure in …
OK, why do we need to call any closures on …?
The closure wakes up the background thread, which then executes the equivalent of …
I'm quite confused. So the background thread parks itself and waits, only getting woken up on that signal. When is this signal sent? On …?
When the background thread is woken up, it executes …
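A rough sketch of the signaling described in this exchange, assuming a simple condvar-based wake-up (the PR's actual mechanism may differ):

```rust
use std::sync::{Condvar, Mutex};

// Rough sketch: the closure stored by map_async calls notify(); the
// background thread loops on wait() and then blocks on the device.
struct PollSignal {
    pending: Mutex<bool>,
    cvar: Condvar,
}

impl PollSignal {
    fn notify(&self) {
        *self.pending.lock().unwrap() = true;
        self.cvar.notify_one();
    }

    fn wait(&self) {
        let mut pending = self.pending.lock().unwrap();
        // Guard against spurious wake-ups.
        while !*pending {
            pending = self.cvar.wait(pending).unwrap();
        }
        *pending = false;
    }
}
```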
I see. So, if … There is a problem, however. If mapping a buffer always blocks the device (using the …):

```rust
buffer.map_async(...);
// these operations are going to be blocked, since the device is busy waiting
let texture = device.create_texture(...);
queue.submit(...);
```

So we end up in a strange situation: on one hand, there is a background thread that's meant to provide asynchronicity, but on the other hand we are still effectively blocked on the main thread.
Wait, so when … Is there a way around this? I.e., would it be possible to wait on the "most recent fence" from one thread, but allow pushing to the queue from other threads?
I'm double-checking now. Maintain() locks the following: …
I know this is highly unfortunate. The hub locking was never meant to be long-term. Device polling with wait is a hack that probably needs to be removed entirely. But as it stands now, the set of restrictions is pretty darn big, and it's very likely that user code steps on one of them.
Thanks for the explanation, @kvark! I guess I'll wait until @pythonesque's changes land before working on this problem again.
Would it be possible to split wgpu_core's … like this?

```rust
let (device_guard, mut token) = hub.devices.read(&mut token);
let mut life_tracker = self.lock_life(token);
// triage_suspected(), triage_mapped(), ...

// Drop locks:
drop(device_guard);
drop(life_tracker);

// Blocking wait:
self.raw.wait(...);

// Re-acquire locks:
let (device_guard, mut token) = hub.devices.read(&mut token);
let mut life_tracker = self.lock_life(token);
// triage_submissions(), handle_mapping(), etc.

return closures;
```

This would solve the loss of asynchronicity due to …
… Although technically we could make wgpu-hal expose some way of producing an independent "Waiter" object that one could use. Ideally, we'd get @pythonesque's changes instead.
Connections
See #1871 which discusses the general problem and solution.
Description
The PR spawns an optional background thread which polls the device when a buffer is mapped. This enables more idiomatic code: … instead of …
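The inline snippets were lost from this description; a plausible reconstruction, assuming the futures-based `map_async` wgpu had at the time, is:

```rust
// Hypothetical reconstruction, not the PR's original snippet (wgpu
// ~0.11-era API, where map_async returned a future).

// With the auto-poll background thread: plain async/await.
async fn read_with_autopoll(buffer: &wgpu::Buffer) -> Vec<u8> {
    let slice = buffer.slice(..);
    slice.map_async(wgpu::MapMode::Read).await.unwrap();
    slice.get_mapped_range().to_vec()
}

// Without it: the future only resolves if the app drives the device itself.
async fn read_with_manual_poll(device: &wgpu::Device, buffer: &wgpu::Buffer) -> Vec<u8> {
    let slice = buffer.slice(..);
    let mapping = slice.map_async(wgpu::MapMode::Read);
    device.poll(wgpu::Maintain::Wait); // blocks until the GPU work completes
    mapping.await.unwrap();
    slice.get_mapped_range().to_vec()
}
```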
This solution adds a closure to `wgpu_core::device::Device`, which is called when a buffer mapping is set up. When a `wgpu::Device` is created, this closure is optionally set to a function which triggers a device poll on a background thread.
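A minimal sketch of the shape this describes; the field and type names below are hypothetical, not the PR's actual code:

```rust
// Hypothetical sketch, not the PR's actual code: wgpu-core's Device gains
// an optional callback that fires when a buffer mapping is queued.
type MapPendingCallback = Box<dyn Fn() + Send + Sync>;

struct Device {
    // ...existing fields elided...
    on_map_pending: Option<MapPendingCallback>, // hypothetical field name
}

impl Device {
    fn buffer_map_async(&self /* , ... */) {
        // ...record the pending mapping...
        if let Some(cb) = &self.on_map_pending {
            cb(); // e.g. wake the background poll thread
        }
    }
}
```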