Initial proposal for webgpu.h #1
Conversation
webgpu.h (Outdated)

```c
#endif

// Procs of BindGroup
typedef void (*wgpuProcBindGroupReference)(wgpuBindGroup bindGroup);
```
Does this have to be a part of the API header? Looks like an implementation detail to me.
Every WebGPU implementation will need to implement something like this to avoid freeing data in use by the GPU, and having a refcount helps compose software that uses webgpu.h.
We don't have those procs, and we don't free data in use by GPU.
Sorry, I wasn't clear. I expect every WebGPU implementation will have internal refcounting to avoid freeing data on the GPU (and I see wgpu-rs has `ref_count`, even though I'm not familiar enough with Rust to understand all of what's happening there). So why not expose this to the application, since it helps compose things together?
This is also slightly closer to the JS semantics with GC keeping the object alive as long as there is a reference to it.
We haven't considered exposing the refcounts to the user:
- for JS, it already knows when to call destroy()
- for Rust, we use ownership and borrowing instead of refcounts on the API side
Perhaps you could do it as an extension?
Since WebGPU objects are internally synchronized, doesn't it make sense to have them implement `Clone + Sync`? And wouldn't this kinda require a refcount? (I don't know Rust too well, so maybe this is stupid.)
Only `Device` in WebGPU is truly internally synchronized. In general, if something is internally synchronized, that means `Sync` is fine, but `Clone` has nothing to do with it. If a user wants to share the `Device` between multiple threads, they just wrap it in `Arc<Device>` and go from there.
@kvark, then doesn't it make sense to simply expose objects in `Arc<T>` form directly? I.e. constructors would only initialize an `Arc<Device>` instead of a `Device`.
Using `Arc` forces the way refcounting is done, which doesn't match what Dawn is doing (it's C++ and the refcount is at an offset, while it is at offset 0 for `Arc`), nor what `wgpu-native` does, because of the `HUB` structure they use.

That said, the same `Arc` semantics could be achieved with a `Sync + Clone` object that uses webgpu.h's refcounting mechanisms to implement `Clone`.
webgpu.h (Outdated)

```c
wgpuProcFenceGetCompletedValue fenceGetCompletedValue;
wgpuProcFenceOnCompletion fenceOnCompletion;
wgpuProcFenceRefe
```
can we move all the refcounting stuff into a separate header, even if it's hosted alongside webgpu.h?
Unified them instead to not be per-type but on `WGPUObject`, which is `void*`.

Thank you for addressing my concerns!
Had a bit of discussion on the call about refcount exposure.
Cons:
I'm slightly concerned about the "exposing costs us nothing" aspect: it's not like we can straight-up expose our internal refcounts. Our implementation would likely still need to track the user refcounts separately from the internal ones (because we'd need to know when the user one reaches 0, at least), at which point we end up implementing an extra refcounting mechanism exposed to the user, which is nothing like what "exposing costs us nothing" implies.
Why do you need to keep the user-facing refcount separate from the internal one? We were doing this initially in Dawn but merged both refcounts, because with RAII wrappers the application can't mistakenly deref too much.

How do you rely on RAII wrappers if it's the C API being exposed?

Sure, wrappers in high-level languages can provide this guarantee. But we are exposing the C API; we aren't requiring the users to use the wrappers. In the C API we can't guarantee that the issued refcount bumps/decs are sane.

What's the goal of having separate refcounts? I don't think we should validate the refcounting: if an application does too many derefs and the internal refcount is 0, then there will be a use-after-free and a potential segfault. We can't validate derefs and guarantee 100% that the application can't segfault itself on the first invalid deref. So you might as well have a single internal+external refcount. (In Dawn I'm thinking that maybe in debug we could split them, but even then that's only useful for automated testing.)
It comes down to what the "Valid Usage" of the API encoded in the header is. Providing a malformed pointer in place of a struct is surely a crash, but at least it's very localized: you know exactly when and how things went wrong. The "validity" of a refcount decrease operation isn't local: it depends on all the previous ups and downs issued. So it's possible to describe, but it's much less desirable to expose an API that is almost safe, minus this one case where you wouldn't be able to tell instantly what the problem is (because it's somewhere deep in the logic of your app, all of a sudden). Of all the build targets we need to support, look at where exposing refcounts is useful:

Weighing these concerns, I feel like exposing the refcounts would bring more trouble than it's worth?
I don't think we should strive for Rust's level of safety in a C API because it doesn't have any support for linear types that would allow encoding such constraints.
Same thing when there is no refcounting happening: if you use a destroyed object, you might still have correct behavior (or a validation error instead of a crash) because the object is still internally referenced by the WebGPU implementation.
Graphics APIs are optimized for the Release usecases so I don't think we should worry about this.
The shim would have to implement the refcount, but it doesn't mean it's useless. The same arguments about refcounts being useful for software composability apply.
Removed the procTable as was requested, and wrapped the API function declarations in a block so they can be skipped. We probably want a getProcAddress-like function in a follow-up PR.

I tried implementing the "unified" refcount today to get Dawn closer to the proposed header. Turns out the unified refcount is worse than having a reference/release for each object, because it essentially forces implementations to do a dynamic dispatch based on the object type when

For the WASM shim this isn't a problem because, AFAIK, the reference types all point into a single reference table, so objects are already unified there. The cases where we would like to do some type-specific work is if we want to call

Then the question is: do we want the refcount reaching 0 (or explicit destroy) to act as if
So irrespective of refcount vs. no refcount, we should split the CPU object lifetime management from the equivalent of

Speaking of Web engines: they will need some form of refcounting of WGPU objects as soon as they implement sharing objects between workers.
The table is implemented in JS, at least for now, so we can implement it however we want. Emscripten's GL object tables are per-type.
Thanks for the extensive responses!
It's not as much about safety as it is about a generally good API. When the user gets undesired behavior, a good API makes it obvious what needs to be fixed. That's what the "locality" gives you: the user searches for the place where they destroy the object and tries to figure out whether it's the right place to destroy it.
We should strive to hide the implementation details from the user if we want the API to be portable. The way I see it, if the user calls
The question here is not about optimization. It's about what API we expose for lifetime management.
The shim should be as light as possible, though. Users who don't need refcounting (or want it implemented differently, say, by introducing weak references, thread-local counters, etc.) shouldn't be paying the implementation cost for it. If we implement refcounting in the WASM shim, will it be strictly better than anything the users may do on their side?

As for the reasons to separate "destroy" from refcounting semantics, this one is the least clear to me:
I'm confused because there appears to be some context implied here that needs to be specified, e.g. for "combined destruction of the state tracking and freeing of the GPU memory". Let me try to bring some clarity.

About "destroy": it doesn't make sense to actually destroy anything at this stage, since the implementation may have dependent work in flight. Forcing the user to know whether there is work in flight (e.g. with fences) is an option, but that's not the direction the WebGPU group has taken. So "destroy" simply means "I don't want this object any more, please free the associated resources as soon as you can". This is pretty much exactly the same semantics as the refcount reaching 0. So at the end of the day, we are trying to figure out which of the two APIs would work better:

1. explicit "destroy"
2. refcounting

(2) can't be specified to invalidate the handles at any point, since the condition of reaching 0 is run-time dependent. (2) has more APIs, potentially more redundant ones ("destroy" versus "remove_ref"), which makes it more difficult for the users to deal with lifetime issues (and correspondingly easier to leak objects/memory).
FWIW, I noticed that refcounting allows the API to more closely match WebGPU:

```js
const t = device.createTexture(...);
commands.push(makeSomeCommands(t));
// t.destroy() here would cause submit to fail later
...
queue.submit(commands);
// no need to destroy
```

with refcounting:

```c
WGPUTexture t = wgpuDeviceCreateTexture(device, ...);
commands.push_back(makeSomeCommands(t));
wgpuTextureRelease(t);
// Destroying here would cause submit to fail later, but OK to release
// because the command buffer still holds a ref
...
wgpuQueueSubmit(queue, commands.size(), commands.data());
```

without refcounting:

```c
WGPUTexture t = wgpuDeviceCreateTexture(device, ...);
commands.push_back(makeSomeCommands(t));
// Destroying here would cause submit to fail later
...
wgpuQueueSubmit(queue, commands.size(), commands.data());
wgpuTextureDestroy(t);
```
This is under the assumption that destroy destroys eagerly (which I think it is supposed to). One way to change that would be to have Release on everything (no Reference), and then on textures and buffers also have Destroy, which destroys eagerly.
Removed all forms of refcounting and destroy so we can talk about them in a follow-up. @kvark PTAL again.
webgpu.h (Outdated)

```c
#endif

// Procs of Buffer
typedef void (*WGPUProcBufferUnmap)(WGPUBuffer buffer);
```
Since there is no ability to get a proc address, these typedefs aren't useful. Let's not add them right now?
Removed (for now)
Updated to have render bundles and other WebGPU changes. Removed procs. As far as I can tell there are no more contentious changes here, so I'll go ahead and merge it.
@kvark PTAL. We have to decide on a license (tentatively 3-clause BSD for this repo) and the "authors" name.