Initial proposal for webgpu.h #1

Kangz · 2019-07-15T15:41:05Z

@kvark PTAL we have to decide on a license (tentatively 3clause bsd for this repo) and the "authors" name.

webgpu.h

kvark · 2019-07-15T16:16:11Z

webgpu.h

+#endif
+
+// Procs of BindGroup
+typedef void (*wgpuProcBindGroupReference)(wgpuBindGroup bindGroup);


Does this have to be a part of the API header? Looks like an implementation detail to me.

Every WebGPU implementation will need to implement something like this to avoid freeing data in use by the GPU, and having a refcount helps composing together software that uses webgpu.h

We don't have those procs, and we don't free data in use by GPU.

Sorry I wasn't clear. I expect every WebGPU implementation will have internal refcounting to avoid freeing data on the GPU (and I see wgpu-rs have ref_count even though I'm not familiar enough with Rust to understand all of what's happening there). So why not expose this to the application since it helps compose things together?

This is also slightly closer to the JS semantics with GC keeping the object alive as long as there is a reference to it.

We haven't considered exposing the refcounts to the user:

for JS, it already knows when to call destroy()

for Rust, we use ownership and borrowing instead of refcounts on the API side

Perhaps, you could do it as an extension?

Since WebGPU objects are internally synchronized doesn't it make sense to have them implement Clone + Sync and this would kinda require a refcount? (I don't know Rust too well so maybe this is stupid)

Only Device in WebGPU is truly internally synchronized.
In general, if something is internally synchronized, that means Sync is fine, but Clone has nothing to do with it. If a user wants to share the Device between multiple threads, they just wrap it in Arc<Device> and go from there.

@kvark, then doesn't it make sense to simply expose objects in Arc<T> form directly? I.e. constructors would only initialize an Arc<Device> instead of Device.

Using Arc forces the way refcounting is done, which doesn't match what Dawn is doing (it's C++ and the refcount is at an offset while it is at offset 0 for Arc) nor what wgpu-native because of the HUB structure they use.

That said the same Arc semantics could be done with a Sync + Clone object that uses webgpu.h's refcounting mechanisms to implement Clone.

webgpu.h

kvark · 2019-07-17T17:36:24Z

webgpu.h

-
-    wgpuProcFenceGetCompletedValue fenceGetCompletedValue;
-    wgpuProcFenceOnCompletion fenceOnCompletion;
-    wgpuProcFenceRefe


can we move all the refcounting stuff into a separate header, even if it's hosted alongside webgpu.h?

Unified them instead to not be per device but on WGPUObject which is void*

webgpu.h

kvark · 2019-07-18T16:40:06Z

Thank you for addressing my concerns!
The last one I got unresolved is - I think the header doesn't need all the logic related to the proc tables and the refcounting. Dawn should be just putting this into a separate header.
@grovesNL - would you have a moment to check if you spot any issues?

kvark · 2019-07-18T17:15:19Z

Had a bit of discussion on the call about refcount exposure.
Pros:

both implementations already do that, exposing costs us nothing
some user would be happy to use that instead of wrapping staff in Arc and such

Cons:

destruction is slightly more complicated: there are still explicit destroy_() methods on textures and buffers, but actual handles only go out of scope upon refcount decrease
WASM shim would have to emulate refcounts (which means it would affect all the WASM clients?!).

I'm slightly concerned about the "exposing costs us nothing" aspect: it's not like we can straight expose our internal refcounts. Our implementation would likely still need to track the user refcounts separately from the internal ones (because we'd need to know when the user one reaches 0, at least), at which point we end up with just implementing an extra refcounting exposed to the user, which is nothing like "exposing costs us nothing" implies.

webgpu.h

Kangz · 2019-07-19T10:12:58Z

I'm slightly concerned about the "exposing costs us nothing" aspect: it's not like we can straight expose our internal refcounts. Our implementation would likely still need to track the user refcounts separately from the internal ones (because we'd need to know when the user one reaches 0, at least), at which point we end up with just implementing an extra refcounting exposed to the user, which is nothing like "exposing costs us nothing" implies.

Why do you need to keep the internal refcount separate from the internal one? We were doing this initially in Dawn but merged both refcounts because with RAII wrappers the application can't mistakenly deref too much.

kvark · 2019-07-19T14:24:51Z

@Kangz

Why do you need to keep the internal refcount separate from the internal one? We were doing this initially in Dawn but merged both refcounts because with RAII wrappers the application can't mistakenly deref too much.

How do you rely on RAII wrappers if it's the C API being exposed?

Kangz · 2019-07-19T15:08:10Z

We have a semi-idiomatic C++ wrapper for the C API that does RAII, you can find Dawn's generated version here. We'd eventually contribute a similar wrapper to this repo.

Edit: here's a link to the implementation of the header if you're interested in how it works.

kvark · 2019-07-19T16:56:42Z

Sure, wrappers in high level languages can have this guarantee provided. But we are exposing the C API, we aren't requiring the users to use the wrappers. In the C API we can't guarantee that the issued refcount bumps/decs are sane.

Kangz · 2019-07-19T17:24:01Z

What's the goal of having separate refcounts? I don't think we should validate the refcounting: if an application does too many derefs and the internal refcount is 0, then there will be a use-after-free and a potential segfault. We can't validate derefs and guarantee 100% that the application can't segfault itself on the first invalid deref. So you might as well have a single internal+external refcount. (in Dawn I'm thinking that maybe in debug we could split them, but even then that's only useful for automated testing).

kvark · 2019-07-19T17:59:09Z

It comes do to what is the "Valid Usage" of the API encoded in the header. Providing a malformed pointer in place of a struct is surely a crash, but at least it's very localized: you know exactly when and how things went wrong. The "validitity" of a refcount decrease operation isn't local: it depends on all the previous ups and downs issued. So it's possible to describe, but it's much less desirable to expose an API that is almost safe, minus this one case where you'd not be able to tell instantly what the problem is (because it's somewhere deep in the logic of your app, out of a sudden).

Of all the build targets we need to support, look at where exposing refcounts is useful:

Debug native builds - not useful, since the implementations would want to track user refcounts separately.
Release native builds - may be useful.
Web builds - not useful, since WASM shim would have to be implementing them.

Weighting these concerns, I feel like exposing the refcounts would bring more trouble than they are worth?

Kangz · 2019-07-22T13:18:23Z

It comes do to what is the "Valid Usage" of the API encoded in the header.

I don't think we should strive for Rust's level of safety in a C API because it doesn't have any support for linear types that would allow encoding such constraints.

The "validitity" of a refcount decrease operation isn't local: it depends on all the previous ups and downs issued. So it's possible to describe, but it's much less desirable to expose an API that is almost safe, minus this one case where you'd not be able to tell instantly what the problem is (because it's somewhere deep in the logic of your app, out of a sudden).

Same thing when there is no refcounting happening: if you use a destroyed object, you might still have correct behavior (or a validation error instead of a crash) because the object is still internally referenced by the WebGPU implementation.

Debug native builds - not useful, since the implementations would want to track user refcounts separately.

Graphics APIs are optimized for the Release usecases so I don't think we should worry about this.

Web builds - not useful, since WASM shim would have to be implementing them.

The shim would have to implement the refcount, but it doesn't mean it's useless. The same arguments about refcounts being useful for software composability apply.

Kangz · 2019-07-22T15:02:51Z

Removed the procTable as was requested, and wrapped the API function declarations in a block so they can be skipped. We probably want a getProcAddress-like function in a follow-up PR.

I tried implementing the "unified" refcount today to get Dawn closer to the proposed header. Turns out unified refcount is worse than having a reference/release for each object because it essentially forces implementations to do a dynamic dispatch based on the object type when WGPUReference makes teh refcount reach 0, which might not be idiomatic C or Rust.

For the WASM shim this isn't a problem because AFAIK the reference types are all pointing into a single reference table so objects are already unified there. The cases where we would like to do some type-specific work is if we want to call GPUTexture/GPUBuffer.destroy to free GPU memory when the refcount reaches 0.

Then the question is do we want refcount reaching 0, (or explicit destroy) to act as if GPUBuffer/GPUTexture.destroy is called? I think we want to separate the two concepts for several reasons:

The Javascript API separates the freeing of GPU memory from the lifetime of the state-tracking object: if you lose the last reference to a texture you can still submit command buffers using it, irrespective of whether your GPUTexture has been garbage collected or not.
This actually means that an API with only combined destruction of the state tracking and freeing of the GPU memory cannot be used to implement WebGPU in a Web engine unless it keeps track of the GPU object graph on the content process side (a big no-no for our implementation).
Often-enough people will want to create WGPU objects on the fly for the commands they are recording (like a tiny UBO, ad-hoc sampler or bindgroup) and they don't want to have a wgpu_native PendingResource-like structure on their side to keep them alive until the wgpuQueueSubmit call.

So irrespective of refcount vs. no refcount, we should split the CPU object lifetime management from the equivalent of GPUBuffer/GPUTexture.destroy.

Speaking of Web engines, they will need some form of refcounting of WGPU objects as soon as they implement sharing objects between workers.

kainino0x · 2019-07-22T23:37:21Z

For the WASM shim this isn't a problem because AFAIK the reference types are all pointing into a single reference table

The table is implemented in JS at least for now, so we can implement it however we want. Emscripten's GL object tables are per-type.

kvark · 2019-07-24T16:58:49Z

Thanks for the extensive responses!

I don't think we should strive for Rust's level of safety in a C API

It's not as much about safety as it is about a general "good API". When the user gets undesired behavior, a good API makes it obvious to what needs to be fixed. That's what the "locality" gives you: user searches for a place where they destroy the object and try to figure out if it's not the right place to destroy.

if you use a destroyed object, you might still have correct behavior (or a validation error instead of a crash) because the object is still internally referenced by the WebGPU implementation.

We should strive to hide the implementation details from the user, if we want the API to be portable. The way I see it, if the user calls destroy_xxx(handle) - this has a clear (implementation-independent!) semantics of the handle being lost. Any use of handle by the user is a definite validation error after that. There is no room for crashes or "may work" scenarios.

Graphics APIs are optimized for the Release usecases so I don't think we should worry about this.

The question here is not about optimization. It's about what API we expose for lifetime management.

The shim would have to implement the refcount, but it doesn't mean it's useless. The same arguments about refcounts being useful for software composability apply.

The shim should be as light as possible though. Users who don't need refcounting (or want them implemented differently, say, by introducing weak references, thread-local counters, etc) shouldn't be paying the implementation cost for them. If we implement refcounting in the WASM shim, will it be strictly better than anything the users may do on their side?

As for the reasons to separate "destroy" from refcounting semantics, this one is least clear to me:

This actually means that an API with only combined destruction of the state tracking and freeing of the GPU memory cannot be used to implement WebGPU in a Web engine unless it keeps track of the GPU object graph on the content process side (a big no-no for our implementation).

I'm confused because there appears to be some context implied here that needs to be specified, e.g. for "combined destruction of the state tracking and freeing of the GPU memory".

Let me try to bring some clarity. About "destroy" - it doesn't make sense to actually destroy anything at this stage, since the implementation may have dependent work in flight. Forcing the user to know whether there is work in flight (e.g. with fences) is an option, but that's not the direction WebGPU group has taken. So "destroy" simply means "I don't want this object any more, please free associated resources as soon as you can". This is pretty much exactly the same semantics as reaching the refcount of 0.

So at the end of the day, we are trying to figure out what of the two APIs would work better:

destroy_xxx(), destroy_yyy(), destory_zzz() methods that logically invalidate the handles
add_ref_xxx(), remove_ref_xxx(), add_ref_yyy(), remove_ref_yyy(), add_ref_zzz(), remove_ref_zzz() with an optional destroy_xxx() to match WebGPU API.

The (2) can't be specified to invalidate the handles at any point, since the condition of reaching 0 is run-time dependent. The (2) has more APIs, potentially more redundant ones ("destroy" versus "remove_ref"), which are making it more difficult for the users to deal with lifetime issues (and correspondingly easier to leak objects/memory).

kainino0x · 2019-07-24T17:36:44Z

FWIW, I noticed that refcounting allows the API to more closely match WebGPU:

const t = device.createTexture(...);
commands.push(makeSomeCommands(t));
// t.destroy() here would cause submit to fail later

...

queue.submit(commands);
// no need to destroy

with refcounting:

WGPUTexture t = wgpuDeviceCreateTexture(device, ...);
commands.push_back(makeSomeCommands(t));
wgpuTextureRelease(t);
// Destroying here would cause submit to fail later, but OK to release
// because command buffer still holds a ref
...

wgpuQueueSubmit(queue, commands.size(), commands.data());

without refcounting:

WGPUTexture t = wgpuDeviceCreateTexture(device, ...);
commands.push_back(makeSomeCommands(t));
// Destroying here would cause submit to fail later

...

wgpuQueueSubmit(queue, commands.size(), commands.data());
wgpuTextureDestroy(t);

kainino0x · 2019-07-24T17:39:01Z

This is under the assumption that destroy destroys eagerly (which I think it is supposed to).

One way to change around that would be to just have Release on everything (no Reference), and then on textures and buffers, also have Destroy which destroys eagerly.

Kangz · 2019-07-29T17:48:07Z

Removed all forms of refcounting and destroy so we can talk about it in a follow up. @kvark PTAL again.

kvark · 2019-07-29T21:35:30Z

webgpu.h

+#endif
+
+// Procs of Buffer
+typedef void (*WGPUProcBufferUnmap)(WGPUBuffer buffer);


Since there is no ability to get a proc address, these typedefs aren't useful. Let's not add them right now?

Removed (for now)

Kangz · 2019-08-23T13:08:55Z

Updated to have render bundles and other WebGPU changes. Removed procs.

As far as I can tell there are no more contentious changes here so I'll go ahead and merge it.

Initial proposal for webgpu.h

89c3870

Kangz requested a review from kvark July 15, 2019 15:41

kvark requested changes Jul 15, 2019

View reviewed changes

void const * nextInChain

9b9b1f7

kvark reviewed Jul 17, 2019

View reviewed changes

Kangz added 5 commits July 18, 2019 15:35

Casing changes

23521aa

Remove structureType for initial structures

b669c92

Flag changes.

b955adc

Unify refcounting

87c3615

Don't have fragmentStage optional

c8e10f5

Kangz force-pushed the initial branch from 2a31739 to c8e10f5 Compare July 18, 2019 16:10

grovesNL reviewed Jul 18, 2019

View reviewed changes

webgpu.h Outdated Show resolved Hide resolved

webgpu.h Show resolved Hide resolved

webgpu.h Outdated Show resolved Hide resolved

Remove unneeded point for computeStage

5cd64c5

Removed the proc table

113792f

Kangz added 2 commits July 29, 2019 19:43

Remove all refcounting

65c50e3

Remove refcounting

2053c4c

Remove destroy

2542ba4

kvark requested changes Jul 29, 2019

View reviewed changes

kvark self-assigned this Aug 1, 2019

Kangz added 3 commits August 23, 2019 14:58

Add bundles, textureComponentType, change some texture formats.

99b34af

extern C all function pointers

c5a95a3

Remove procs.

089b16f

Kangz merged commit 310d84f into webgpu-native:master Aug 23, 2019

Kangz deleted the initial branch August 23, 2019 13:09

kainino0x mentioned this pull request Jun 3, 2022

Flag enum types and flags typedefs #159

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Initial proposal for webgpu.h #1

Initial proposal for webgpu.h #1

Kangz commented Jul 15, 2019

kvark Jul 15, 2019

Kangz Jul 15, 2019

kvark Jul 15, 2019

Kangz Jul 16, 2019

kvark Jul 16, 2019

Kangz Jul 16, 2019

kvark Jul 16, 2019

dragostis Oct 21, 2019

Kangz Oct 21, 2019

kvark Jul 17, 2019

Kangz Jul 18, 2019

kvark commented Jul 18, 2019

kvark commented Jul 18, 2019

Kangz commented Jul 19, 2019

kvark commented Jul 19, 2019

Kangz commented Jul 19, 2019 •

edited

Loading

kvark commented Jul 19, 2019

Kangz commented Jul 19, 2019

kvark commented Jul 19, 2019

Kangz commented Jul 22, 2019

Kangz commented Jul 22, 2019

kainino0x commented Jul 22, 2019

kvark commented Jul 24, 2019

kainino0x commented Jul 24, 2019 •

edited

Loading

kainino0x commented Jul 24, 2019

Kangz commented Jul 29, 2019

kvark Jul 29, 2019

Kangz Aug 23, 2019

Kangz commented Aug 23, 2019

Initial proposal for webgpu.h #1

Initial proposal for webgpu.h #1

Conversation

Kangz commented Jul 15, 2019

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kvark commented Jul 18, 2019

kvark commented Jul 18, 2019

Kangz commented Jul 19, 2019

kvark commented Jul 19, 2019

Kangz commented Jul 19, 2019 • edited Loading

kvark commented Jul 19, 2019

Kangz commented Jul 19, 2019

kvark commented Jul 19, 2019

Kangz commented Jul 22, 2019

Kangz commented Jul 22, 2019

kainino0x commented Jul 22, 2019

kvark commented Jul 24, 2019

kainino0x commented Jul 24, 2019 • edited Loading

kainino0x commented Jul 24, 2019

Kangz commented Jul 29, 2019

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Kangz commented Aug 23, 2019

Kangz commented Jul 19, 2019 •

edited

Loading

kainino0x commented Jul 24, 2019 •

edited

Loading