Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial proposal for webgpu.h #1

Merged
merged 15 commits into from
Aug 23, 2019
Merged

Initial proposal for webgpu.h #1

merged 15 commits into from
Aug 23, 2019

Conversation

Kangz
Copy link
Collaborator

@Kangz Kangz commented Jul 15, 2019

@kvark PTAL we have to decide on a license (tentatively 3clause bsd for this repo) and the "authors" name.

@Kangz Kangz requested a review from kvark July 15, 2019 15:41
webgpu.h Outdated Show resolved Hide resolved
webgpu.h Outdated Show resolved Hide resolved
webgpu.h Outdated Show resolved Hide resolved
webgpu.h Outdated Show resolved Hide resolved
webgpu.h Outdated Show resolved Hide resolved
webgpu.h Outdated Show resolved Hide resolved
webgpu.h Outdated Show resolved Hide resolved
webgpu.h Outdated Show resolved Hide resolved
webgpu.h Outdated Show resolved Hide resolved
webgpu.h Outdated
#endif

// Procs of BindGroup
typedef void (*wgpuProcBindGroupReference)(wgpuBindGroup bindGroup);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this have to be a part of the API header? Looks like an implementation detail to me.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every WebGPU implementation will need to implement something like this to avoid freeing data in use by the GPU, and having a refcount helps composing together software that uses webgpu.h

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have those procs, and we don't free data in use by GPU.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I wasn't clear. I expect every WebGPU implementation will have internal refcounting to avoid freeing data on the GPU (and I see wgpu-rs have ref_count even though I'm not familiar enough with Rust to understand all of what's happening there). So why not expose this to the application since it helps compose things together?

This is also slightly closer to the JS semantics with GC keeping the object alive as long as there is a reference to it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We haven't considered exposing the refcounts to the user:

  • for JS, it already knows when to call destroy()
  • for Rust, we use ownership and borrowing instead of refcounts on the API side

Perhaps, you could do it as an extension?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since WebGPU objects are internally synchronized doesn't it make sense to have them implement Clone + Sync and this would kinda require a refcount? (I don't know Rust too well so maybe this is stupid)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only Device in WebGPU is truly internally synchronized.
In general, if something is internally synchronized, that means Sync is fine, but Clone has nothing to do with it. If a user wants to share the Device between multiple threads, they just wrap it in Arc<Device> and go from there.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kvark, then doesn't it make sense to simply expose objects in Arc<T> form directly? I.e. constructors would only initialize an Arc<Device> instead of Device.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using Arc forces the way refcounting is done, which doesn't match what Dawn is doing (it's C++ and the refcount is at an offset while it is at offset 0 for Arc) nor what wgpu-native because of the HUB structure they use.

That said the same Arc semantics could be done with a Sync + Clone object that uses webgpu.h's refcounting mechanisms to implement Clone.

webgpu.h Outdated Show resolved Hide resolved
webgpu.h Outdated

wgpuProcFenceGetCompletedValue fenceGetCompletedValue;
wgpuProcFenceOnCompletion fenceOnCompletion;
wgpuProcFenceRefe
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we move all the refcounting stuff into a separate header, even if it's hosted alongside webgpu.h?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unified them instead to not be per device but on WGPUObject which is void*

webgpu.h Outdated Show resolved Hide resolved
@kvark
Copy link
Collaborator

kvark commented Jul 18, 2019

Thank you for addressing my concerns!
The last one I got unresolved is - I think the header doesn't need all the logic related to the proc tables and the refcounting. Dawn should be just putting this into a separate header.
@grovesNL - would you have a moment to check if you spot any issues?

@kvark
Copy link
Collaborator

kvark commented Jul 18, 2019

Had a bit of discussion on the call about refcount exposure.
Pros:

  • both implementations already do that, exposing costs us nothing
  • some user would be happy to use that instead of wrapping staff in Arc and such

Cons:

  • destruction is slightly more complicated: there are still explicit destroy_() methods on textures and buffers, but actual handles only go out of scope upon refcount decrease
  • WASM shim would have to emulate refcounts (which means it would affect all the WASM clients?!).

I'm slightly concerned about the "exposing costs us nothing" aspect: it's not like we can straight expose our internal refcounts. Our implementation would likely still need to track the user refcounts separately from the internal ones (because we'd need to know when the user one reaches 0, at least), at which point we end up with just implementing an extra refcounting exposed to the user, which is nothing like "exposing costs us nothing" implies.

webgpu.h Outdated Show resolved Hide resolved
webgpu.h Show resolved Hide resolved
webgpu.h Outdated Show resolved Hide resolved
@Kangz
Copy link
Collaborator Author

Kangz commented Jul 19, 2019

I'm slightly concerned about the "exposing costs us nothing" aspect: it's not like we can straight expose our internal refcounts. Our implementation would likely still need to track the user refcounts separately from the internal ones (because we'd need to know when the user one reaches 0, at least), at which point we end up with just implementing an extra refcounting exposed to the user, which is nothing like "exposing costs us nothing" implies.

Why do you need to keep the internal refcount separate from the internal one? We were doing this initially in Dawn but merged both refcounts because with RAII wrappers the application can't mistakenly deref too much.

@kvark
Copy link
Collaborator

kvark commented Jul 19, 2019

@Kangz

Why do you need to keep the internal refcount separate from the internal one? We were doing this initially in Dawn but merged both refcounts because with RAII wrappers the application can't mistakenly deref too much.

How do you rely on RAII wrappers if it's the C API being exposed?

@Kangz
Copy link
Collaborator Author

Kangz commented Jul 19, 2019

We have a semi-idiomatic C++ wrapper for the C API that does RAII, you can find Dawn's generated version here. We'd eventually contribute a similar wrapper to this repo.

Edit: here's a link to the implementation of the header if you're interested in how it works.

@kvark
Copy link
Collaborator

kvark commented Jul 19, 2019

Sure, wrappers in high level languages can have this guarantee provided. But we are exposing the C API, we aren't requiring the users to use the wrappers. In the C API we can't guarantee that the issued refcount bumps/decs are sane.

@Kangz
Copy link
Collaborator Author

Kangz commented Jul 19, 2019

What's the goal of having separate refcounts? I don't think we should validate the refcounting: if an application does too many derefs and the internal refcount is 0, then there will be a use-after-free and a potential segfault. We can't validate derefs and guarantee 100% that the application can't segfault itself on the first invalid deref. So you might as well have a single internal+external refcount. (in Dawn I'm thinking that maybe in debug we could split them, but even then that's only useful for automated testing).

@kvark
Copy link
Collaborator

kvark commented Jul 19, 2019

It comes do to what is the "Valid Usage" of the API encoded in the header. Providing a malformed pointer in place of a struct is surely a crash, but at least it's very localized: you know exactly when and how things went wrong. The "validitity" of a refcount decrease operation isn't local: it depends on all the previous ups and downs issued. So it's possible to describe, but it's much less desirable to expose an API that is almost safe, minus this one case where you'd not be able to tell instantly what the problem is (because it's somewhere deep in the logic of your app, out of a sudden).

Of all the build targets we need to support, look at where exposing refcounts is useful:

  1. Debug native builds - not useful, since the implementations would want to track user refcounts separately.
  2. Release native builds - may be useful.
  3. Web builds - not useful, since WASM shim would have to be implementing them.

Weighting these concerns, I feel like exposing the refcounts would bring more trouble than they are worth?

@Kangz
Copy link
Collaborator Author

Kangz commented Jul 22, 2019

It comes do to what is the "Valid Usage" of the API encoded in the header.

I don't think we should strive for Rust's level of safety in a C API because it doesn't have any support for linear types that would allow encoding such constraints.

The "validitity" of a refcount decrease operation isn't local: it depends on all the previous ups and downs issued. So it's possible to describe, but it's much less desirable to expose an API that is almost safe, minus this one case where you'd not be able to tell instantly what the problem is (because it's somewhere deep in the logic of your app, out of a sudden).

Same thing when there is no refcounting happening: if you use a destroyed object, you might still have correct behavior (or a validation error instead of a crash) because the object is still internally referenced by the WebGPU implementation.

Debug native builds - not useful, since the implementations would want to track user refcounts separately.

Graphics APIs are optimized for the Release usecases so I don't think we should worry about this.

Web builds - not useful, since WASM shim would have to be implementing them.

The shim would have to implement the refcount, but it doesn't mean it's useless. The same arguments about refcounts being useful for software composability apply.

@Kangz
Copy link
Collaborator Author

Kangz commented Jul 22, 2019

Removed the procTable as was requested, and wrapped the API function declarations in a block so they can be skipped. We probably want a getProcAddress-like function in a follow-up PR.

I tried implementing the "unified" refcount today to get Dawn closer to the proposed header. Turns out unified refcount is worse than having a reference/release for each object because it essentially forces implementations to do a dynamic dispatch based on the object type when WGPUReference makes teh refcount reach 0, which might not be idiomatic C or Rust.

For the WASM shim this isn't a problem because AFAIK the reference types are all pointing into a single reference table so objects are already unified there. The cases where we would like to do some type-specific work is if we want to call GPUTexture/GPUBuffer.destroy to free GPU memory when the refcount reaches 0.

Then the question is do we want refcount reaching 0, (or explicit destroy) to act as if GPUBuffer/GPUTexture.destroy is called? I think we want to separate the two concepts for several reasons:

  • The Javascript API separates the freeing of GPU memory from the lifetime of the state-tracking object: if you lose the last reference to a texture you can still submit command buffers using it, irrespective of whether your GPUTexture has been garbage collected or not.
  • This actually means that an API with only combined destruction of the state tracking and freeing of the GPU memory cannot be used to implement WebGPU in a Web engine unless it keeps track of the GPU object graph on the content process side (a big no-no for our implementation).
  • Often-enough people will want to create WGPU objects on the fly for the commands they are recording (like a tiny UBO, ad-hoc sampler or bindgroup) and they don't want to have a wgpu_native PendingResource-like structure on their side to keep them alive until the wgpuQueueSubmit call.

So irrespective of refcount vs. no refcount, we should split the CPU object lifetime management from the equivalent of GPUBuffer/GPUTexture.destroy.

Speaking of Web engines, they will need some form of refcounting of WGPU objects as soon as they implement sharing objects between workers.

@kainino0x
Copy link
Collaborator

For the WASM shim this isn't a problem because AFAIK the reference types are all pointing into a single reference table

The table is implemented in JS at least for now, so we can implement it however we want. Emscripten's GL object tables are per-type.

@kvark
Copy link
Collaborator

kvark commented Jul 24, 2019

Thanks for the extensive responses!

I don't think we should strive for Rust's level of safety in a C API

It's not as much about safety as it is about a general "good API". When the user gets undesired behavior, a good API makes it obvious to what needs to be fixed. That's what the "locality" gives you: user searches for a place where they destroy the object and try to figure out if it's not the right place to destroy.

if you use a destroyed object, you might still have correct behavior (or a validation error instead of a crash) because the object is still internally referenced by the WebGPU implementation.

We should strive to hide the implementation details from the user, if we want the API to be portable. The way I see it, if the user calls destroy_xxx(handle) - this has a clear (implementation-independent!) semantics of the handle being lost. Any use of handle by the user is a definite validation error after that. There is no room for crashes or "may work" scenarios.

Graphics APIs are optimized for the Release usecases so I don't think we should worry about this.

The question here is not about optimization. It's about what API we expose for lifetime management.

The shim would have to implement the refcount, but it doesn't mean it's useless. The same arguments about refcounts being useful for software composability apply.

The shim should be as light as possible though. Users who don't need refcounting (or want them implemented differently, say, by introducing weak references, thread-local counters, etc) shouldn't be paying the implementation cost for them. If we implement refcounting in the WASM shim, will it be strictly better than anything the users may do on their side?

As for the reasons to separate "destroy" from refcounting semantics, this one is least clear to me:

This actually means that an API with only combined destruction of the state tracking and freeing of the GPU memory cannot be used to implement WebGPU in a Web engine unless it keeps track of the GPU object graph on the content process side (a big no-no for our implementation).

I'm confused because there appears to be some context implied here that needs to be specified, e.g. for "combined destruction of the state tracking and freeing of the GPU memory".

Let me try to bring some clarity. About "destroy" - it doesn't make sense to actually destroy anything at this stage, since the implementation may have dependent work in flight. Forcing the user to know whether there is work in flight (e.g. with fences) is an option, but that's not the direction WebGPU group has taken. So "destroy" simply means "I don't want this object any more, please free associated resources as soon as you can". This is pretty much exactly the same semantics as reaching the refcount of 0.

So at the end of the day, we are trying to figure out what of the two APIs would work better:

  1. destroy_xxx(), destroy_yyy(), destory_zzz() methods that logically invalidate the handles
  2. add_ref_xxx(), remove_ref_xxx(), add_ref_yyy(), remove_ref_yyy(), add_ref_zzz(), remove_ref_zzz() with an optional destroy_xxx() to match WebGPU API.

The (2) can't be specified to invalidate the handles at any point, since the condition of reaching 0 is run-time dependent. The (2) has more APIs, potentially more redundant ones ("destroy" versus "remove_ref"), which are making it more difficult for the users to deal with lifetime issues (and correspondingly easier to leak objects/memory).

@kainino0x
Copy link
Collaborator

kainino0x commented Jul 24, 2019

FWIW, I noticed that refcounting allows the API to more closely match WebGPU:

const t = device.createTexture(...);
commands.push(makeSomeCommands(t));
// t.destroy() here would cause submit to fail later

...

queue.submit(commands);
// no need to destroy

with refcounting:

WGPUTexture t = wgpuDeviceCreateTexture(device, ...);
commands.push_back(makeSomeCommands(t));
wgpuTextureRelease(t);
// Destroying here would cause submit to fail later, but OK to release
// because command buffer still holds a ref
...

wgpuQueueSubmit(queue, commands.size(), commands.data());

without refcounting:

WGPUTexture t = wgpuDeviceCreateTexture(device, ...);
commands.push_back(makeSomeCommands(t));
// Destroying here would cause submit to fail later

...

wgpuQueueSubmit(queue, commands.size(), commands.data());
wgpuTextureDestroy(t);

@kainino0x
Copy link
Collaborator

This is under the assumption that destroy destroys eagerly (which I think it is supposed to).

One way to change around that would be to just have Release on everything (no Reference), and then on textures and buffers, also have Destroy which destroys eagerly.

@Kangz
Copy link
Collaborator Author

Kangz commented Jul 29, 2019

Removed all forms of refcounting and destroy so we can talk about it in a follow up. @kvark PTAL again.

webgpu.h Outdated
#endif

// Procs of Buffer
typedef void (*WGPUProcBufferUnmap)(WGPUBuffer buffer);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since there is no ability to get a proc address, these typedefs aren't useful. Let's not add them right now?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed (for now)

@kvark kvark self-assigned this Aug 1, 2019
@Kangz
Copy link
Collaborator Author

Kangz commented Aug 23, 2019

Updated to have render bundles and other WebGPU changes. Removed procs.

As far as I can tell there are no more contentious changes here so I'll go ahead and merge it.

@Kangz Kangz merged commit 310d84f into webgpu-native:master Aug 23, 2019
@Kangz Kangz deleted the initial branch August 23, 2019 13:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants