WebGL and WebGPU interops #149
Conversation
… WebGPU device.
- Make power preference part of context creation options.
- Constant operands can be created from either WebGL or WebGPU buffers.
- Model inputs and outputs can be bound with WebGL or WebGPU textures.
- Prefix all types with "ML". Simplify "NeuralNetworkContext" to just "MLContext".
- Switch to using a constructor for MLModelBuilder instead of a factory method.
Thanks @wchao1115!
Added some comments on how to make Bikeshed link to externally defined IDL that is not yet in the autolinking database.
index.bs (Outdated)
  // Create a model that encapsulates the composition of operands by identifying the output operands.
- Model createModel(NamedOperands outputs);
+ MLModel createModel(MLNamedOperands outputs);
Should we still keep MLModel building device agnostic and leave the device association to compile time? Say, to support model.compile(glContext) and model.compile(gpuDevice). With that, a developer can build the same model once and compile it for different devices.
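A rough sketch of that usage, purely illustrative: the compile(device) overloads and the descriptor/weight names below are hypothetical, not part of the current PR.
// Hypothetical device-agnostic flow: build once, compile per device.
const builder = new MLModelBuilder();     // no device bound at build time
const x = builder.input('x', {type: 'float32', dimensions: [1, 4]});
const w = builder.constant({type: 'float32', dimensions: [4, 4]}, weights);  // weights: a Float32Array
const y = builder.matmul(x, w);
const model = builder.createModel({y});
const glCompilation  = await model.compile(glContext);   // WebGL-backed compilation
const gpuCompilation = await model.compile(gpuDevice);   // WebGPU-backed compilation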
There are interop use cases where the model constants (e.g. weights) are already uploaded to GPU buffers before the model is created. In that case it would be inefficient to require the caller to round-trip that data back to CPU memory just to construct a device-agnostic model.
As we also position WebNN as a backend API for frameworks, the notion of a device-agnostic graph could get in the way of interop efficiency: by the time the backend graph is initiated (by the framework), the model constants may already be mapped to memory hosted by the framework, and for frameworks already dealing with device resources, those constants could already be uploaded to the device.
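A sketch of that interop case, following the direction of this PR (creating a constant operand directly from a GPUBuffer); the descriptor shapes and the weightsGpuBuffer/gpuDevice names are made up for illustration:
// Weights already live in a GPUBuffer uploaded by the framework.
// Building the constant directly from it avoids the CPU roundtrip described above.
const context = navigator.ml.createContext(gpuDevice);   // WebGPU-backed MLContext
const builder = new MLModelBuilder(context);
const input = builder.input('input', {type: 'float32', dimensions: [1, 3, 224, 224]});
const filter = builder.constant(
    {type: 'float32', dimensions: [32, 3, 3, 3]},
    weightsGpuBuffer);                                    // constant from a GPUBuffer
const output = builder.conv2d(input, filter);
const model = builder.createModel({output});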
It makes sense, thanks for the explanation. Given that MLModel is now device specific, would we consider merging it into MLCompilation? Say, replace MLModel createModel(MLNamedOperands outputs) with Promise<MLCompilation> compile(MLNamedOperands outputs).
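In usage terms, the suggestion would roughly change this (illustrative only; the input/output names are placeholders):
// Per this PR: two steps, createModel() then compile().
const model = builder.createModel({output});
const compilation = await model.compile();
// Suggested merge: compile directly from the builder, no separate MLModel.
const compilation2 = await builder.compile({output});
const results = await compilation2.compute({input: inputData});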
@wchao1115, any thoughts about this idea?
I can see the logic to it. I think it makes sense.
@huningxin, the #1 use case is a good one. I plan to address it by adding a device preference option in MLContextOptions. This option will be complementary to the existing power preference. As for your #2, I'm still a bit confused. I looked at the scenario that was raised in #156 (copied as follows):
c = tf.conv2d(a, b);
e = tf.conv2d(c, d);
h = tf.conv2d(f, g);
output = await h.data();
If the webnn API is going to be used this way, what prevents the caller from compiling all 3 conv2d ops into a single compiled graph before executing it in one go? That is, if the caller has a graph of A + B and wants the platform-specific intermediate result after A to stay intact before it enters B during execution, then that's exactly what a compiled graph of A + B is for. That's precisely why graph execution is most likely more efficient than op-by-op execution.
If the graph is A + b + C where only A and C are defined as webnn ops, but b is something the caller implements themselves, and the caller insists that A and C must be executed separately and sequentially, then the end result will be no worse than having A and C defined elsewhere as standalone API calls outside of the webnn graph definition.
The point is that the end result of executing a publicly defined API must not be platform-specific, i.e. the resulting tensor of the operation must be in a standard layout, whether that public API is implemented as a compiled webnn graph or as a standalone API outside of webnn. Hope this helps clarify my point of view.
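For instance, the three ops from the #156 snippet could be expressed as a single WebNN graph and compiled once; this is only a sketch, with the descriptors (descA, descB, ...) and constant buffers invented for illustration:
// The same computation as the tf.js snippet, built as one graph.
const builder = new MLModelBuilder(context);
const a = builder.input('a', descA);
const b = builder.constant(descB, dataB);
const d = builder.constant(descD, dataD);
const f = builder.input('f', descF);
const g = builder.constant(descG, dataG);
const c = builder.conv2d(a, b);
const e = builder.conv2d(c, d);
const h = builder.conv2d(f, g);
// Compile all three conv2d ops together; only the named outputs are read back.
const compilation = await builder.createModel({e, h}).compile();
const outputs = await compilation.compute({a: dataA, f: dataF});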
> adding a device preference option in the MLContextOptions. This option will be complementary to the existing power preference.
I like it!
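A sketch of how that could look; the devicePreference member and its values are hypothetical here, only the power preference exists in the current draft:
// Hypothetical context creation options combining device and power preferences.
const context = navigator.ml.createContext({
  devicePreference: 'gpu',                  // hypothetical member discussed above
  powerPreference: 'high-performance'
});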
> what prevents the caller from compiling all the 3 conv2d ops into a single compiled graph before executing it in one go?
That's because the framework provides the op API. When executing a graph, the user code just calls the op API to execute the operations of the graph one by one. The framework doesn't have knowledge of the graph; it just executes ops. cc @pyu10055
For this scenario, the framework doesn't know the user code will execute 3 conv2d ops. When the user code calls a conv2d, the framework just computes that conv2d. The framework may use a platform-specific tensor layout in the conv2d implementation. Only when the user code calls h.data() does the framework convert and copy the tensor data into an ArrayBufferView in plain layout.
So if the framework uses WebNN to compute conv2d, it would hurt performance if WebNN always converts and copies the output tensor data to an ArrayBufferView for each compute.
> The point is that the end result of executing a publicly defined API must not be platform-specific
I agree with you; at the interop points, it should use the plain/standard tensor layout.
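To make the concern concrete, a framework-side eager op might look roughly like this (entirely illustrative; whether the result can stay device-resident between calls, rather than being copied into an ArrayBufferView each time, is exactly the open question):
// Framework-internal sketch of an eager conv2d op built on WebNN.
async function webnnConv2d(input, filter) {
  const builder = new MLModelBuilder(context);
  const x = builder.input('x', input.desc);
  const w = builder.constant(filter.desc, filter.data);
  const y = builder.conv2d(x, w);
  const compilation = await builder.createModel({y}).compile();
  // If compute() always returns a plain-layout ArrayBufferView, every
  // intermediate tensor pays a convert-and-copy, even though only
  // h.data() needs CPU-visible data.
  return (await compilation.compute({x: input.data})).y;
}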
> That's because the framework provides the op API. When executing a graph, the user code just calls the op API to execute the operations of the graph one by one.
That doesn't sound like what a framework does. Are you saying that there are frameworks that never compile a graph internally and just rely on individual op execution?
A framework may provide both an op API and a model API; for example, TensorFlow.js provides an Operations API and a Models/Loading API. It depends on which API the user code uses. If the user code uses the op API to execute a graph, the framework won't see the graph, only the individual ops.
Actually, the webnn-polyfill uses the TensorFlow.js op API. I suppose it is a common scenario. cc @pyu10055
@wchao1115 @huningxin Thank you, Ningxin, for bringing up this point. Op-level execution is common for eager execution, especially in a JS environment where users can create the graph on the fly. There is no specific graph boundary, so the framework is not able to know which intermediate tensors the user will not need to access.
index.bs (Outdated)
  Promise<MLNamedOutputs> compute(MLNamedInputs inputs,
                                  optional MLNamedOutputs outputs = {});
+ Promise<MLNamedWebGLOutputs> compute(MLNamedWebGLInputs glInputs,
Since the glInputs (same for gpuInputs) are device specific, what would happen if this MLCompilation is compiled for another device? Would the browser implementation handle the data transfer, or throw an error?
Yes, it will fail. Cross-device, cross-adapter support should not occur at this level. If the context is created with one device/adapter and the input is a resource from another, the API should fail. The callers of WebNN need to ensure that all the device-dependent resources come from the device that is bound to the context.
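For example (illustrative; the exact error surface, a rejected promise or an exception, is not specified by this PR):
// Context bound to a WebGPU device; feeding it a WebGL resource should fail.
const context = navigator.ml.createContext(gpuDevice);    // WebGPU-backed MLContext
const builder = new MLModelBuilder(context);
// ... build a graph and obtain a compilation ...
try {
  await compilation.compute({input: webglTexture});       // resource from a different device
} catch (e) {
  // Expected: the API rejects rather than silently transferring the data.
}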
It makes sense. We may need to specify this behavior in a follow-up PR.
…upport for accepting GPU and WebGLBuffer as inputs.
…pdate examples accordingly.
And there is a Bikeshed build error.
The spec still uses {{MLModel}} in the Compilation section.
I like that!
We may need to define which resource types are acceptable for a specific type of device. For example, if the device is CPU, should resources of typedef (WebGLBuffer or GPUBuffer or WebGLTexture or GPUTexture) GPUResource; be accepted at all?
@wchao1115, I put together the above table with two open questions. Please take a look and feel free to add anything I missed.
… instead of the view in the buffer union type.
… and the GPU buffer resource view.
LGTM. Thanks much for the great work.
@anssiko @huningxin It looks like
I am not aware of a manual way to trigger a GitHub action. Hopefully PR #151 will fix it. Please take a look.
The merge of #151 triggered a rebuild that deployed properly. @wchao1115 @huningxin great work with this PR. This is on our agenda for this week's call (note: it is 1 hour later than usual in the US/Canada!), and that's a good opportunity to summarize the key changes and design considerations discussed in this PR.