Simplify MLContext creation #322
Conversation
I'm concerned that this change would prevent a non-browser implementation, such as the WebNN-native Node.js binding, from supporting a GPU device.
Thanks for the comment @huningxin. A non-browser implementation only supports the default context by definition, and can therefore implement the interface as it sees fit (as implied in the last paragraph of my PR description), including as a GPU implementation. I can expand on that wording a bit more if that helps. Removing GPU support from non-WebGPU devices could really simplify the browser implementation, which I believe is the priority in our charter, and makes it clear how WebNN interops with WebGPU.
Thanks for the explanation, @wchao1115. The non-browser implementation is one aspect. Actually, I am trying to understand the impact on default context developers. Before this change, to utilize a GPU device, developers can just set the device type to "gpu" when creating the default context. The "gpu" and "cpu" default contexts can share the same code for graph building and compute. However, after this change, to utilize a GPU device, developers would need to use the WebGPU API to obtain a GPUDevice first. I feel this change would bring some extra overhead (WebGPU interop) to default context developers for GPU access and reduce code reusability across different types of devices for some use cases.
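For illustration, a minimal sketch of the two paths being compared, assuming the pre-change deviceType option and the post-change GPUDevice overload shown in the IDL later in this thread:

```js
// Before this change: request a GPU-backed default context via an option.
const gpuContext = await navigator.ml.createContext({ deviceType: 'gpu' });

// After this change: a GPU context is created only from a WebGPU device.
const adapter = await navigator.gpu.requestAdapter();
const device = await adapter.requestDevice();
const webgpuContext = await navigator.ml.createContext(device);
```

The graph building code can be shared across the two paths, but the compute path may differ depending on whether inputs and outputs are ArrayBufferViews or WebGPU resources.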
With this change, the only requirement of the default context is that it supports ArrayBufferView as constants, inputs, and outputs. As long as CPU resources are supported, the implementation can be either on the CPU or the GPU side (i.e. with automatic upload/download for GPU execution). That said, I'm not familiar with the importance of supporting a non-browser implementation out of the same standard web API spec. Is that a requirement for WebNN? Do we have existing customers that require webnn-native to be supported as one of the publicly maintained WebNN implementations? My view is that we can still maintain a webnn-native that supports both CPU and GPU with an extension, e.g. retaining the device preference enum in the webnn-native API but outside of the WebNN spec. As for the OperationError failure cases for …
It's great that the default context implementation can still support GPU. However, with the removal of the device type and high-performance power preference, the default context developers lose the capability to request a GPU implementation.
It is tracked by #142. I am not aware of any existing customers of the webnn-native Node.js binding. I agree the non-browser implementation is complementary. My focus is default context usage in web browsers.
This was introduced by #257. I think this makes sense because it makes a clear separation between the default and webgpu contexts for the two major usages: WebNN standalone (default) and WebNN/WebGPU interop. As I summarized in the above PR review, the difference can be captured by the following table (revised by dropping the WebGL interop).
As I proposed in my earlier comment, are you open to leaving the device preference only in webnn-native and not officially in the WebNN spec? That way webnn-native continues to function as it does today while we simplify the contract for WebNN to benefit the browser implementation.
Thanks @wchao1115. webnn-native should follow the spec as much as possible, so that would be fine. My concern is more about the impact on default context developers, who are also targeting web browsers. For example, the TFLite WebNN delegate uses the default context. It allows developers to access a GPU device by just setting the device type, without any other code modifications. With this change, it has to create a webgpu context and introduce a different graph compute path with WebGPU interop.
Thanks. I understand your concern from the framework developers' point of view. However, that use case will in practice encourage multiple devices being created in the same process if the app also uses WebGPU, with one internally owned by the default context and the other explicitly created by the app through a WebGPU device. Having multiple GPU devices to deal with creates huge complications around performance and resource sharing, especially if the system has multiple adapters configured (which is a lot more common than most people think, with both Intel integrated graphics and a discrete adapter plugged in). It seems pretty clear that people want WebGPU to be the way web apps deal with the GPU going forward. We even dropped support for WebGL to make it more compelling for frameworks and apps to target WebGPU in the future. With that backdrop, having framework developers target WebGPU explicitly in the backend may create some short-term integration work, but it's a small price to pay because you end up with a much more streamlined GPU stack, more robust interop, and more efficient support for apps down the road, which is a huge advantage that multiplies over time. The current design will make efficient and robust WebGPU interop very hard to accomplish in practice because we intentionally fragment it at the API level.
I suppose this issue could be solved by an implementation that manages and shares the GPU device between the WebNN default context and WebGPU. This was prototyped by my previous WebNN-on-Dawn Chromium proof-of-concept that integrates the WebNN implementation into the WebGPU Dawn project. It allows a WebNN context to create and own a WebGPU/Dawn device internally. The WebGPU/Dawn device is used by the WebNN implementation to execute the command lists that represent the MLGraph. Another proof point is the WebNN-polyfill: the default context polyfill implementation is able to run on top of WebGPU via the TensorFlow.js WebGPU backend.
Agreed. And that's exactly why I think WebGPU interop is important, especially for GPU rendering / real-time video processing use cases. On the other hand, I still believe it is very compelling if the default context execution model, in particular …
This issue may not exist for some native ML APIs that run in a separate process from WebGPU, such as the ChromeOS MLService. The CG's Model Loader API implementation is based on MLService. The device type of the default context is used by MLService to select a device. In particular, when creating a model loader, the implementation gets the device type from the context and passes it to MLService, and MLService selects a device to run model inference on. This infrastructure is also shared by the ongoing WebNN MLService backend prototype. /cc @wacky6
Is the concern about having multiple JS objects for a single GPU device, and potentially allocating 2x memory on the GPU? Something like:
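A minimal hypothetical sketch of such a pattern, assuming the pre-change deviceType option:

```js
// Two JS-level objects that may each wrap their own GPU device
// on the same physical adapter.
const mlContext = await navigator.ml.createContext({ deviceType: 'gpu' });

const adapter = await navigator.gpu.requestAdapter();
const gpuDevice = await adapter.requestDevice();
// If the implementations are not coordinated, mlContext's internal device
// and gpuDevice could duplicate GPU memory and resources.
```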
I don't see a particular reason to discourage this from an API design perspective. Note that ChromeOS MLService manages GPU/NPU resources independently of the WebGPU process. In an ideal world, developers don't need to care about the device type; MLService will just pick one intelligently. Some theoretical examples: …
@huningxin I can see how this can be done on a system with a single GPU adapter. But a WebGPU …
@wacky6 The concern is two-fold. First, uncoordinated implementations of WebGPU and WebNN could end up creating duplicates of the same GPU device on the same adapter. This isn't just two JS objects pointing to the same underlying GPU system device object (e.g. a D3D device in the Windows case), but two GPU device objects, each wrapped within its own JS API object. This would be quite wasteful from the OS point of view. Secondly, even if we assume that the implementations of the two APIs are tightly coordinated (which, shall we say, is currently still questionable, but let's assume it for the sake of this conversation), we still have the multi-adapter case to worry about. These are systems with two physical GPU adapters (e.g. Intel integrated graphics with a discrete GPU adapter plugged into one of the IO slots). In this setting, a WebGPU user would be responsible for selecting the adapter they want to use as the basis for the GPUDevice.
That's true.
This case seems not to be an issue because the default context … And if developers can set the "high-performance" power preference, the WebNN default context may select the same discrete GPU adapter that was selected by the explicit WebGPU call before.
Yes. And for multi-GPU, using "gpu" picks the best available one (e.g. prefers discrete over integrated).
The definition of "best" varies depending on what the app wants. Discrete GPU consumes a lot of power. It may not be ideal to run a model with low compute requirements where using an integrated GPU could preserve battery life in a mobile situation. |
My point was that in this situation, sharing a piece of content between the two APIs would then become a pair of copies for data download/upload, which is highly inefficient. By making a WebGPU device the only GPU device WebNN operates on, we make the API efficient by default: zero-copy by default. The reasoning behind this proposal is to simplify and make things more efficient by default; it isn't because something is functionally broken. My chief concern is that the current API provides too many options that could easily lead to inefficient implementation and interop with WebGPU. It seems far cleaner to coalesce around the WebGPU device for GPU support in WebNN, given that efficient interop with WebGPU is highly desirable. I actually believe this proposal will help bridge the implementation gap on the GPU side with the Chromium GPU folks. As a concrete data point, one strong hint of a redundant design is that there are both the …
That's true. This is a WebNN/WebGPU interop scenario. Developers should create a webgpu context and use … I feel the WebNN spec should allow the implementation to pick a GPU device if developers just want to use the GPU but don't care about WebGPU interop, multi-adapters, etc.
Modern CPUs may also employ hybrid cores, such as performance-cores and efficient-cores or big.LITTLE cores, to balance power and performance. The …
In practice, the GPU implementation will pick the GPU adapter for the device based on the … Even though it may seem that we limit WebNN from being able to manage its own GPU device, in practice no functional capability is being removed here. The developer can still achieve the same outcome as they can now, just in a more direct way. WebNN not managing a GPU device lifetime is a good thing from an implementation standpoint.
@wchao1115 thanks! I took a look and made some minor suggestions. I think this PR benefits from a WG discussion so I plan to bring this to our next call agenda to solicit more feedback and review.
Thanks. How should we handle existing occasional typos and misspelled words that are irrelevant to this change? They just make the merge harder. Ideally, I think we should have a round of "scrub only" changes that do not materially impact the rest of the doc content.
It is fine to fix typos if those typos were introduced by the changes in this PR. For other changes, a separate PR keeps us focused.
We should check whether these proposed changes suggest changes to the performance adaptation use case. It is generally recommended to maintain use cases alongside the API as it evolves. In the spirit of atomic PRs, any use case changes motivated by API changes should be incorporated into the same PR. If that would complicate the concrete work, the editors are free to work on these changes in separate PRs and cross-link the two. Relatedly, there was a proposal (#207) to update the performance adaptation use case in the past. The WG considered that update future work given that no accompanying API changes were suggested at that time.
@anssiko This change doesn't affect that use case as it doesn't remove any current capability, i.e. the developer can still use WebNN to direct the execution of her model to the CPU via the default context.
To ease the review, the proposed IDL changes in this PR are as follows (excluding …):

```diff
 Navigator includes NavigatorML;
 WorkerNavigator includes NavigatorML;

-enum MLDeviceType {
-  "cpu",
-  "gpu"
-};
-
 enum MLPowerPreference {
   "default",
-  "high-performance",
   "low-power"
 };

 dictionary MLContextOptions {
-  MLDeviceType deviceType = "cpu";
   MLPowerPreference powerPreference = "default";
 };

 [SecureContext, Exposed=(Window, DedicatedWorker)]
 interface ML {
+  // Default context
   Promise<MLContext> createContext(optional MLContextOptions options = {});
-  Promise<MLContext> createContext(GPUDevice gpuDevice);

   [Exposed=(DedicatedWorker)]
   MLContext createContextSync(optional MLContextOptions options = {});
+
+  // GPU context
+  Promise<MLContext> createContext(GPUDevice gpuDevice);
+
   [Exposed=(DedicatedWorker)]
   MLContext createContextSync(GPUDevice gpuDevice);
 };
```
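For a quick read of what remains after this diff: the default context takes only the power preference option, and the sync variants are exposed only in dedicated workers. A minimal usage sketch, assuming the post-change IDL above:

```js
// Window or dedicated worker: asynchronous creation of the default context.
const defaultContext = await navigator.ml.createContext({ powerPreference: 'low-power' });

// Dedicated worker only: synchronous creation.
const syncContext = navigator.ml.createContextSync({ powerPreference: 'low-power' });
```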
Rebase on main and remove restrictive text that prevents …
Thanks for the update. It resolved the corresponding conversation.
My comments have been addressed, thanks!
We should wait for @huningxin to be back from his well-deserved holiday to confirm his feedback and comments have been addressed before we proceed with this PR.
@pyu10055, given the impact of this proposed change to JS frameworks, feedback from TF.js would be invaluable. Please see my compact diff at #322 (comment) if this PR looks tl;dr. All comments and questions welcome.
(As a reminder, this was discussed on our last call.)
We discussed this on our call today (https://www.w3.org/2023/02/02-webmachinelearning-minutes.html). Here's a summary and proposed next steps: This feature introduces a normative dependency on the WebGPU API for the GPU path. We discussed how maturation of this important feature would greatly benefit from tight coordination with WebGPU contributors, implementation experience, and feedback from key customers such as JS ML frameworks. We also discussed the pros, cons, and logistics of landing this feature in a revised CR vs. the initial CR. The upcoming WebGPU F2F 2023-02-16/17 in the SF bay area would be a great opportunity to socialise and seek feedback on the WebNN-WebGPU interoperability design overall. @wchao1115, would you be interested in attending that F2F (virtually) and sharing the WebNN-WebGPU interop design at the meeting? @huningxin @RafaelCintron may want to contribute too. @pyu10055, given the impact of this proposed change on JS ML frameworks, feedback from TF.js would be invaluable. Please see my compact diff at #322 (comment) if this PR looks tl;dr. All comments and questions welcome.
I think we would also like to hear feedback from a cross-platform support perspective. @wacky6, do you see any gaps in supporting this change (compact diff at #322 (comment)) in the WebNN MLService implementation? The current prototype passes the device enum type, say … And this change may also impact the Model Loader API because it shares the …
@mattsoulanille from the TensorFlow.js team may be able to provide feedback from the ML JS framework perspective on this proposed change.
Sorry for being late to this discussion. I like @wchao1115's proposal that lets WebNN expose low-level implementation details (WebGPU) in the API, enabling high-level frameworks to provide more efficient integration and better ML pipelines. At the same time, I agree with @huningxin that less experienced users would prefer a semantically meaningful API; I think this would improve the usability of the API and attract more adoption. I would selfishly prefer having both APIs to address the needs of users across the spectrum.
@anssiko I have no strong opinion on whether this change should be incorporated as part of the upcoming CR or later. I think that's entirely logistical. We should do whatever enables us to expedite the CR review process, which I'm not an expert in. As for the change itself, this line of the summary worries me a bit: "This feature introduces a normative dependency on the WebGPU API for the GPU path."
I'd like to clarify that the spec already takes this normative dependency on WebGPU today, without this change. If anything, this change should simplify the implementation of this dependency in practice, as it consolidates GPU support around just one code path with a WebGPU device. @pyu10055 I do not consider the WebGPU dependency a low-level implementation detail. Again, this dependency already exists at the API level before this change, i.e. WebNN already allows the caller to instantiate an MLContext from a GPUDevice. Given that this change is about removing the support for an internal GPU device option (i.e. the deviceType option), how much should we care about multi-adapter systems? I.e. how many client devices out there have multiple adapters in them? It turns out that roughly 1/3 of all PCs out there are equipped with a discrete GPU adapter, and every one of those machines also comes with Intel integrated graphics. Allowing the caller to specify precisely which …
@wchao1115 I might not have explained myself clearly. I think it is a good thing for a framework like TF.js to be able to specify or pass down the GPUDevice to WebNN. The question is whether a default device can/should be specified through a semantic API (high-performance / low-power / etc.). I think this is useful; most likely, the end users or the framework might not have the understanding or low-level hardware access to make a better decision.
@pyu10055 If what you mean is through the … But if you think you would rather surface a similar adapter hint to the app of TF.js, that call to requestAdapter can also be made by TF.js on behalf of the app, which passes in the adapter hint. In either case, it can be done above WebNN. WebNN is intentionally designed to be a backend API, not really a framework API. Having it directly accept the device corresponding to the desired adapter removes ambiguity and the burden of maintaining compatibility in the future.
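As a sketch of that division of labor, a hypothetical framework-level helper (createBackendContext is illustrative, not part of any API) could translate an app-supplied hint into the WebGPU adapter request and the WebNN context creation:

```js
// Hypothetical framework helper: the app passes a high-level hint,
// the framework resolves it to a WebGPU adapter and a WebNN context.
async function createBackendContext(hint /* 'high-performance' | 'low-power' */) {
  const adapter = await navigator.gpu.requestAdapter({ powerPreference: hint });
  if (!adapter) throw new Error('No suitable GPU adapter found');
  const device = await adapter.requestDevice();
  return navigator.ml.createContext(device);
}

// Usage: the app expresses intent; only the framework touches WebGPU.
const context = await createBackendContext('high-performance');
```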
@wchao1115 Using a WebGPU device makes sense if the API allows frameworks or users to specifically create a WebGPU backend. But if the intention of the WebNN GPU backend is transparent GPU acceleration, I think WebNN should have a level of abstraction over the GPU; exposing WebGPU objects as part of the WebNN API will add burdens to non-WebGPU implementations.
This is already the case today even with the default GPU option, i.e. if you intend to run a model on the GPU with an input bound to a GPU resource, that GPU resource must already be a WebGPU resource. Moreover, it must be a WebGPU resource created from the same WebGPU device that WebNN internally manages. This is the reason why WebNN can't really "hide" the underlying WebGPU device from the caller. It's much more natural and robust to have the WebNN caller specify both what device it wants to run on and what resource from the same device it wants to bind the model's input to.
@wchao1115 here's the proposed "status of this document" text that will enable us to expedite the CR process and solicit wider feedback and review on this important design consideration: #340 |
Retract until consensus can be found. |
Given the explicit support for the WebGPU gpuDevice as a dedicated ML context type in addition to the default type, and the recent removal of support for the WebGLContext device type, MLContext creation can be further simplified by removing the internal GPU device type that is not explicitly specified as a gpuDevice parameter to the ML context creation method, i.e. a GPU context can now be created only from a WebGPU device.

The second change is the removal of the high-performance power preference. This option was originally defined to facilitate the creation of a GPU context focused on high-performance, low-latency execution on a discrete GPU (presumably as a tiebreaker in a configuration where both an integrated GPU and a dedicated discrete GPU are present). Following the rationale behind the first change to only support a GPU context through a WebGPU device, this option is no longer necessary since the specified gpuDevice must be created from an adapter that must first be explicitly selected by the user of WebGPU, i.e. it is now up to the user to choose which GPU adapter (e.g. integrated vs. discrete) they want to use as the GPU device backing the MLContext object.

Lastly, please note that with this change, the default context is now an implementation choice. As long as the implementation takes reasonable consideration of the user-specified power preference option, it can implement the default context type any way it wants as long as it isn't a GPU implementation. For instance, the implementation can be entirely CPU-based or a hybrid CPU-NPU implementation; as long as it fully supports ArrayBufferView as in the API contract, it's all good from the spec point of view. I've also removed explicit mentions of CPU as a normative requirement in the relevant wordings to make them read more like an informative consideration.