Simplify MLContext creation #322

Closed · wants to merge 5 commits into from

Conversation

@wchao1115 (Collaborator) commented Dec 26, 2022

Given the explicit support for the WebGPU gpuDevice as a dedicated ML context type in addition to the default type, and the recent removal of support for the WebGLContext device type, MLContext creation can be further simplified by removing the internal GPU device type that is not explicitly specified as a gpuDevice parameter to the context creation method, i.e. a GPU context can now be created only from a WebGPU device.

The second change is the removal of the high-performance power preference. This option was originally defined to facilitate the creation of a GPU context focused on high-performance, low-latency execution on a discrete GPU (presumably as a tiebreaker in a configuration where both an integrated GPU and a dedicated discrete GPU are present). Following the rationale behind the first change to support a GPU context only through a WebGPU device, this option is no longer necessary: the specified gpuDevice must be created from an adapter that is first explicitly selected by the user of WebGPU, i.e. it is now up to the user to choose which GPU adapter (e.g. integrated vs. discrete) they want to use as the GPU device backing the MLContext object.

Lastly, please note that with this change, the default context is now an implementation choice. As long as the implementation takes reasonable consideration of the user-specified power preference option, it can implement the default context type any way it wants, as long as it isn't a GPU implementation. For instance, the implementation can be entirely CPU-based or a hybrid CPU-NPU implementation; as long as it fully supports ArrayBufferView as in the API contract, it's all good from the spec point of view. I've removed explicit mentions of CPU as a normative requirement in the relevant wording to make it read as an informative consideration.
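To illustrate the shape of the API after this change, here is a minimal usage sketch. It is not part of the PR; it simply follows the IDL diff quoted later in this thread and assumes the standard navigator.ml and navigator.gpu entry points.

```ts
// Sketch only: context creation after this change (assumes the IDL proposed in this PR).

// Default context: the implementation picks the device; only a power preference is given.
const defaultContext = await navigator.ml.createContext({ powerPreference: 'low-power' });

// GPU context: created only from a WebGPU device that the app selects itself.
const adapter = await navigator.gpu.requestAdapter();
const gpuDevice = await adapter.requestDevice();
const gpuContext = await navigator.ml.createContext(gpuDevice);
```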


Preview | Diff

@huningxin (Contributor)

i.e. a GPU context can now be created only from a WebGPU device.

I'm concerned that this change would prevent a non-browser implementation, such as the WebNN-native Node.js binding, from supporting a GPU device.

@wchao1115 (Collaborator, Author) commented Dec 28, 2022

Thanks for the comment @huningxin. A non-browser implementation only supports the default context by definition, and therefore it can implement the interface as it sees fit (as implied in the last paragraph of my PR description), including as a GPU implementation. I can expand on that wording a bit more if that helps. Removing GPU support from non-WebGPU devices could really simplify the browser implementation, which I believe is the priority in our charter, and it makes clear how WebNN interops with WebGPU.

@huningxin (Contributor)

Thanks for the explanation, @wchao1115.

The non-browser implementation is one aspect. Actually, I am trying to understand the impact on the default context developers.

Before this change, to utilize a GPU device, the developers can just set the device type to "gpu" when creating the default context. The "gpu" and "cpu" default contexts can share the same code for graph building and compute (e.g., MLContext.compute()).

However, after this change, to utilize a GPU device, developers would need to use the WebGPU API to obtain a GPUDevice and create a webgpu context with it. Although the graph building code may be reused with the default context, the graph compute code would be different, because the compute() method is not allowed on a webgpu context (the spec says this method throws an "OperationError" DOMException if the context is not a default context). In particular, developers need to create an MLCommandEncoder and use its methods, including initializeGraph(), dispatch() and finish(), to compute a graph.
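For concreteness, a rough sketch of the two execution paths described above under the then-current draft; the descriptor and method shapes here (e.g. the createCommandEncoder() factory on the context) are assumptions for illustration, not spec text:

```ts
// Path 1: default context — compute() with ArrayBufferView resources.
const ctx = await navigator.ml.createContext();
const builder = new MLGraphBuilder(ctx);
const a = builder.input('a', { type: 'float32', dimensions: [2, 2] });
const b = builder.constant({ type: 'float32', dimensions: [2, 2] }, new Float32Array(4).fill(1));
const graph = await builder.build({ c: builder.add(a, b) });
const inputs = { a: new Float32Array([1, 2, 3, 4]) };
const outputs = { c: new Float32Array(4) };
await ctx.compute(graph, inputs, outputs);

// Path 2: webgpu context — same graph building code, but execution goes through
// MLCommandEncoder and GPU resources (GPUBuffer) instead of compute().
const adapter = await navigator.gpu.requestAdapter();
const device = await adapter.requestDevice();
const gpuCtx = await navigator.ml.createContext(device);
// ... build the graph with new MLGraphBuilder(gpuCtx) as above, then:
// const encoder = gpuCtx.createCommandEncoder();   // assumed factory method
// encoder.initializeGraph(graph);
// encoder.dispatch(graph, { a: gpuInputBuffer }, { c: gpuOutputBuffer });
// device.queue.submit([encoder.finish()]);
```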

I feel this change would add some overhead (WebGPU interop) for the default context developers who want GPU access, and reduce code reusability across different types of devices for some use cases.

@wchao1115 (Collaborator, Author) commented Dec 29, 2022

With this change, the only requirement on the default context is that it supports ArrayBufferView as constants, inputs, and outputs. As long as CPU resources are supported, the implementation can be either on the CPU or on the GPU (i.e. with automatic upload/download for GPU execution).

That said, I'm not familiar with the importance of supporting a non-browser implementation out of the same standard web API spec. Is that a requirement for WebNN? Do we have existing customers that require webnn-native to be supported as one of the publicly maintained WebNN implementations? My view is that we can still maintain a webnn-native that supports both CPU and GPU through an extension, e.g. retaining the device preference enum in the webnn-native API but outside of the WebNN spec.

As for the OperationError failure cases for compute, I think they're overkill. I don't recall why we need them. We should remove those restrictions.

@huningxin (Contributor)

With this change, the only requirement on the default context is that it supports ArrayBufferView as constants, inputs, and outputs. As long as CPU resources are supported, the implementation can be either on the CPU or on the GPU (i.e. with automatic upload/download for GPU execution)

It's great that the default context implementation can still support GPU. However, with the removal of the device type and high-performance power preference, the default context developers lose the capability to request a GPU implementation.

That said, I'm not familiar with the importance of supporting a non-browser implementation out of the same standard web API spec. Is that a requirement for WebNN?

It is tracked by #142. I am not aware of any existing customers of the webnn-native Node.js binding. I agree the non-browser implementation is complementary. My focus is the default context usage in web browsers.

As for the OperationError failure cases for compute, I think they're overkill. I don't recall why we need them. We should remove those restrictions.

This was introduced by #257.

I think this makes sense because it makes a clear separation between the default and webgpu contexts for the two major usages: WebNN standalone (default) and WebNN/WebGPU interop.

As I summarized in the PR review above, the differences can be captured by the following table (revised by dropping the WebGL interop).

| Context Type | Creation Method | Device Type | Graph Execution Method | Resource Type |
| --- | --- | --- | --- | --- |
| default | Created from an MLContextOptions | CPU/GPU/NPU, device selected by device and power preferences | compute() (async call for both main thread and worker); computeSync() (sync call for worker only) | CPU resources: ArrayBufferView |
| webgpu | Created from a GPUDevice | GPU, device selected by the WebGPU API | MLCommandEncoder records an MLGraph into a GPUCommandBuffer that can be submitted to a GPUQueue | GPU resources: GPUBuffer and GPUTexture |

@wchao1115 (Collaborator, Author)

As I proposed in my earlier comment, are you open to leaving the device preference only in webnn-native and not officially in the WebNN spec? That way webnn-native continues to function as it does today while we simplify the WebNN contract to benefit the browser implementation.

@huningxin (Contributor)

Thanks @wchao1115. webnn-native should follow the spec as much as possible, so that would be fine.

My concern is more about the impact on the default context developers, who are also targeting web browsers. For example, the TFLite WebNN delegate uses the default context. It allows developers to access a GPU device by just setting the device type, without any other code modifications. With this change, it has to create a webgpu context and introduce a different graph compute path with WebGPU interop.

@wchao1115 (Collaborator, Author)

Thanks. I understand your concern from the framework developers' point of view. However, that use case in practice will encourage multiple devices created in the same process if the app also uses WebGPU, with one internally owned by the default context and the other explicitly by the app through a WebGPU device. Having multiple GPU devices to deal with creates huge complications around performance and resource sharing, especially if the system has multiple adapters configured (which is a lot more common than most people think, with both Intel integrated graphics and a discrete adapter plugged in).

It seems pretty clear that people want WebGPU to be the way web apps deal with the GPU going forward. We even dropped support for WebGL to make it more compelling for frameworks and apps to target WebGPU in the future. With that backdrop, having framework developers target WebGPU explicitly in the backend may create some short-term integration work, but it's a small price to pay: you end up with a much more streamlined GPU stack, more robust interop, and more efficient support for apps down the road, which is a huge advantage that compounds over time.

The current design will make efficient and robust WebGPU interop very hard to accomplish in practice because we intentionally fragment it at the API level.

@huningxin (Contributor)

@wchao1115

However, that use case in practice will encourage multiple devices created in the same process if the app also uses WebGPU, with one internally owned by the default context and the other explicitly by the app thru a WebGPU device.

I suppose this issue could be solved by an implementation that manages and shares the GPU device between the WebNN default context and WebGPU.

This was prototyped by my previous WebNN-on-Dawn Chromium PoC (proof of concept), which integrates the WebNN implementation into the WebGPU Dawn project. It allows a WebNN context to create and own a WebGPU/Dawn device internally. The WebGPU/Dawn device is used by the WebNN implementation to execute the command lists that represent the MLGraph.

Another proof point is the WebNN-polyfill: its default context implementation is able to run on top of WebGPU via the TensorFlow.js WebGPU backend.

It seems pretty clear that people want WebGPU to be the way web apps deal with the GPU going forward.

Agreed. And that's exactly why I think WebGPU interop is important, especially for the GPU rendering / real-time video processing use cases.

On the other hand, I still believe it is very compelling if the default context execution model, in particular MLContext.compute() with ArrayBufferView resources, could easily target different hardware devices, including CPU, GPU and NPU, by simply setting a device type. I believe this is a widely adopted execution model in native frameworks / OS ML APIs as well.

@huningxin (Contributor)

@wchao1115

However, that use case in practice will encourage multiple devices created in the same process if the app also uses WebGPU, with one internally owned by the default context and the other explicitly by the app thru a WebGPU device.

This issue may not exist for some native ML APIs that run in a separate process from WebGPU, such as the ChromeOS MLService. The CG's Model Loader API implementation is based on MLService. The device type of the default context is used by MLService to select a device. In particular, when creating a model loader, the implementation gets the device type from the context and passes it to MLService, which then selects a device to run model inference. This infrastructure is also shared by the ongoing WebNN MLService backend prototype.

/cc @wacky6

@wacky6 commented Jan 3, 2023

However, that use case in practice will encourage multiple devices created in the same process if the app also uses WebGPU, with one internally owned by the default context and the other explicitly by the app thru a WebGPU device.

Is the concern about having multiple JS objects for a single GPU device, and potentially allocating 2x memory on the GPU?

Something like:

  1. Create default context (implementation chooses GPU). Developer wants to use this as fallback when there's GPU resource contention.
  2. Create GPU context (explicitly choose GPU)
  3. Model is duplicated in GPU memory

I don't see a particular reason to discourage this from an API design perspective.


Note: ChromeOS MLService manages GPU/NPU resources independently of the WebGPU process.

In an ideal world, developers don't need to care about the device type; MLService will just pick one intelligently.

Some theoretical examples:

  • The OS decides based on website engagement, system load, and available resources
  • The OS evicts (e.g. GPU -> CPU) or throttles inference when the system is under load
  • Decisions based on model ops (e.g. TPU for matrix-heavy models, GPU for convolution-based models)

@wchao1115 (Collaborator, Author) commented Jan 3, 2023

I suppose this issue could be solved by an implementation that manages and shares the GPU device between the WebNN default context and WebGPU.

@huningxin I can see how this can be done on a system with a single GPU adapter. But a WebGPU gpuDevice is created from an adapter that is explicitly selected by the developer. So, in a multi-adapter configuration, the WebNN implementation may end up creating a device different from what the WebGPU user wants.

Is the concern about having multiple JS objects for a single GPU device, and potentially allocating 2x memory on the GPU?

@wacky6 The concern is twofold. First, uncoordinated implementations of WebGPU and WebNN could end up creating duplicates of the same GPU device on the same adapter. This isn't just two JS objects pointing to the same underlying GPU system device object (e.g. a D3D device in the Windows case) but two GPU device objects, each wrapped in its own JS API object. This would be quite wasteful from the OS point of view.

Secondly, if we assume that the implementations of the two APIs are tightly coordinated (which, shall we say, is currently still questionable, but let's assume it for the sake of this conversation), we still have the multi-adapter case to worry about. These are systems with two physical GPU adapters (e.g. Intel integrated graphics with a discrete GPU adapter plugged into one of the IO slots). In this setting, a WebGPU user is responsible for selecting the adapter they want to use as the basis for the gpuDevice they will create (i.e. there is no default device in WebGPU; you create a device from a pre-selected adapter). There is no way for the WebNN implementation to know ahead of time which adapter the user would want their gpuDevice to be based on. And, if the default context simply picks the default adapter (almost always the integrated one) while the user picks the discrete adapter, you end up with two actual GPU devices with different resource domains lying around whose resources can't actually be shared across. So it's not only wasteful but also unworkable, i.e. it breaks apps.

@huningxin (Contributor)

@wchao1115

There is no way for the WebNN implementation to know ahead of time which adapter the user would want their gpuDevice to be based on.

That's true.

And, if the default context simply picks the default adapter (almost always the integrated one) while the user picks the discrete adapter, you end up with two actual GPU devices with different resource domains lying around whose resources can't actually be shared across.

This case doesn't seem to be an issue, because the default context compute() method would only take ArrayBufferView resources and won't need to interact with any WebGPU resources created on the discrete GPU adapter.

And if developers could set the "high-performance" power preference, the WebNN default context might select the same discrete GPU adapter that was explicitly selected through WebGPU beforehand.

@huningxin (Contributor)

@wacky6

In an ideal world, developers don't need to care about the device type; MLService will just pick one intelligently.

Do you mean the "auto" device preference proposal of the Model Loader API?

@wacky6 commented Jan 3, 2023

Do you mean the "auto" device preference proposal of the Model Loader API?

Yes. And for multi-GPU, using ["gpu"] picks the best available one (e.g. prefers discrete over integrated).

@wchao1115 (Collaborator, Author)

@wacky6

Yes. And for multi-GPU, using ["gpu"] picks the best available one (e.g. prefers discrete over integrated).

The definition of "best" varies depending on what the app wants. A discrete GPU consumes a lot of power; it may not be ideal for running a model with low compute requirements, where using an integrated GPU could preserve battery life in a mobile situation.

@wchao1115 (Collaborator, Author)

@huningxin

This case doesn't seem to be an issue, because the default context compute() method would only take ArrayBufferView resources and won't need to interact with any WebGPU resources created on the discrete GPU adapter.

My point was that in this situation, sharing a piece of content between the two APIs would require a pair of copies for data download/upload, which is highly inefficient.

By making a WebGPU device the only GPU device WebNN operates on, we make the API efficient by default: zero-copy by default.

The reasoning behind this proposal is to simplify and make things more efficient by default; it isn't because something is functionally broken. My chief concern is that the current API provides too many options that could easily lead to inefficient implementation and interop with WebGPU. It seems far cleaner to coalesce around the WebGPU device for WebNN's GPU support, given that efficient interop with WebGPU is highly desirable. I actually believe this proposal will help bridge the implementation gap on the GPU side with the Chromium GPU folks.

As a concrete data point, one strong hint of redundant design is that both MLPowerPreference (defined by WebNN) and GPUPowerPreference (defined by WebGPU) exist. The two enums do essentially the same thing for GPU adapter selection.

@huningxin (Contributor)

@wchao1115

My point was that in this situation, sharing a piece of content between the two APIs would require a pair of copies for data download/upload, which is highly inefficient.

That's true. This is a WebNN/WebGPU interop scenario. Developers should create a webgpu context and use MLCommandEncoder for WebGPU interop.

I feel the WebNN spec should allow the implementation to pick a GPU device if developers just want to use the GPU but don't care about WebGPU interop, multi-adapter configurations, etc.

As a concrete data point, one strong hint of redundant design is that both MLPowerPreference (defined by WebNN) and GPUPowerPreference (defined by WebGPU) exist.

Modern CPUs may also employ hybrid cores, such as performance cores and efficiency cores or big.LITTLE cores, to balance power and performance. MLPowerPreference may also help select the cores to run workloads on for these CPU architectures.

@wchao1115 (Collaborator, Author)

I feel the WebNN spec should allow the implementation to pick a GPU device if developers just want to use the GPU but don't care about WebGPU interop, multi-adapter configurations, etc.

In practice, the GPU implementation will pick the GPU adapter for the device based on the MLPowerPreference. The same logic is used when WebGPU picks an adapter based on GPUPowerPreference. By letting the app developer select the adapter themselves through WebGPU, we reduce the redundancy and remove the ambiguity of having two adapters active, each known only to one of the two APIs, without efficient interoperability.

Even though it may seem that we limit WebNN from managing its own GPU device, in practice no functional capability is being removed here. The developer can still achieve the same outcome as today, just in a more direct way. Not having WebNN manage a GPU device's lifetime is a good thing from an implementation standpoint.

@anssiko (Member) left a comment

@wchao1115 thanks! I took a look and made some minor suggestions. I think this PR benefits from a WG discussion so I plan to bring this to our next call agenda to solicit more feedback and review.

@wchao1115 (Collaborator, Author) commented Jan 10, 2023

@wchao1115 thanks! I took a look and made some minor suggestions. I think this PR benefits from a WG discussion so I plan to bring this to our next call agenda to solicit more feedback and review.

Thanks. How should we handle existing occasional typos and misspelled words that are irrelevant to the change? It just makes the merge harder. Ideally, I think we should have a round of "scrub only" changes that does not materially impact the rest of the doc content.

@anssiko (Member) commented Jan 10, 2023

It is fine to fix typos if those typos were introduced by the changes in this PR. For other changes, a separate PR keeps us focused.

@anssiko (Member) commented Jan 10, 2023

We should check whether these proposed changes suggest changes to the performance adaptation use case. It is generally recommended to maintain use cases alongside the API as it evolves. In the spirit of atomic PRs any use case changes motivated by API changes should be incorporated into the same PR. If that’d complicate the concrete work, the editors are free to work on these changes in separate PRs and cross-link the two.

Related, there has been a proposal #207 to update the performance adaptation use case in the past. The WG considered that update future work given no accompanying API changes were suggested at that time.

@wchao1115 (Collaborator, Author)

@anssiko This change doesn't affect that use case, as it doesn't remove any current capability; i.e. the developer can still use WebNN to direct the execution of her model to the CPU via the default context.

@anssiko (Member) commented Jan 12, 2023

To ease the review: the proposed IDL changes in this PR are as follows (excluding MLOperator -> MLActivation rename changes already landed in #321):

Navigator includes NavigatorML;
 WorkerNavigator includes NavigatorML;
 
-enum MLDeviceType {
-  "cpu",
-  "gpu"
-};
-
 enum MLPowerPreference {
   "default",
-  "high-performance",
   "low-power"
 };
 
 dictionary MLContextOptions {
-  MLDeviceType deviceType = "cpu";
   MLPowerPreference powerPreference = "default";
 };
 
 [SecureContext, Exposed=(Window, DedicatedWorker)]
 interface ML {
+  // Default context
   Promise<MLContext> createContext(optional MLContextOptions options = {});
-  Promise<MLContext> createContext(GPUDevice gpuDevice);
 
   [Exposed=(DedicatedWorker)]
   MLContext createContextSync(optional MLContextOptions options = {});
+
+  // GPU context
+  Promise<MLContext> createContext(GPUDevice gpuDevice);
+
   [Exposed=(DedicatedWorker)]
   MLContext createContextSync(GPUDevice gpuDevice);
 };

@wchao1115 (Collaborator, Author) commented Jan 13, 2023

Rebased on main and removed the restrictive text that prevents compute and computeSync from working with a WebGPU context, per WG feedback on 1/13. @huningxin please take a look.

@huningxin (Contributor) commented Jan 16, 2023

@wchao1115

Rebased on main and removed the restrictive text that prevents compute and computeSync from working with a WebGPU context, per WG feedback on 1/13. @huningxin please take a look.

Thanks for the update. It resolved the corresponding conversation.

@anssiko (Member) left a comment

My comments have been addressed, thanks!

We should wait for @huningxin to be back from his well-deserved holiday to confirm his feedback and comments have been addressed before we proceed with this PR.

@pyu10055, given the impact of this proposed change to JS frameworks, feedback from TF.js would be invaluable. Please see my compact diff at #322 (comment) if this PR looks tl;dr. All comments and questions welcome.

(As a reminder, this was discussed on our last call.)

@anssiko (Member) commented Feb 2, 2023

We discussed this on our call https://www.w3.org/2023/02/02-webmachinelearning-minutes.html today.

Here's a summary and proposed next steps:

This feature introduces a normative dependency on the WebGPU API for the GPU path. We discussed how maturation of this important feature would greatly benefit from tight coordination with WebGPU contributors, implementation experience, and feedback from key customers such as JS ML frameworks. We also discussed the pros, cons, and logistics of landing this feature in a revised CR vs. in the initial CR. The upcoming WebGPU F2F on 2023-02-16/17 in the SF Bay Area would be a great opportunity to socialise and seek feedback on the WebNN-WebGPU interoperability design overall.

@wchao1115 would you be interested in attending that F2F (virtually) and sharing the WebNN-WebGPU interop design at the meeting? @huningxin @RafaelCintron may want to contribute too.

@pyu10055, given the impact of this proposed change to JS ML frameworks, feedback from TF.js would be invaluable. Please see my compact diff at #322 (comment) if this PR looks tl;dr. All comments and questions welcome.

@huningxin (Contributor) commented Feb 3, 2023

@anssiko

We discussed this on our call https://www.w3.org/2023/02/02-webmachinelearning-minutes.html today.

I think we would also like to hear feedback from the cross-platform support perspective.

@wacky6, do you see any gaps in supporting this change (compact diff at #322 (comment)) in the WebNN MLService implementation? The current prototype passes the device enum type, say gpu, to MLService to select the GPU. After this change, it may have to pass the GPUDevice object to MLService; I am not sure whether that can be handled.

And this change may also impact the Model Loader API because it shares the MLContext interface with WebNN. Any thoughts from the Model Loader's point of view?

@anssiko (Member) commented Feb 7, 2023

@mattsoulanille from the TensorFlow.js team may be able to provide feedback from the ML JS framework perspective on this proposed change.

@pyu10055 (Collaborator) commented Feb 7, 2023

Sorry for being late to this discussion. I like @wchao1115's proposal of having WebNN expose lower-level details (WebGPU) in the API so that high-level frameworks can provide more efficient integration and better ML pipelines. At the same time, I agree with @huningxin that less experienced users would prefer a semantically meaningful API; I think that would improve the usability of the API and attract more adoption.

I would selfishly prefer having both APIs, to address the needs of users across the spectrum.

@wchao1115 (Collaborator, Author) commented Feb 7, 2023

@anssiko I have no strong opinion on whether this change should be incorporated as part of the upcoming CR or later. I think that's entirely logistical. We should do what enables us to expedite the CR review process, which I'm not an expert in.

As for the change itself, this line of summary worries me a bit:

This feature introduces a normative dependency on the WebGPU API for the GPU path.

I'd like to clarify that the spec already takes this normative dependency on WebGPU today, without this change. If anything, this change should simplify the implementation of that dependency in practice, as it consolidates GPU support around just one code path with a WebGPU device.

@pyu10055 I do not consider the WebGPU dependency a low-level implementation detail. Again, this dependency already exists at the API level before this change, i.e. WebNN already allows the caller to instantiate an MLContext from a specified GPUDevice.

Given that this change is about removing support for an internal GPU device option (i.e. the MLDeviceType == "gpu" use case), one of the core arguments against it is that it'll make it harder for the caller of WebNN (e.g. TF.js) to support GPU execution (as it removes an "easier" way of doing it). This argument is based on the wrong assumption that supporting the GPU requires nothing more than an enum value or a Boolean option from the caller. Consider, for example, the case where the client machine is configured with multiple GPU adapters: which adapter should be used for WebNN execution? Does a framework like TF.js have enough information to decide on behalf of its caller? What if that decision is incorrect or must be changed in the future?

How much should we care about multi-adapter systems, i.e. how many client devices out there have multiple adapters? It turns out that roughly 1/3 of all PCs are equipped with a discrete GPU adapter, and every one of those machines also comes with Intel integrated graphics.

Allowing the caller to specify precisely which GPUDevice (from exactly which GPU adapter) they want to run inference on mitigates this dangerous ambiguity, which could lead to very different outcomes for users. One data point today is ONNX Runtime: ORT does not assume which GPU adapter the caller wants to run on; it relies on the caller to choose explicitly. I think every ML framework that supports the GPU would very likely need to make a similar design choice.

@pyu10055 (Collaborator) commented Feb 7, 2023

@wchao1115 I might not have explained myself clearly. I think it is a good thing for a framework like TF.js to be able to specify or pass down the GPUDevice to WebNN. The question is whether a default device can/should be specified through a semantic API (high-performance/low-power/etc.). I think this is useful; most likely, end users or frameworks might not have the understanding or the low-level hardware access to make a better decision.

@wchao1115 (Collaborator, Author) commented Feb 8, 2023

@pyu10055 If what you mean is through the MLPowerPreference options, please also take a look at GPUPowerPreference. As you can see, they are essentially identical. To select a GPU adapter, the app only needs to make one call to GPU.requestAdapter() with a specified GPUPowerPreference "hint" value. It isn't hard to do and can be done easily by the app.

But if you would rather surface a similar adapter hint to the app from TF.js, that call to requestAdapter can also be made by TF.js on behalf of the app, which passes in the adapter hint. In either case, it can be done above WebNN.
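For illustration only, a minimal sketch of that flow (hypothetical app code, not spec text or a TF.js API): the power-preference "hint" lives in WebGPU's requestAdapter call, and WebNN simply receives the resulting device.

```ts
// The adapter is chosen above WebNN via the standard WebGPU hint.
const adapter = await navigator.gpu.requestAdapter({ powerPreference: 'high-performance' });
const device = await adapter.requestDevice();

// WebNN then runs on whatever adapter the app (or the framework on its behalf) picked.
const mlContext = await navigator.ml.createContext(device);
```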

WebNN is intentionally designed to be a backend API, not a framework API. Having it directly accept the device corresponding to the desired adapter removes ambiguity and the burden of maintaining compatibility in the future.

@pyu10055 (Collaborator) commented Feb 8, 2023

@wchao1115 Using a WebGPU device makes sense if the API allows frameworks or users to specifically create a WebGPU backend. But if the intention of the WebNN GPU backend is transparent GPU acceleration, I think WebNN should have a level of abstraction over the GPU; exposing WebGPU objects as part of the WebNN API will add burdens to non-WebGPU implementations.

@wchao1115 (Collaborator, Author)

I think WebNN should have a level of abstraction over the GPU; exposing WebGPU objects as part of the WebNN API will add burdens to non-WebGPU implementations.

This is already the case today even with the default GPU option, i.e. if you intend to run a model on the GPU with an input bound to a GPU resource, that GPU resource must already be a WebGPU resource. Moreover, it must be a WebGPU resource created from the same WebGPU device that WebNN internally manages. This is why WebNN can't really "hide" the underlying WebGPU device from the caller. It's much more natural and robust to have the WebNN caller specify both which device it wants to run on and which resources from that same device it wants to bind the model's inputs to.

@anssiko (Member) commented Feb 9, 2023

I have no strong opinion on whether this change should be incorporated as part of the upcoming CR or later. I think that's entirely logistical. We should do what enables us to expedite the CR review process, which I'm not an expert in.

@wchao1115 here's the proposed "status of this document" text that will enable us to expedite the CR process and solicit wider feedback and review on this important design consideration: #340

@wchao1115 (Collaborator, Author)

@anssiko I've commented on #340. I don't think this PR is related to WebGPU interop. As has been explained in this thread, this PR is about removing the internal GPU option in MLContext. The WebGPU interop requirement already existed prior to this proposed change.

@wchao1115 (Collaborator, Author)

Retract until consensus can be found.
