Simplify MLContext creation #322
Conversation
I'm concerned that this change would prevent a non-browser implementation, such as the WebNN-native Node.js binding, from supporting a GPU device.
Thanks for the comment @huningxin. A non-browser implementation only supports the default context by definition, and can therefore implement the interface as it sees fit (as implied in the last paragraph of my PR description), including as a GPU implementation. I can expand on that wording a bit more if that helps. Removing GPU support from non-WebGPU devices could really simplify the browser implementation, which I believe is the priority in our charter, and makes it clear how WebNN interops with WebGPU.
Thanks for the explanation, @wchao1115. The non-browser implementation is one aspect. Actually, I am trying to understand the impact on default context developers. Before this change, to utilize a GPU device, developers can just set the device type to "gpu" when creating the default context. The "gpu" and "cpu" default contexts can share the same code for graph building and compute. However, after this change, to utilize a GPU device, developers would need to use the WebGPU API to obtain a GPUDevice first. I feel this change would bring some extra overhead (WebGPU interop) to default context developers for GPU access and reduce code reusability across different types of devices for some use cases.
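For illustration, a minimal sketch of the two paths being compared, assuming the pre-change deviceType option and the post-change GPUDevice overload shown in the IDL later in this thread:

```js
// Before this change: request a GPU-backed default context via an option.
const gpuContext = await navigator.ml.createContext({ deviceType: 'gpu' });

// After this change: a GPU context is created only from a WebGPU device.
const adapter = await navigator.gpu.requestAdapter();
const device = await adapter.requestDevice();
const webgpuContext = await navigator.ml.createContext(device);
```

The graph building code can be shared across the two paths, but the compute path may differ depending on whether inputs and outputs are ArrayBufferViews or WebGPU resources.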
With this change, the only requirement of the default context is that it supports ArrayBufferView as constants, inputs, and outputs. As long as CPU resources are supported, the implementation can be either on the CPU or the GPU side (i.e. with automatic upload/download for GPU execution). That said, I'm not familiar with the importance of supporting a non-browser implementation out of the same standard web API spec. Is that a requirement for WebNN? Do we have existing customers that require webnn-native to be supported as one of the publicly maintained WebNN implementations? My view is that we can still maintain a webnn-native that supports both CPU and GPU with an extension, e.g. retaining the device preference enum in the webnn-native API but outside of the WebNN spec. As for the OperationError failure cases for …
It's great that the default context implementation can still support GPU. However, with the removal of the device type and high-performance power preference, the default context developers lose the capability to request a GPU implementation.
It is tracked by #142. I am not aware of any existing customers of the webnn-native Node.js binding. I agree the non-browser implementation is complementary. My focus is default context usage in web browsers.
This was introduced by #257. I think this makes sense because it makes a clear separation between the default and webgpu contexts for the two major usages: WebNN standalone (default) and WebNN/WebGPU interop. As I summarized in the above PR review, the difference can be captured by the following table (revised by dropping the WebGL interop).
As I proposed in my earlier comment, are you open to leaving the device preference only in webnn-native and not officially in the WebNN spec? That way webnn-native continues to function as it does today while we simplify the contract for WebNN to benefit the browser implementation.
Thanks @wchao1115. webnn-native should follow the spec as much as possible, so that would be fine. My concern is more about the impact on default context developers, who are also targeting web browsers. For example, the TFLite WebNN delegate uses the default context. It allows developers to access a GPU device by just setting the device type, without any other code modifications. With this change, it has to create a webgpu context and introduce a different graph compute path with WebGPU interop.
Thanks. I understand your concern from the framework developers' point of view. However, that use case will in practice encourage multiple devices being created in the same process if the app also uses WebGPU, with one internally owned by the default context and the other explicitly created by the app through a WebGPU device. Having multiple GPU devices to deal with creates huge complications around performance and resource sharing, especially if the system has multiple adapters configured (which is a lot more common than most people think, with both Intel integrated graphics and a discrete adapter plugged in). It seems pretty clear that people want WebGPU to be the way web apps deal with the GPU going forward. We even dropped support for WebGL to make it more compelling for frameworks and apps to target WebGPU in the future. With that backdrop, having framework developers target WebGPU explicitly in the backend may create some short-term integration work, but it's a small price to pay because you end up with a much more streamlined GPU stack, more robust interop, and more efficient support for apps down the road, which is a huge advantage that multiplies over time. The current design will make efficient and robust WebGPU interop very hard to accomplish in practice because we intentionally fragment it at the API level.
I suppose this issue could be solved by an implementation that manages and shares the GPU device between the WebNN default context and WebGPU. This was prototyped by my previous WebNN-on-Dawn Chromium proof-of-concept that integrates the WebNN implementation into the WebGPU Dawn project. It allows a WebNN context to create and own a WebGPU/Dawn device internally. The WebGPU/Dawn device is used by the WebNN implementation to execute the command lists that represent the MLGraph. Another proof point is the WebNN-polyfill: the default context polyfill implementation is able to run on top of WebGPU via the TensorFlow.js WebGPU backend.
Agreed. And that's exactly why I think WebGPU interop is important, especially for GPU rendering / real-time video processing use cases. On the other hand, I still believe it is very compelling if the default context execution model, in particular …
This issue may not exist for some native ML APIs that run in a separate process from WebGPU, such as the ChromeOS MLService. The CG's Model Loader API implementation is based on MLService. The device type of the default context is used by MLService to select a device. In particular, when creating a model loader, the implementation gets the device type from the context and passes it to MLService, and MLService selects a device to run model inference on. This infrastructure is also shared by the ongoing WebNN MLService backend prototype. /cc @wacky6
Is the concern about having multiple JS objects for a single GPU device, and potentially allocating 2x memory on the GPU? Something like:
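A minimal hypothetical sketch of such a pattern, assuming the pre-change deviceType option:

```js
// Two JS-level objects that may each wrap their own GPU device
// on the same physical adapter.
const mlContext = await navigator.ml.createContext({ deviceType: 'gpu' });

const adapter = await navigator.gpu.requestAdapter();
const gpuDevice = await adapter.requestDevice();
// If the implementations are not coordinated, mlContext's internal device
// and gpuDevice could duplicate GPU memory and resources.
```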
I don't see a particular reason to discourage this from an API design perspective. Note that ChromeOS MLService manages GPU/NPU resources independently of the WebGPU process. In an ideal world, developers don't need to care about the device type; MLService will just pick one intelligently. Some theoretical examples: …
@huningxin I can see how this can be done on a system with a single GPU adapter. But a WebGPU …
@wacky6 The concern is two-fold. First, uncoordinated implementations of WebGPU and WebNN could end up creating duplicates of the same GPU device on the same adapter. This isn't just two JS objects pointing to the same underlying GPU system device object (e.g. a D3D device in the Windows case), but two GPU device objects, each wrapped within its own JS API object. This would be quite wasteful from the OS point of view. Secondly, even if we assume that the implementations of the two APIs are tightly coordinated (which, shall we say, is currently still questionable, but let's assume it for the sake of this conversation), we still have the multi-adapter case to worry about. These are systems with two physical GPU adapters (e.g. Intel integrated graphics with a discrete GPU adapter plugged into one of the IO slots). In this setting, a WebGPU user would be responsible for selecting the adapter they want to use as the basis for the GPUDevice.
That's true.
This case seems not to be an issue because the default context … And if developers can set the "high-performance" power preference, the WebNN default context may select the same discrete GPU adapter that was selected by the explicit WebGPU call before.
Yes. And for multi-GPU, using "gpu" picks the best available one (e.g. prefers discrete over integrated).
The definition of "best" varies depending on what the app wants. Discrete GPU consumes a lot of power. It may not be ideal to run a model with low compute requirements where using an integrated GPU could preserve battery life in a mobile situation. |
My point was that in this situation, sharing a piece of content between the two APIs would then become a pair of copies for data download/upload, which is highly inefficient. By making a WebGPU device the only GPU device WebNN operates on, we make the API efficient by default: zero-copy by default. The reasoning behind this proposal is to simplify and make things more efficient by default; it isn't because something is functionally broken. My chief concern is that the current API provides too many options that could easily lead to inefficient implementation and interop with WebGPU. It seems far cleaner to coalesce around the WebGPU device for GPU support in WebNN, given that efficient interop with WebGPU is highly desirable. I actually believe this proposal will help bridge the implementation gap on the GPU side with the Chromium GPU folks. As a concrete data point, one strong hint of a redundant design is that there are both the …
That's true. This is a WebNN/WebGPU interop scenario. Developers should create a webgpu context and use … I feel the WebNN spec should allow the implementation to pick a GPU device if developers just want to use the GPU but don't care about WebGPU interop, multi-adapters, etc.
Modern CPUs may also employ hybrid cores, such as performance-cores and efficient-cores or big.LITTLE cores, to balance power and performance. The …
In practice, the GPU implementation will pick the GPU adapter for the device based on the … Even though it may seem that we limit WebNN from being able to manage its own GPU device, in practice no functional capability is being removed here. The developer can still achieve the same outcome as they can now, just in a more direct way. WebNN not managing a GPU device lifetime is a good thing from an implementation standpoint.
@wchao1115 thanks! I took a look and made some minor suggestions. I think this PR benefits from a WG discussion so I plan to bring this to our next call agenda to solicit more feedback and review.
Thanks. How should we handle existing occasional typos and misspelled words that are irrelevant to this change? They just make the merge harder. Ideally, I think we should have a round of "scrub only" changes that do not materially impact the rest of the doc content.
It is fine to fix typos if those typos were introduced by the changes in this PR. For other changes, a separate PR keeps us focused.
We should check whether these proposed changes suggest changes to the performance adaptation use case. It is generally recommended to maintain use cases alongside the API as it evolves. In the spirit of atomic PRs, any use case changes motivated by API changes should be incorporated into the same PR. If that would complicate the concrete work, the editors are free to work on these changes in separate PRs and cross-link the two. Relatedly, there was a proposal (#207) to update the performance adaptation use case in the past. The WG considered that update future work given that no accompanying API changes were suggested at that time.
@anssiko This change doesn't affect that use case as it doesn't remove any current capability, i.e. the developer can still use WebNN to direct the execution of her model to the CPU via the default context.
To ease the review, the proposed IDL changes in this PR are as follows (excluding …):

```diff
 Navigator includes NavigatorML;
 WorkerNavigator includes NavigatorML;

-enum MLDeviceType {
-  "cpu",
-  "gpu"
-};
-
 enum MLPowerPreference {
   "default",
-  "high-performance",
   "low-power"
 };

 dictionary MLContextOptions {
-  MLDeviceType deviceType = "cpu";
   MLPowerPreference powerPreference = "default";
 };

 [SecureContext, Exposed=(Window, DedicatedWorker)]
 interface ML {
+  // Default context
   Promise<MLContext> createContext(optional MLContextOptions options = {});
-  Promise<MLContext> createContext(GPUDevice gpuDevice);

   [Exposed=(DedicatedWorker)]
   MLContext createContextSync(optional MLContextOptions options = {});
+
+  // GPU context
+  Promise<MLContext> createContext(GPUDevice gpuDevice);
+
   [Exposed=(DedicatedWorker)]
   MLContext createContextSync(GPUDevice gpuDevice);
 };
```
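For a quick read of what remains after this diff: the default context takes only the power preference option, and the sync variants are exposed only in dedicated workers. A minimal usage sketch, assuming the post-change IDL above:

```js
// Window or dedicated worker: asynchronous creation of the default context.
const defaultContext = await navigator.ml.createContext({ powerPreference: 'low-power' });

// Dedicated worker only: synchronous creation.
const syncContext = navigator.ml.createContextSync({ powerPreference: 'low-power' });
```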
Rebase on main and remove restrictive text that prevents …
Thanks for the update. It resolved the corresponding conversation.
My comments have been addressed, thanks!
We should wait for @huningxin to be back from his well-deserved holiday to confirm his feedback and comments have been addressed before we proceed with this PR.
@pyu10055, given the impact of this proposed change to JS frameworks, feedback from TF.js would be invaluable. Please see my compact diff at #322 (comment) if this PR looks tl;dr. All comments and questions welcome.
(As a reminder, this was discussed on our last call.)
We discussed this on our call today (https://www.w3.org/2023/02/02-webmachinelearning-minutes.html). Here's a summary and proposed next steps: This feature introduces a normative dependency on the WebGPU API for the GPU path. We discussed how maturation of this important feature would greatly benefit from tight coordination with WebGPU contributors, implementation experience, and feedback from key customers such as JS ML frameworks. We also discussed the pros, cons, and logistics of landing this feature in a revised CR vs. the initial CR. The upcoming WebGPU F2F 2023-02-16/17 in the SF bay area would be a great opportunity to socialise and seek feedback on the WebNN-WebGPU interoperability design overall. @wchao1115, would you be interested in attending that F2F (virtually) and sharing the WebNN-WebGPU interop design at the meeting? @huningxin @RafaelCintron may want to contribute too. @pyu10055, given the impact of this proposed change on JS ML frameworks, feedback from TF.js would be invaluable. Please see my compact diff at #322 (comment) if this PR looks tl;dr. All comments and questions welcome.
I think we would also like to hear feedback from a cross-platform support perspective. @wacky6, do you see any gaps in supporting this change (compact diff at #322 (comment)) in the WebNN MLService implementation? The current prototype passes the device enum type, say … And this change may also impact the Model Loader API because it shares the …
@mattsoulanille from the TensorFlow.js team may be able to provide feedback from the ML JS framework perspective on this proposed change.
Sorry for being late to this discussion. I like @wchao1115's proposal that lets WebNN expose low-level implementation details (WebGPU) in the API, enabling high-level frameworks to provide more efficient integration and better ML pipelines. At the same time, I agree with @huningxin that less experienced users would prefer a semantically meaningful API; I think this would improve the usability of the API and attract more adoption. I would selfishly prefer having both APIs to address the needs of users across the spectrum.
@anssiko I have no strong opinion on whether this change should be incorporated as part of the upcoming CR or later. I think that's entirely logistical. We should do whatever enables us to expedite the CR review process, which I'm not an expert in. As for the change itself, this line of the summary worries me a bit: "This feature introduces a normative dependency on the WebGPU API for the GPU path."
I'd like to clarify that the spec already takes this normative dependency on WebGPU today, without this change. If anything, this change should simplify the implementation of this dependency in practice, as it consolidates GPU support around just one code path with a WebGPU device. @pyu10055 I do not consider the WebGPU dependency a low-level implementation detail. Again, this dependency already exists at the API level before this change, i.e. WebNN already allows the caller to instantiate an MLContext from a GPUDevice. Given that this change is about removing the support for an internal GPU device option (i.e. the deviceType option), how much should we care about multi-adapter systems? I.e. how many client devices out there have multiple adapters in them? It turns out that roughly 1/3 of all PCs out there are equipped with a discrete GPU adapter, and every one of those machines also comes with Intel integrated graphics. Allowing the caller to specify precisely which …
@wchao1115 I might not have explained myself clearly. I think it is a good thing for a framework like TF.js to be able to specify or pass down the GPUDevice to WebNN. The question is whether a default device can/should be specified through a semantic API (high-performance / low-power / etc.). I think this is useful; most likely, the end users or the framework might not have the understanding or low-level hardware access to make a better decision.
@pyu10055 If what you mean is through the … But if you think you would rather surface a similar adapter hint to the app of TF.js, that call to requestAdapter can also be made by TF.js on behalf of the app, which passes in the adapter hint. In either case, it can be done above WebNN. WebNN is intentionally designed to be a backend API, not really a framework API. Having it directly accept the device corresponding to the desired adapter removes ambiguity and the burden of maintaining compatibility in the future.
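As a sketch of that division of labor, a hypothetical framework-level helper (createBackendContext is illustrative, not part of any API) could translate an app-supplied hint into the WebGPU adapter request and the WebNN context creation:

```js
// Hypothetical framework helper: the app passes a high-level hint,
// the framework resolves it to a WebGPU adapter and a WebNN context.
async function createBackendContext(hint /* 'high-performance' | 'low-power' */) {
  const adapter = await navigator.gpu.requestAdapter({ powerPreference: hint });
  if (!adapter) throw new Error('No suitable GPU adapter found');
  const device = await adapter.requestDevice();
  return navigator.ml.createContext(device);
}

// Usage: the app expresses intent; only the framework touches WebGPU.
const context = await createBackendContext('high-performance');
```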
@wchao1115 Using a WebGPU device makes sense if the API allows frameworks or users to specifically create a WebGPU backend. But if the intention of the WebNN GPU backend is transparent GPU acceleration, I think WebNN should have a level of abstraction over the GPU; exposing WebGPU objects as part of the WebNN API will add burdens to non-WebGPU implementations.
This is already the case today even with the default GPU option, i.e. if you intend to run a model on the GPU with an input bound to a GPU resource, that GPU resource must already be a WebGPU resource. Moreover, it must be a WebGPU resource created from the same WebGPU device that WebNN internally manages. This is the reason why WebNN can't really "hide" the underlying WebGPU device from the caller. It's much more natural and robust to have the WebNN caller specify both what device it wants to run on and what resource from the same device it wants to bind the model's input to.
@wchao1115 here's the proposed "status of this document" text that will enable us to expedite the CR process and solicit wider feedback and review on this important design consideration: #340 |
Retract until consensus can be found. |
Given the explicit support for the WebGPU gpuDevice as a dedicated ML context type in addition to the default type, and the recent removal of support for the WebGLContext device type, MLContext creation can be further simplified by removing the internal GPU device type that is not explicitly specified as a gpuDevice parameter to the ML context creation method, i.e. a GPU context can now be created only from a WebGPU device.

The second change is the removal of the high-performance power preference. This option was originally defined to facilitate the creation of a GPU context focused on high-performance, low-latency execution on a discrete GPU (presumably as a tiebreaker in a configuration where both an integrated GPU and a dedicated discrete GPU are present). Following the rationale behind the first change to only support a GPU context through a WebGPU device, this option is no longer necessary since the specified gpuDevice must be created from an adapter that must first be explicitly selected by the user of WebGPU, i.e. it is now up to the user to choose which GPU adapter (e.g. integrated vs. discrete) they want to use as the GPU device backing the MLContext object.

Lastly, please note that with this change, the default context is now an implementation choice. As long as the implementation takes reasonable consideration of the user-specified power preference option, it can implement the default context type any way it wants as long as it isn't a GPU implementation. For instance, the implementation can be entirely CPU-based or a hybrid CPU-NPU implementation; as long as it fully supports ArrayBufferView as in the API contract, it's all good from the spec point of view. I've also removed explicit mentions of CPU as a normative requirement in the relevant wordings to make them read more like an informative consideration.