Support for device-based tensor storage objects #482
@bbernhar I'm still not quite clear on the problem statement. Can you please clarify what we think the problem is here?
@wchao1115 Sure. WebNN (as spec'd) and WebGPU lack a way of sharing tensor data on-device directly with each other.
I think this feature would be critical for some language models' performance on device (GPU/NPU), where the outputs of the previous inference (e.g. hidden state or KV pairs) are used as inputs of the next inference. For such use cases, frameworks usually allow allocating on-device tensors and using these tensors as inputs/outputs for model inference; see, for example, ONNX Runtime I/O Binding.
If an MLBuffer keeps a reference to its context, the read/write operations could be simplified by moving them onto the MLBuffer interface rather than needing to pass the buffer.
Nitpick on proposed API shape: |
Thanks @inexorabletash for the feedback. Read/write ops must occur in the domain of the context because only the context determines device-execution order, not the buffer.
My thoughts, too, but MLContext is indeed the main encapsulator here.
Ah, thank you for clarifying. The timeline mentions are subtle; I hadn't internalized that yet. One more thing to specify concretely in the spec. :) FWIW, it's best to make API proposals by leading with examples of the JS code using the proposed API, and only worry about providing the IDL later.
Unfortunately, no code example tells you which timeline gets used where; only the WebNN spec can describe this behavior: which state is available to which operations. The WebNN programming model should probably concretely define these "timelines", then describe the entire API using them.
Hi @bbernhar, thanks for this proposal (and the Chromium prototype)! I agree that specifying a clear WebNN <-> WebGPU interop is needed. I have a comment and a few questions for ya:
This proposal includes a clear way to read back to JS/CPU using readBuffer(). In particular, I'm curious about:
As @inexorabletash mentioned, code snippets showing WebGPU <-> WebNN interop would be extremely helpful here :)
Are we expecting to provide any guarantees about where this buffer resides? We also need to take into consideration other platforms where ML execution is not so closely tied to a single "device" as DirectML is (e.g. Core ML on Mac; this is related to discussions in #322).
Just a heads up that with JSPI coming soon, I would expect pushback on adding this sync interface, even in a worker :)
Some questions:

How to transfer MLBuffer between devices?

Perhaps a bit future-looking: how do we support GPU <-> NPU/GPU transfers (e.g. GPU/NPU cooperation, iGPU <-> dGPU, multi-GPU)? From the current design, it looks like developers need to:

Is there a faster path (or do we anticipate one) for inter-device transfer? Can we use Intel GPU <-> NPU transfer as an example?

Simplified types in compute()

Should we change the simplified types used in compute()?

MLBuffer usage scope

Is MLBuffer only permitted for binding input/output buffers to a built graph during compute? Can MLBuffer be used where an MLOperand is accepted?

read/writeBuffer definition

Should these be defined on MLBuffer itself? Read/write operations look dependent on the context the MLBuffer is associated with; defining read/write on MLBuffer removes the need to check that the buffer belongs to the given context.

MLBuffer memory management

When is MLBuffer's memory allocated on device? Is it during writeBuffer? Should MLBuffer.destroy() return a Promise to tell the caller that the memory has been deallocated?

I also wonder if the contiguous memory model is too simplified. What if different devices use different channel ordering or endianness? Are we expecting developers to perform such conversions on the CPU manually?
Thanks, @a-sully for raising these questions.
If we allow the buffer to be shared, then I think adding a new WebGPU API, importExternalBuffer, could work. That code could look like this:

```js
// Create sharable buffer in WebNN
ml_context = ML.createContext(wgpuDevice);
ml_buffer = ml_context.createBuffer({size: size, forExport: true});

// Import buffer to WebGPU
gpu_buffer = wgpuDevice.importExternalBuffer(ml_buffer);

pipeline = wgpuDevice.createComputePipeline(/* pipeline with compute shader that updates gpu_buffer */);
bind_group = wgpuDevice.createBindGroup(/* create bind group for gpu_buffer */);

command_encoder = wgpuDevice.createCommandEncoder();
pass = command_encoder.beginComputePass();
pass.setPipeline(pipeline);
pass.setBindGroup(/* buffer index in shader */, bind_group);
pass.dispatchWorkgroups(/* sizes */);
pass.end();

wgpuDevice.queue.submit([command_encoder.finish()]);

// Export buffer from WebGPU
ml_buffer = ml_context.importExternalBuffer(gpu_buffer);
```
Yes, it will reside on the same device used to create the MLContext. If it is a CPU-only ML context, then WebNN should create a CPU-backed MLBuffer.
Right. If you are on the same GPU, the result allows zero-copy. Otherwise, a GPU copy is usually required for NPU-to-GPU, iGPU-to-dGPU, or video-to-tensor conversions.
Thanks for the heads up.
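To picture the zero-copy vs. copy decision described above, here is a toy, purely hypothetical model (planImport is an invented name, not any spec'd behavior): a user agent might allow zero-copy only when both APIs sit on the same adapter, and stage a copy otherwise.

```javascript
// Hypothetical sketch: choose an import strategy based on whether the
// MLContext and GPUDevice share the same physical adapter.
// Not part of WebNN or WebGPU; illustrative only.
function planImport(mlContextAdapter, gpuDeviceAdapter) {
  if (mlContextAdapter === gpuDeviceAdapter) {
    // Same adapter: the buffer can be aliased directly.
    return {strategy: 'zero-copy', copies: 0};
  }
  // Cross-adapter (e.g. NPU -> dGPU, iGPU -> dGPU): stage through a copy.
  return {strategy: 'staged-copy', copies: 1};
}

console.log(planImport('intel-igpu', 'intel-igpu')); // zero-copy path
console.log(planImport('intel-npu', 'nvidia-dgpu')); // staged-copy path
```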
@wacky6 Great questions, my thoughts below.
The only true "no copy" path I'm aware of is CPU/iGPU. I believe all other scenarios require GPU/NPU copy.
I am also in favor of using
Currently, yes. In the future,
Perhaps the earlier response addresses this? #482 (comment).
No, it would be on buffer creation. This avoids generating a fatal OOM where the WebNN developer wouldn't expect it.
I will follow up with our Intel NPU teams on whether they plan to introduce complex formats.
@wacky6 and @a-sully, thank you for your feedback. wacky6 wrote:
In the current proposal, there is no "block until completion" on the JS side for steps 1 or 2; after developers call these APIs, control returns immediately. The proposal does not talk about WebGPU/WebNN interop, but I agree with Bryan about having an importExternalBuffer API. FYI, WebGPU already has a similar relationship with other web APIs such as video frames. See Importing External Textures. wacky6 wrote:
As Bryan says wacky6 wrote:
I would prefer that we keep read/writeBuffer on the context so that it is more clear to web developers that those operations are queued relative to dispatch operations. WebGPU works in a similar manner; see GPUQueue. wacky6 wrote:
I agree with Bryan that the memory should be allocated when the buffer is created. Both WebGPU and WebGL have similar semantics.
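As a rough illustration of the context-level queuing argument above, here is a toy mock of an MLContext-like object (MockMLContext and its methods are invented for illustration, not the real WebNN API) where writeBuffer, dispatch, and readBuffer are processed in FIFO order on a single timeline:

```javascript
// Toy model: all buffer operations are queued on the context's single
// timeline, so a readBuffer observes every write/dispatch queued before it.
class MockMLContext {
  constructor() { this.queue = []; }
  writeBuffer(buffer, data) {
    this.queue.push(() => { buffer.contents = data.slice(); });
  }
  dispatch(graph, inputs, outputs) {
    this.queue.push(() => {
      // "graph" here is just a per-element function standing in for a model.
      outputs.buffer.contents = inputs.buffer.contents.map(graph);
    });
  }
  async readBuffer(buffer) {
    // Flush everything queued before this read, preserving order.
    for (const op of this.queue) op();
    this.queue.length = 0;
    return buffer.contents;
  }
}

const ctx = new MockMLContext();
const input = {}, output = {};
ctx.writeBuffer(input, [1, 2, 3]);
ctx.dispatch(x => x * 2, {buffer: input}, {buffer: output});
ctx.readBuffer(output).then(r => console.log(r)); // resolves to [2, 4, 6]
```

Because both read/writeBuffer and dispatch share one queue, ordering falls out for free; putting read/write on the buffer object would require it to reach back into this queue anyway.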
@bbernhar and @RafaelCintron thanks for the explanations! The code example is very helpful. TL;DR: I'd like to raise some issues which I think are necessary to resolve before this proposal can move forward. I don't have any concrete proposals since I'm still learning this area, but I would appreciate confirmation that the raised issues do need to be tackled. I'm also very happy to help work these out together :)
I believe that if we're to go forward with
With regards to (1) let's look at a snippet from the example above:
Presumably this code does not suggest that we are synchronously mapping/copying the buffer:

```js
// `gpuBuffer` is used in some WebGPU work submitted here
wgpuDevice.queue.submit([commandEncoder.finish()]);

// Inform WebGPU to map/copy `gpuBuffer` to `mlBuffer` once
// `gpuBuffer`'s contents are ready to be accessed.
const mlBuffer = mlContext.importExternalBuffer(gpuBuffer);

// Queue this work behind the importExternalBuffer() call on a WebNN timeline.
// This implicitly awaits all WebGPU work involving `gpuBuffer`
const gpuBufferContentsCopiedToJsBuffer = await mlContext.readBuffer(mlBuffer);
```

What's actually happening here? How can the user agent know whether it can map the buffer?
Since the usages of an MLBuffer are not declared upfront, let's think about the reverse scenario of WebNN -> WebGPU mapping:

```js
// Inform WebNN to map/copy `mlBuffer` to `gpuBuffer` once
// `mlBuffer`'s contents are ready to be accessed
const gpuBuffer = wgpuDevice.importExternalBuffer(mlBuffer);
```

How can the user agent know whether it can map the buffer here, either? To import a buffer, its usage would need to be known upfront.
This does not seem feasible - especially if we expect the MLBuffer's memory to be allocated on buffer creation. For example, what's the implied usage here?

```js
mlContext.dispatch(
    graph,
    /*inputs=*/{buffer: someMlBuffer},
    /*outputs=*/{buffer: someMlBuffer},
);
```

Edge cases aside, let's look at an example of chained inference - the other use case for MLBuffer:

```js
const inputMlBuffer = mlContext.createBuffer({inputSize});
const intermediateMlBuffer = mlContext.createBuffer({intermediateSize});
const outputMlBuffer = mlContext.createBuffer({outputSize});

mlContext.writeBuffer(
    inputMlBuffer,
    /*dstOffset=*/0,
    /*srcData=*/someJsArrayBuffer,
);

mlContext.dispatch(
    graph,
    /*inputs=*/{buffer: inputMlBuffer},
    /*outputs=*/{buffer: intermediateMlBuffer},
);

// Feed the output of one execution as the input to the next. Chained inference!
mlContext.dispatch(
    graph,
    /*inputs=*/{buffer: intermediateMlBuffer},
    /*outputs=*/{buffer: outputMlBuffer},
);

const resultBuffer = await mlContext.readBuffer(outputMlBuffer);
```

Seems great! Now, where exactly will these buffers be allocated? This snippet from the WebGPU explainer gives us a hint that we can't both (1) allocate on creation and (2) not know the usage upfront - at least, not without sacrificing something (e.g. performance, extra copies):
To make this concrete: the Chromium prototype's DML implementation allocates memory for an MLBuffer when it is created. This proposal doesn't use the word "mapping", but what's being proposed here is effectively mapping.
It seems clear to me that we need to define the usage of an MLBuffer upfront. With regards to (2), let's take the real-time video processing use case as another example:

```js
const applyEffectToFrame = () => {
  // Get the frame data as a GPU buffer
  // Some way to import directly into an MLBuffer would avoid this step
  const gpuExternalBuffer = device.importExternalBuffer({source: video});

  // Get the frame data into WebNN. The imported buffer is read-only, so this should
  // hopefully not require a copy if `mlContext` is tied to the same GPU as `gpuExternalBuffer`
  const inputMlBuffer = mlContext.importExternalBuffer(gpuExternalBuffer);
  const outputMlBuffer = mlContext.createBuffer({size: inputMlBuffer.size});

  // Perform some effects described by `graph` on the frame (e.g. background blur)
  const inputs = {buffer: inputMlBuffer};
  const outputs = {buffer: outputMlBuffer};
  mlContext.dispatch(graph, inputs, outputs);

  // Inform WebGPU to map/copy `outputMlBuffer` - which contains the resulting
  // frame after effects have been applied - to `gpuBufferToRender` once
  // `outputMlBuffer`'s contents are ready to be accessed
  //
  // To avoid a copy, `outputMlBuffer`'s contents must be guaranteed not to change
  const gpuBufferToRender = wgpuDevice.importExternalBuffer(outputMlBuffer);

  // Create a bind group for `gpuBufferToRender`, create a command encoder, etc.
  // asking WebGPU to render `gpuBufferToRender`
  // ...

  // These queued commands must block on completion of the `dispatch()` call above
  wgpuDevice.queue.submit([commandEncoder.finish()]);

  // Call this method for each frame
  video.requestVideoFrameCallback(applyEffectToFrame);
}
```

Without any additional synchronization, the commands submitted to the GPUQueue could run before the dispatch() above completes.
My understanding is that #264 was attempting to specify the latter by describing WebNN execution in a
My (limited) understanding of WebGPU's relationship with video frames is that the former behavior does not exist? Consider a basic rendering loop with WebGPU:

```js
const render = () => {
  // Get the frame data as a GPU buffer
  const gpuExternalBuffer = device.importExternalBuffer({source: video});

  // Create a bind group for `gpuExternalBuffer`, create a command encoder,
  // beginRenderPass, etc.
  // ...

  // Queue a bunch of commands to the GPUQueue, which will eventually render to
  // a WebGPU canvas
  wgpuDevice.queue.submit([commandEncoder.finish()]);

  // This method registers a callback which will be fired once a new frame is
  // sent to the compositor
  video.requestVideoFrameCallback(render);
}
```

The encoded GPU commands will eventually "update the rendering of a WebGPU canvas", which in turn calls these steps in the HTML spec, which in turn (eventually) runs the animation frame or video request frame callbacks... which triggers the next render() call. I think we need more details as to how WebGPU and WebNN synchronization will work :)
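One way to picture the synchronization question being debated here: model each API as its own device timeline and make every cross-API buffer transfer drain the current owner's queued work first. The sketch below is a toy model in plain JavaScript; Timeline and SharedBuffer are invented names for illustration, not proposed API.

```javascript
// Toy model of two "device timelines" serializing access to a shared buffer.
class Timeline {
  constructor(name) { this.name = name; this.tail = Promise.resolve(); }
  // Enqueue work; each task starts only after all previously queued work.
  submit(task) {
    this.tail = this.tail.then(task);
    return this.tail;
  }
}

class SharedBuffer {
  constructor() { this.owner = null; this.data = 0; }
  // Transferring ownership waits for the current owner's queue to drain,
  // mirroring the "implicitly awaits all prior work" behavior discussed above.
  async transferTo(timeline) {
    if (this.owner) await this.owner.tail;
    this.owner = timeline;
  }
}

async function demo() {
  const webgpu = new Timeline('webgpu');
  const webnn = new Timeline('webnn');
  const buf = new SharedBuffer();

  await buf.transferTo(webgpu);
  webgpu.submit(() => { buf.data = 1; });    // GPU produces the frame
  await buf.transferTo(webnn);               // waits for the GPU write
  webnn.submit(() => { buf.data += 10; });   // inference reads/writes
  await buf.transferTo(webgpu);              // waits for the NN dispatch
  let result;
  await webgpu.submit(() => { result = buf.data; });
  return result;                             // 11: fully serialized
}

demo().then(r => console.log(r));
```

Under this model the answer to "which work blocks on which" is simply: whatever was queued on the buffer's previous owner before the transfer point.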
I had a couple of thoughts while looking through @a-sully's comment:
I'm thinking about scenarios where explicit memory management is needed. Say, for example, a developer wants to ensure the GPU memory used by WebNN has been deallocated before allocating another chunk of memory on the GPU (e.g. calling WebGPU/WebGL immediately after they finish WebNN operations). My question is whether WebNN needs to provide an explicit synchronization point mechanism to the developer, or do we expect the GPU service / GPU drivers to handle this transparently? Could "queue WebNN memory for deallocation, allocate WebGPU memory (could this OOM?), WebNN memory deallocated" happen?
@a-sully Appreciate the questions and help, responses below. @a-sully wrote
Good point to clarify. I think it's easiest to spec MLBuffer to have both input and output usage at creation. This is equivalent to GPUBuffer's resource usage bits (storage | storage_read), which gets re-created from MLBuffer's resource upon @a-sully wrote
An MLBuffer could be transferred/imported as-is without a copy if the User Agent determines the MLContext and GPUDevice support zero-copy (e.g. same adapter). @a-sully wrote
The MLBuffer allocates its "default" device resource persistently from the device/context used to create it. The exact location is opaque to the WebNN developer since it's runtime-managed. Similarly, "upload" and "readback" heaps are allocated/managed by the MLContext upon
In order to transfer MLBuffer, MLContext must be flushed prior, which occurs on [1] https://chromium-review.googlesource.com/c/chromium/src/+/5101781 Bryan |
Just to clarify, are you suggesting the synchronization strategy is to pause work from one API while the other API is using the buffer? e.g. WebNN must pause all work while the buffer is rented out to WebGPU? What about the reverse? This is where WebNN is different from the video import example, as far as I can tell. Do we expect WebGPU to block all execution while a buffer is rented out to WebNN? This is related to the point I was trying to make in this comment (though in that case WebNN is renting to WebGPU):
The user agent needs to know whether
Yup.
WebGPU doesn't rent out
@reillyeon thanks for the comments. @reillyeon wrote
WebGPU has a concept of CPU-accessible device memory (e.g. staging buffer) which relies on
@wacky6 appreciate the comments, responses below @wacky6 wrote
If the web developer forgets to synchronize @wacky6 wrote
Yup. I would expect WebNN, like WebGPU, to manage resources on the web developer's behalf (or by the GPU service). @wacky6 wrote
Yes, it could. WebNN memory would get deallocated, say upon calling
+1 to @bbernhar's reply. I think that having an explicit transfer (via import APIs) between WebGPU and WebNN should be enough to satisfy our requirements. What we spec should clearly state that only one API should have access to the buffer at a time. While an MLBuffer is transferred to WebGPU, queueing new work to it via WebNN should be an error. Similarly, while an MLBuffer has been transferred back to WebNN, queuing new work to it via WebGPU should be an error. Note that for scenarios where developers want to source input from 2D raster-type data (image elements, media, camera, canvas elements, image bitmaps, etc.) there will already need to be a conversion (tensorization) step from the raster image to a buffer for ingestion to WebNN. You can't really "zero copy optimize" this transfer operation. The web developer must tell the browser (via a WebGPU compute shader) how they'd like the raster image data to be arranged in the buffer. The same thing is true if you want to visualize the result of the ML operation via a 2D image. You'll need to import the buffer to WebGPU so that you can convert it to a texture.
In the current proposal, MLBuffer destruction happens when you call the destroy method on the object. WebGPU has a similar setup where GPUBuffer destruction is done via a destroy method on GPUBuffer instead of queuing such an operation. See buffer destruction. Once you destroy a buffer, you cannot queue new operations with it. Any inflight operations using the buffer complete before storage for the buffer is released. The browser is responsible for ensuring everything happens in a defined manner with no crashes or use-after-frees. Similar to how video textures work in WebGPU.
As @bbernhar pointed out, WebGPU already has a writeBuffer API which has the same parameters as this proposal. For WebNN,
The only time the JS thread will need to wait is when you call
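The "only one API has access at a time" rule proposed above can be pictured as a small ownership state machine. The following is an illustrative mock (TrackedBuffer and its methods are invented names, not proposed API) showing how queuing work from the non-owning side would be rejected:

```javascript
// Toy ownership state machine for a buffer shared between WebNN and WebGPU.
class TrackedBuffer {
  constructor() { this.owner = 'webnn'; this.destroyed = false; }
  assertUsableBy(api) {
    if (this.destroyed) throw new Error('buffer destroyed');
    if (this.owner !== api) {
      throw new Error(`buffer is owned by ${this.owner}, not ${api}`);
    }
  }
  importToWebGPU() { this.assertUsableBy('webnn'); this.owner = 'webgpu'; }
  returnToWebNN()  { this.assertUsableBy('webgpu'); this.owner = 'webnn'; }
  destroy()        { this.destroyed = true; }
}

const buf = new TrackedBuffer();
buf.assertUsableBy('webnn');   // ok: WebNN may queue work
buf.importToWebGPU();          // explicit transfer to WebGPU
let error;
try {
  buf.assertUsableBy('webnn'); // queuing WebNN work now errors
} catch (e) {
  error = e.message;
}
console.log(error);
buf.returnToWebNN();           // transfer back; WebNN usable again
buf.assertUsableBy('webnn');
```

The state machine makes the transfer points explicit and checkable, which is what lets the user agent validate misuse synchronously instead of racing on the device.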
Thank you for your patience while I learn enough about the WebGPU primitives here to be able to appreciate the complexity behind this proposal. Overall I think this is the right direction. At this point my feedback is mainly editorial. I'd like to see as much symmetry with the WebGPU APIs as possible.

I think the semantics discussed above around serializing WebNN and WebGPU operations which touch the same buffer make sense. To actually specify them I think we need to update the specification to be significantly more specific about timelines and introduce the pattern common to the WebGPU specification of providing "content timeline" and "device timeline" steps.

Something I don't think the WebGPU specification has had to deal with yet is the potential for multiple interacting device timelines, as we would see in the case of WebGPU interacting with a non-GPU MLContext.
Thank you @bbernhar, @RafaelCintron and many others for the proposal and deep thought on this issue. As I mentioned in the WG call this morning, although conceptually this proposal could work with any device buffer, a currently motivated use case is the WebNN/WebGPU interop scenario. The previous proposal (i.e. |
@reillyeon Thanks for the feedback. Overall, the comments make sense to me. @reillyeon wrote
WebGPU's buffer mapping relies on
Adds support to upload or read back data to/from MLBuffer. Since MLContext determines the device-execution order of GPU operations, writeBuffer and readBuffer were added to MLContext. * Only full MLBuffer read/write from renderer are enabled. webmachinelearning/webnn#482 Bug: 1472888 Change-Id: Id95da35e3f81bed47a356f76b75c043cdd500beb Cq-Include-Trybots: luci.chromium.try:win11-blink-rel
Introduces MLBuffer to the WebNN service. Since MLBuffer exists in the domain of the context, instead of a graph, a dedicated command recorder was needed to ensure future context operations (ie. readBuffer) get applied prior to graph operations. * Defines handles to identify objects in the ML service. * Defines MLBuffer interfaces. * Implements buffer creation and destruction. webmachinelearning/webnn#482 Bug: 1472888 Change-Id: I852eff452346a968812e9f248fbb0a4cfc917dbc Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/5173676 Reviewed-by: ningxin hu <[email protected]> Reviewed-by: Reilly Grant <[email protected]> Reviewed-by: Alex Gough <[email protected]> Reviewed-by: Rafael Cintron <[email protected]> Commit-Queue: Bryan Bernhart <[email protected]> Cr-Commit-Position: refs/heads/main@{#1270252}
Adds support to execute MLGraphs using MLBuffers. Allows the WebNN developer to directly bind MLBuffers as input/outputs to graphs for execution, which keeps MLBuffer data on-device after execution completes. In future CLs, dispatch can be further optimized. * Moves out per graph resources required by Dispatch(). * MLGraphBuilder.build no longer pre-allocates I/O. webmachinelearning/webnn#482 Bug: 1472888 Change-Id: I7400704cf60c149c47c20f22c50d5f12bff89cf9 Cq-Include-Trybots: luci.chromium.try:win11-blink-rel
Closing, this issue has been replaced by smaller sub-issues which I encourage we use for discussion instead.
This issue proposes a new opaque device-specific storage type in WebNN: MLBuffer. MLBuffer is a backend-agnostic storage type (CPU, GPU, NPU, etc.) which can be used in WebNN operations. MLBuffer would be the solution to:

Construction/Destruction

The size of an MLBuffer is always known (and linear access is assumed).

Upload/Download tensor data

A copy of srcData is always made, and control returns to the web developer immediately.

Binding to graphs

(outputs assumes output usage).

Edits:
- Use dispatch instead of overloading compute(), per https://www.w3.org/2023/12/14-webmachinelearning-minutes.html