Should validate MLGraph.[[context]] in MLContext.compute() and MLContext.computeSync() steps #341

Closed
huningxin opened this issue Feb 13, 2023 · 7 comments · Fixed by #584

huningxin commented Feb 13, 2023

This issue was raised in a Chromium WebNN CL review by @wacky6, thanks Jiewei!

MLGraphBuilder takes a context, why do we define compute() on MLContext instead
of the built graph?

The change of moving the compute methods from MLGraph to MLContext was introduced by #257. As @wchao1115 mentioned, the main intention is to avoid adding more compute methods for different execution modes, such as the queued execution mode of MLCommandEncoder for WebGPU interop, to the single MLGraph interface. The design choice was to fold the execution methods into the MLContext and MLCommandEncoder interfaces, hopefully "making it easier for developer to associate execution method to the context created with different options."

This allows passing an MLGraph built for a different context. This sounds like the spec is allowing misuse? I'd imagine a CPU graph can't be used with a GPU context.

This is not the intention; it looks like an issue in the WebNN spec. The current Synchronous Execution and Asynchronous Execution steps should validate MLGraph.[[context]] against the MLContext instance that the compute methods are called on.
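For illustration, here is a minimal sketch of the misuse case being described. The API shape follows the spec at the time of this discussion; the device options, operand shapes, and buffer sizes are made up for the example.

```js
// Build a graph on one context, then compute it on another.
const cpuContext = await navigator.ml.createContext({ deviceType: 'cpu' });
const gpuContext = await navigator.ml.createContext({ deviceType: 'gpu' });

const builder = new MLGraphBuilder(cpuContext);
const a = builder.input('a', { type: 'float32', dimensions: [2, 2] });
const b = builder.input('b', { type: 'float32', dimensions: [2, 2] });
const graph = await builder.build({ c: builder.add(a, b) });

const inputs = { a: new Float32Array(4), b: new Float32Array(4) };
const outputs = { c: new Float32Array(4) };

// graph.[[context]] is cpuContext, but compute() is invoked on gpuContext.
// Without a [[context]] check, the execution steps would not reject this.
await gpuContext.compute(graph, inputs, outputs);
```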

wacky6 commented Feb 13, 2023

Checking for a context match is one option, though ideally the API design should avoid getting into such failure cases.

AFAIK, a built graph should already have been initialized (i.e., a built GPU graph should have initialized GPU resources to avoid a cold start on the first compute(), or have JIT compilation completed). So the graph is strongly tied to its context and can't really be used with a different context.

Alternatively, we could make MLGraph a context-independent concept and introduce a new MLBuiltGraph concept that represents a graph initialized on a device, ready to start compute() immediately.

MLGraph (context independent) + MLContext (resources) -> MLBuiltGraph (can do compute).

Though in this case MLGraph becomes a "Web standardized" way to describe an ML model, and the MLContext + MLBuiltGraph combination effectively becomes what we have in ML Model Loader.
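A rough sketch of how that split could look from script, purely hypothetical: MLBuiltGraph, the context-free MLGraphBuilder constructor, and the context.build(graph) method below are invented for illustration and are not part of the spec.

```js
// MLGraph describes the model independently of any context; building it
// against an MLContext yields an MLBuiltGraph that owns device resources
// (initialized weights, compiled kernels) and can start computing immediately.
const builder = new MLGraphBuilder();                 // hypothetical: no context here
const x = builder.input('x', { type: 'float32', dimensions: [4] });
const graph = builder.build({ y: builder.relu(x) });  // context-independent description

const gpuContext = await navigator.ml.createContext({ deviceType: 'gpu' });
const builtGraph = await gpuContext.build(graph);     // hypothetical: initialize on the GPU
await builtGraph.compute(
  { x: new Float32Array(4) },
  { y: new Float32Array(4) });
```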

huningxin (Contributor, Author) commented:

@wacky6

AFAIK, a built graph should already have been initialized (i.e., a built GPU graph should have initialized GPU resources to avoid a cold start on the first compute(), or have JIT compilation completed). So the graph is strongly tied to its context and can't really be used with a different context.

I'd agree with you. The initialization should be context-dependent, and it is critical for optimal execution.

According to the WebNN programming model, an MLGraph actually represents a compiled graph for optimal execution. The compilation, optimization, and initialization of the computation graph are context-dependent.

For the default MLContext, the build methods (build() or buildSync()) of MLGraphBuilder compile and optimize the computation graph. For example, the Chromium WebNN XNNPACK backend calls xnn_create_runtime() in its MLGraphBuilder.build() implementation. Internally, XNNPACK performs subgraph optimizations (such as op fusion), JIT compilation (if it is enabled), weights repacking (e.g., for conv2d), and so on. After this step, the MLGraph XNNPACK implementation (MLGraphXnnpack) uses the optimized xnn_runtime object for optimal execution by calling xnn_invoke_runtime().
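To make this mapping concrete, here is a small sketch of the default-context flow from the script side; the XNNPACK calls named in the comments are the ones mentioned above, while the operand shapes are arbitrary.

```js
const context = await navigator.ml.createContext();   // default MLContext
const builder = new MLGraphBuilder(context);
const input = builder.input('input', { type: 'float32', dimensions: [1, 3, 224, 224] });
const output = builder.relu(input);

// In the Chromium XNNPACK backend, build() is where xnn_create_runtime()
// runs: subgraph optimization (e.g. op fusion), JIT compilation (if enabled),
// and weights repacking happen once, for this context.
const graph = await builder.build({ output });

// Each compute() then reuses the optimized xnn_runtime object via
// xnn_invoke_runtime(); no recompilation is needed.
await context.compute(graph,
  { input: new Float32Array(1 * 3 * 224 * 224) },
  { output: new Float32Array(1 * 3 * 224 * 224) });
```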

For MLCommandEncoder, its initializeGraph() method serves this purpose and focuses on WebGPU queue submission. Implementation experience for this interface is still to be established.

So I guess the MLGraph interface is close to the MLBuiltGraph concept in your comment. In the current WebNN spec, there is no interface that represents a context-independent computation graph. Are you proposing that we should introduce such an interface?

wacky6 commented Feb 14, 2023

I guess the key issue is to provide a task description (for GPU) so that the client can decide when the task is executed along with other non-WebNN tasks.

From the current spec, it's unclear where GPU-related state is stored (on MLCommandEncoder, or on MLGraph itself).

I feel some example code would help with the discussion (I'm not entirely familiar with WebGPU usage): what's the difference between CPU MLGraph.compute() vs. GPU MLGraph.compute() vs. GPU command encoder + WebGPU interop?

I'd imagine GPU MLGraph.compute() just does things under the hood, while WebGPU interop provides fine-grained control.

Maybe we should subclass MLGraph based on the context that creates it. For example, a CPU context returns an MLCpuGraph with compute(), while a GPU context returns an MLGpuGraph with compute() and GPU interop methods (commandBuffer, dispatch, etc.).

We should probably also see whether prior art exists (like canvas's rendering contexts).
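For illustration, a hypothetical sketch of that subclassing idea; MLCpuGraph, MLGpuGraph, the commandBuffer() method, and the way createContext() is fed a GPUDevice below are illustrative, not spec'd behavior.

```js
// Hypothetical: the concrete graph type depends on the context that built it.
const cpuContext = await navigator.ml.createContext({ deviceType: 'cpu' });
let builder = new MLGraphBuilder(cpuContext);
let x = builder.input('x', { type: 'float32', dimensions: [4] });
const cpuGraph = await builder.build({ y: builder.relu(x) });   // -> MLCpuGraph
await cpuGraph.compute({ x: new Float32Array(4) }, { y: new Float32Array(4) });

const gpuDevice = await (await navigator.gpu.requestAdapter()).requestDevice();
const gpuContext = await navigator.ml.createContext(gpuDevice);
builder = new MLGraphBuilder(gpuContext);
x = builder.input('x', { type: 'float32', dimensions: [4] });
const gpuGraph = await builder.build({ y: builder.relu(x) });   // -> MLGpuGraph
// Either compute "under the hood" ...
await gpuGraph.compute({ x: new Float32Array(4) }, { y: new Float32Array(4) });
// ... or hand a command buffer to WebGPU so the caller decides when it runs.
gpuDevice.queue.submit([gpuGraph.commandBuffer()]);
```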

huningxin (Contributor, Author) commented:

@wacky6, thanks for your suggestions and proposals. I am filing new issues so that we can follow up on each of them separately.

From the current spec, it's unclear where GPU-related state is stored (on MLCommandEncoder, or on MLGraph itself).

#342

I feel some example code would help with the discussion (I'm not entirely familiar with WebGPU usage): what's the difference between CPU MLGraph.compute() vs. GPU MLGraph.compute() vs. GPU command encoder + WebGPU interop?

#343

Maybe we should subclass MLGraph based on the context that creates it. For example, a CPU context returns an MLCpuGraph with compute(), while a GPU context returns an MLGpuGraph with compute() and GPU interop methods (commandBuffer, dispatch, etc.).

#344

anssiko commented Feb 15, 2023

If we add MLGraph.[[context]] validation to compute() and computeSync(), we should see whether that suggests changes to the validate MLContext algorithm that is called from the createContext() and createContextSync() methods and the MLGraphBuilder constructor. We hit that "validate MLContext" algorithm before we get to compute() or computeSync().

I think we also want to review that all the validation checks in place are necessary, that there are no duplicates, and that we bail out as soon as possible as an optimization strategy.

We can always rename these internal algorithms and that is encouraged if it improves spec readability. More specific names are generally better than very generic names.

inexorabletash commented Feb 21, 2024

Unless #303 results in a big reorganization of the API, I think this issue boils down to what it says in the title. Basically, add this to the compute() method steps:

  1. If graph.[[context]] is not this, then reject promise with a "DataError" DOMException.

The Chromium prototype implementation has this check. Shall we just add it and close out this issue, and let the other issues track broader API shape considerations?
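From script, the effect of that step would be a rejection instead of a silent cross-context compute. A minimal sketch, reusing cpuContext, gpuContext, graph, inputs, and outputs from the first sketch in this thread and assuming the "DataError" rejection proposed above:

```js
try {
  // graph.[[context]] is cpuContext, so this no longer proceeds.
  await gpuContext.compute(graph, inputs, outputs);
} catch (e) {
  console.log(e instanceof DOMException, e.name);  // true "DataError"
}
```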

anssiko commented Feb 26, 2024

@inexorabletash, adding that line to the compute() method steps would be the missing piece needed to close this issue, I think.

(The spec has evolved since this was initially discussed. E.g., the *Sync() methods and the dedicated MLContext validation algorithm have been removed, and the relevant validation steps have been inlined into the method steps, so my earlier comments have already been addressed.)

inexorabletash added a commit to inexorabletash/webnn that referenced this issue Feb 26, 2024
This matches the Chromium prototype implementation.

Fixes webmachinelearning#341
inexorabletash self-assigned this Feb 26, 2024
anssiko pushed a commit that referenced this issue Feb 27, 2024
This matches the Chromium prototype implementation.

Fixes #341