
[js/web] JSEP LayerNormalization and InstanceNormalizations kernels #16830

Merged: 15 commits into microsoft:main on Aug 8, 2023

Conversation

dakenf (Contributor) commented Jul 24, 2023

Description

Added two kernels for Layer and Instance norm

Also raised the `maxBufferSize` limit when requesting the GPU device: by default it is capped at 256 MB, which fails when allocating a 600 MB buffer while running fp32 StableDiffusion weights.
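
For reference, a minimal sketch of what requesting the device with raised limits can look like (an illustration using the standard WebGPU API, not the exact PR code):

```ts
// Sketch: request the adapter's own maximums instead of the WebGPU
// defaults, which cap buffer sizes at 256 MB.
const adapter = await navigator.gpu.requestAdapter();
const device = await adapter!.requestDevice({
  requiredLimits: {
    maxBufferSize: adapter!.limits.maxBufferSize,
    maxStorageBufferBindingSize: adapter!.limits.maxStorageBufferBindingSize,
  },
});
```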

Motivation and Context

These two are used in StableDiffusion and many other networks

satyajandhyala added the ep:WebGPU (ort-web webgpu provider) label on Jul 24, 2023
dakenf (Contributor, Author) commented Jul 25, 2023

> Can we verify/enable some of these tests if not all?

I've enabled all instance norm tests (there were just two) and all layer norm tests except some "expanded" ones, which don't actually test the op and fail for some reason. They are:

- test_layer_normalization_3d_axis_negative_1_epsilon_expanded
- test_layer_normalization_3d_axis_negative_2_epsilon_expanded
- test_layer_normalization_3d_axis_negative_3_epsilon_expanded
- test_layer_normalization_3d_axis0_epsilon_expanded
- test_layer_normalization_3d_axis2_epsilon_expanded
- test_layer_normalization_4d_axis_negative_2_expanded
- test_layer_normalization_4d_axis2_expanded

dakenf (Contributor, Author) commented Jul 27, 2023

@fs-eire @satyajandhyala can you please advise me on a weird issue I'm getting with LayerNorm?
In the spec and in the tests it requires three outputs: the result, the mean, and the inverse standard deviation. And that works fine.
But when I run it with the StableDiffusion text encoder or UNet, it throws an error:

```
Writable storage buffer binding aliasing found between [BindGroup] set at bind group index 0, binding index 4, and [BindGroup] set at bind group index 0, binding index 5, with overlapping ranges (offset: 0, size: 1280) and (offset: 0, size: 1280) in [Buffer].
 - While encoding [ComputePassEncoder].DispatchWorkgroups(8, 1, 1).
```

So if I comment out the last binding it works perfectly fine, but then the tests fail because they expect three outputs:

```ts
@group(0) @binding(0) var<storage, read> x : array<${dataType}>;
@group(0) @binding(1) var<storage, read> scale : array<${dataType}>;
@group(0) @binding(2) var<storage, read> bias : array<${dataType}>;
@group(0) @binding(3) var<storage, read_write> output : array<${dataType}>;
@group(0) @binding(4) var<storage, read_write> meanDataOutput : array<${dataType}>;
// COMMENT OUT THIS: @group(0) @binding(5) var<storage, read_write> invStdOutput : array<${dataType}>;

${shaderHelper.mainStart(workgroupLimits)}
  // ... shader code ...

  meanDataOutput[global_idx] = mean;
  invStdOutput[global_idx] = 1 / meanSquare;
}`;
return {
  ...metadata,
  outputs: [
    {dims: outputShape, dataType: inputs[0].dataType, gpuDataType: GpuDataType.default},
    {dims: meanInvStdDevDim, dataType: inputs[0].dataType, gpuDataType: GpuDataType.default},
    // COMMENT OUT THIS: {dims: meanInvStdDevDim, dataType: inputs[0].dataType, gpuDataType: GpuDataType.default},
  ],
  getShaderSource,
  dispatchGroup: () => (dispatchGroup),
};
```

And I have another, unrelated issue. After adding Gemm for opset 13 (currently JSEP has a kernel only for opset 11 and falls back to CPU), I get this error:

```
[Buffer] usage (BufferUsage::(Storage|BufferUsage::80000000)) includes writable usage and another usage in the same synchronization scope.
 - While validating compute pass usage.
```

I guess it hits the maximum buffer count/sizes. Is there an easy way to dispose of unused buffers during execution? It uses more than 8 GB of video memory: 3 GB for the UNet weights, and it allocates more than 5 GB during OrtRun.

fs-eire (Contributor) commented Jul 28, 2023

The output count needs to match the actual number of the node's outputs in the graph. I am not sure whether a kernel context can get the output count; if not, we need to add it.
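
For illustration, a minimal sketch of that gating in the LayerNorm kernel (`outputCount` is a placeholder name for whatever the kernel context ends up exposing):

```ts
// Hypothetical sketch: declare only the outputs the node actually has,
// so no two bindings alias the same placeholder buffer.
const outputs = [
  {dims: outputShape, dataType: inputs[0].dataType, gpuDataType: GpuDataType.default},
];
if (outputCount > 1) {
  outputs.push({dims: meanInvStdDevDim, dataType: inputs[0].dataType, gpuDataType: GpuDataType.default});
}
if (outputCount > 2) {
  outputs.push({dims: meanInvStdDevDim, dataType: inputs[0].dataType, gpuDataType: GpuDataType.default});
}
```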

satyajandhyala (Contributor) commented Jul 28, 2023

> @fs-eire @satyajandhyala can you please advise me on a weird issue I'm getting with LayerNorm? [...]

Does it help to specify the actual size, if possible, in addition to dataType for x, scale, bias, output, etc.?

dakenf (Contributor, Author) commented Jul 28, 2023

> Does it help to specify the actual size, if possible, in addition to dataType for x, scale, bias, output, etc.?

The suggestion above helped. I've added `OutputCount` to the serialized kernel context and it started to work fine.
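
Roughly, the shader source can then be generated to emit the optional bindings and stores only when those outputs exist (a sketch under the same assumption, not the exact PR code):

```ts
// Sketch: gate the optional binding declarations and stores on the
// node's actual output count (outputCount and dataType assumed given).
const getShaderSource = (outputCount: number, dataType: string) => `
  @group(0) @binding(3) var<storage, read_write> output : array<${dataType}>;
  ${outputCount > 1 ? `@group(0) @binding(4) var<storage, read_write> meanDataOutput : array<${dataType}>;` : ''}
  ${outputCount > 2 ? `@group(0) @binding(5) var<storage, read_write> invStdOutput : array<${dataType}>;` : ''}
  // ... main body ...
  ${outputCount > 1 ? 'meanDataOutput[global_idx] = mean;' : ''}
  ${outputCount > 2 ? 'invStdOutput[global_idx] = 1 / meanSquare;' : ''}
`;
```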

fs-eire (Contributor) commented Jul 30, 2023

I have a question regarding the optional inputs/outputs of an operator. Take LayerNormalization, for example: is it possible that a graph contains a LayerNormalization node whose output[0] and output[2] exist, but output[1] is omitted? If this can happen, then defining 'outputCount' may not be sufficient; it would be ambiguous, because outputCount == 2 could mean either output[0]+output[1] or output[0]+output[2].

dakenf (Contributor, Author) commented Jul 31, 2023

> I have a question regarding the optional inputs/outputs of an operator. [...]

From what I see in the code, if output[1] does not exist, the node will still have three outputs, but context->OutputType(1) will return nullptr for the second one. But I might be wrong. The CPU execution provider does not have any checks; it always has three outputs.

Also, the bias input for LayerNorm is optional, so I will add a check for it.

guschmue (Contributor) commented Aug 1, 2023

/azp run ONNX Runtime Web CI Pipeline

azure-pipelines
Azure Pipelines successfully started running 1 pipeline(s).

guschmue (Contributor) commented Aug 2, 2023

OK, some reviewers got added to your Chromium PR - hope that works out.

dakenf (Contributor, Author) commented Aug 2, 2023

Perfect, thanks. If I manage to fix memory.grow above 4 GB, there will be no reason to use the 32-bit version with JSEP. Right now the only way to use more memory is to specify a fixed size with the INITIAL_MEMORY linker flag.

guschmue (Contributor) commented Aug 2, 2023

Yes. I'm sure there will be some more complications along the way, but it will be needed now that WebGPU is here and shows the possibilities of running things like SD or LLMs in the browser.

guschmue (Contributor) commented Aug 4, 2023

/azp run ONNX Runtime Web CI Pipeline

azure-pipelines
Azure Pipelines successfully started running 1 pipeline(s).

satyajandhyala (Contributor) commented:

@dakenf The latest change to the webgpu-operator.md document removed some comments that existed previously. I am not sure whether that is intentional. We don't edit this file directly; it should be generated using `npm run build:doc` in the onnxruntime/js/web folder. The comments are inserted by generate-webgpu-operator-md.ts. After making any necessary changes to that TypeScript file, you need to run `npm run prepare` to compile/generate the JavaScript file.

satyajandhyala (Contributor) commented:

@dakenf Some unresolved/valid comments are blocking this PR from being merged. Please address these comments so that the PR can be merged.

satyajandhyala (Contributor) commented:

@dakenf Click on the "view" link in "Unresolved conversations" to go to the unresolved comment directly.

dakenf (Contributor, Author) commented Aug 4, 2023

> @dakenf Click on the "view" link in "Unresolved conversations" to go to the unresolved comment directly.

I think it's just the GitHub interface lagging, because this is what I get when I click on it:

[screenshot]

But the actual code is already fixed (with int32_t and axis = -1) if you go here: https://github.com/microsoft/onnxruntime/pull/16830/files

[screenshot]

Or am I looking in the wrong place?

satyajandhyala (Contributor) commented:

@dakenf Thanks for clarifying. I marked the conversation resolved.

guschmue (Contributor) commented Aug 7, 2023

/azp run ONNX Runtime Web CI Pipeline

azure-pipelines
Azure Pipelines successfully started running 1 pipeline(s).

guschmue (Contributor) commented Aug 7, 2023

/azp run Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline

azure-pipelines
Azure Pipelines successfully started running 9 pipeline(s).

guschmue (Contributor) commented Aug 7, 2023

/azp run Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed

azure-pipelines
Azure Pipelines successfully started running 6 pipeline(s).

guschmue (Contributor) commented Aug 7, 2023

going to take this for a quick test before I merge it

guschmue merged commit c3f0425 into microsoft:main on Aug 8, 2023
jchen351 pushed a commit that referenced this pull request on Aug 12, 2023.
kleiti pushed a commit to kleiti/onnxruntime that referenced this pull request on Mar 22, 2024.
siweic0 pushed a commit to siweic0/onnxruntime-web that referenced this pull request on May 9, 2024.