[js/web] JSEP LayerNormalization and InstanceNormalization kernels #16830
Conversation
Can we verify/enable some of these tests, if not all?
I've enabled all instance norm tests (there were just two) and all layer norm tests except some "expanded" ones, which don't actually test the op and fail for some reason. They are:
- test_layer_normalization_3d_axis_negative_1_epsilon_expanded
- test_layer_normalization_3d_axis_negative_2_epsilon_expanded
- test_layer_normalization_3d_axis_negative_3_epsilon_expanded
- test_layer_normalization_3d_axis0_epsilon_expanded
- test_layer_normalization_3d_axis2_epsilon_expanded
- test_layer_normalization_4d_axis_negative_2_expanded
- test_layer_normalization_4d_axis2_expanded
@fs-eire @satyajandhyala can you please advise me on a weird issue I get with LayerNorm?
If I comment out the last binding it works perfectly fine, but the tests fail because they expect three outputs:

```
@group(0) @binding(0) var<storage, read> x : array<${dataType}>;
@group(0) @binding(1) var<storage, read> scale : array<${dataType}>;
@group(0) @binding(2) var<storage, read> bias : array<${dataType}>;
@group(0) @binding(3) var<storage, read_write> output : array<${dataType}>;
@group(0) @binding(4) var<storage, read_write> meanDataOutput : array<${dataType}>;
// COMMENT OUT THIS: @group(0) @binding(5) var<storage, read_write> invStdOutput : array<${dataType}>;
${shaderHelper.mainStart(workgroupLimits)}
  // ... shader code ...
  meanDataOutput[global_idx] = mean;
  invStdOutput[global_idx] = 1 / meanSquare;
}`;

return {
  ...metadata,
  outputs: [
    {dims: outputShape, dataType: inputs[0].dataType, gpuDataType: GpuDataType.default},
    {dims: meanInvStdDevDim, dataType: inputs[0].dataType, gpuDataType: GpuDataType.default},
    // COMMENT OUT THIS: {dims: meanInvStdDevDim, dataType: inputs[0].dataType, gpuDataType: GpuDataType.default},
  ],
  getShaderSource,
  dispatchGroup: () => (dispatchGroup)
};
```

And I have another unrelated issue. After adding Gemm for opset 13 (currently in JSEP it has a kernel only for opset 11 and falls back to CPU) I get this error:
I guess it hits the max buffer count/sizes. Is there an easy way to dispose of unused buffers during execution? It uses more than 8 GB of video memory: 3 GB for the UNet weights, and it allocates more than 5 GB during OrtRun.
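For reference, explicit disposal at the raw WebGPU level is just `GPUBuffer.destroy()`. A minimal sketch (standard WebGPU API, assuming `device` is an existing `GPUDevice`; how JSEP tracks and reuses buffers internally may differ):

```ts
// Sketch: a buffer's GPU memory can be released explicitly, without waiting
// for garbage collection, once no in-flight work references it.
const buffer = device.createBuffer({
  size: 600 * 1024 * 1024, // e.g. a 600 MB intermediate tensor
  usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC,
});
// ...encode and submit GPU work that reads/writes `buffer`...
buffer.destroy(); // frees the allocation immediately
```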
The output count needs to match the actual number of the node's outputs in the graph. I am not sure if a …
Does it help to specify the actual size, if possible, in addition to dataType for x, scale, bias, output, etc.?
The suggestion above helped. I've added OutputCount to the serialized kernel context and it works fine now.
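A rough sketch of the fix described above, reusing the names from the snippet earlier in the thread (the `outputCount` plumbing is an assumption here, not necessarily the exact JSEP API):

```ts
// Sketch: declare only the outputs the graph node actually has, so the
// program's output list matches the node's output count.
const outputs = [{dims: outputShape, dataType: inputs[0].dataType, gpuDataType: GpuDataType.default}];
if (outputCount > 1) { // optional mean output
  outputs.push({dims: meanInvStdDevDim, dataType: inputs[0].dataType, gpuDataType: GpuDataType.default});
}
if (outputCount > 2) { // optional inverse-std-dev output
  outputs.push({dims: meanInvStdDevDim, dataType: inputs[0].dataType, gpuDataType: GpuDataType.default});
}
return {...metadata, outputs, getShaderSource, dispatchGroup: () => dispatchGroup};
```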
I have a question regarding the optional inputs/outputs of an operator. Take …
From what I see in the code, if output[1] does not exist, the node will have three outputs, but for the second … Also, …
/azp run ONNX Runtime Web CI Pipeline
Azure Pipelines successfully started running 1 pipeline(s).
OK, some reviewers got added to your Chromium PR; hope that works out.
Perfect, thanks. If I'm able to fix memory.grow beyond 4 GB, there will be no reason to use the 32-bit version with JSEP. Right now the only way to use more memory is to specify a fixed size with the INITIAL_MEMORY linking flag.
Yes. I'm sure there will be some more complications along the way, but it will be needed now that WebGPU is here and shows the possibility of running things like SD or LLMs in the browser.
/azp run ONNX Runtime Web CI Pipeline
Azure Pipelines successfully started running 1 pipeline(s).
@dakenf …
@dakenf Some unresolved comments are blocking this PR from being merged. Please address them so that the PR can be merged.
@dakenf Click the "view" link under "Unresolved conversations" to go directly to an unresolved comment.
I think it's just the GitHub interface lagging, because I get this when I click on it.
@dakenf Thanks for clarifying. I marked the conversation resolved.
/azp run ONNX Runtime Web CI Pipeline
Azure Pipelines successfully started running 1 pipeline(s).
/azp run Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline
Azure Pipelines successfully started running 9 pipeline(s).
/azp run Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed
Azure Pipelines successfully started running 6 pipeline(s).
Going to take this for a quick test before I merge it.
Description
Added two kernels: LayerNormalization and InstanceNormalization.
Also raised the maximum `maxBufferSize` limit when requesting the GPU device: by default it is 256 MB, which fails to allocate the 600 MB buffer needed when running fp32 StableDiffusion weights.
Motivation and Context
These two ops are used in StableDiffusion and many other networks.
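For illustration, the limits change described above corresponds to something like the following at the WebGPU API level (a minimal sketch of the standard API, not the exact JSEP code):

```ts
// Sketch: request the adapter's maximum buffer limits instead of accepting
// the WebGPU defaults (maxBufferSize defaults to 256 MB).
const adapter = await navigator.gpu.requestAdapter();
if (!adapter) throw new Error('WebGPU is not available');
const device = await adapter.requestDevice({
  requiredLimits: {
    maxBufferSize: adapter.limits.maxBufferSize,
    maxStorageBufferBindingSize: adapter.limits.maxStorageBufferBindingSize,
  },
});
```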