-
Notifications
You must be signed in to change notification settings - Fork 970
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[msl-out] Incorrect behavior over workgroup memory due to possibly miscompiled barrier in Metal #4500
Comments
Zero-initializing workgroup memory works around a Metal driver bug that can cause the tile_alloc and coarse stages to freeze up the system when running natively on Metal due to their use of `workgroupUniformLoad`. See gfx-rs/naga#2482 for more details about this bug. Zero-initializing workgroup memory also makes the native Metal shaders match the behavior of the current wgpu runner and the invariants of the original WGSL that the shaders were authored in.
Zero-initializing workgroup memory works around a Metal driver bug that can cause the tile_alloc and coarse stages to freeze up the system when running natively on Metal due to their use of `workgroupUniformLoad`. See gfx-rs/naga#2482 for more details about this bug. Zero-initializing workgroup memory also makes the native Metal shaders match the behavior of the current wgpu runner and the invariants of the original WGSL that the shaders were authored in.
Zero-initializing workgroup memory works around a Metal driver bug that can cause the tile_alloc and coarse stages to freeze up the system when running natively on Metal due to their use of `workgroupUniformLoad`. See gfx-rs/naga#2482 for more details about this bug. Zero-initializing workgroup memory also makes the native Metal shaders match the behavior of the current wgpu runner and the invariants of the original WGSL that the shaders were authored in.
Hi there! I promised an update a while ago but I didn't get a chance to update the issue. I narrowed down the miscompilation to the dynamic workgroup memory declaration itself: the bug in wg-mem-as-arg.metal goes away if I change the argument declaration from a reference (i.e. This seems to reliably fix the bug. This also happens to be what Tint does in certain cases where it generates a workgroup variable as an entry-point parameter to work around a different compiler issue (see crbug.com/tint/938). My proposal for Naga is to do the same thing and move from a reference to a pointer. |
Thanks for further looking into this! I checked the metal spec again and it doesn't mention "references" anywhere in its section about the
Sounds good! A PR would be welcome if you have time! |
I am also running into issues with However, removing those invocations means that same shader will not compile in the browser. Since having to write platform specific shader code sort of defeats the purpose of using a cross-platform gfx API in the first place, I would appreciate a fix! |
I believe I have identified a bug in the Metal compiler that violates the expected memory access and barrier ordering when a compute kernel uses dynamic threadgroup memory (i.e. when threadgroup shared memory is declared as an entry-point parameter rather than a fixed-size function-local variable). I have a very reliable reproduction (in Metal) of this bug implemented and detailed at https://github.com/armansito/metal-workgroup-memory-bug. A reproduction of the bug
using wgpu can be found here: https://github.com/armansito/naga-workgroup-memory-bug.
I ran into this while debugging a system-wide freeze due to one of the shaders in linebender/vello that uses the result of
workgroupUniformLoad
from shared memory to exit a compute stage. The miscompilation causes the load to be non-uniform which violates an important invariant. The bug is reproducible using wgpu but not when the same shaders are executed by Dawn. I believe this is because Naga implements workgroup buffers as dynamic (i.e. an entry-point parameter) while Tint does not. I think this is also possibly related to #3181.A WGSL version of the Metal program from my test case looks like this:
The test case sets up
flag
to always have a value of 0. The expected behavior is for all threads in the workgroup to add 1 tooutput
(which is what I see with Dawn/Tint). The observed behavior is that seemingly only one SIMD group increments the atomic in all workgroups.This is obviously not a bug in Naga but in the Metal driver but since it impacts correctness I propose that Naga take the same approach as Tint and convert workgroup variables to a function-local threadgroup variable instead of an entry-point parameter.
The text was updated successfully, but these errors were encountered: