Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[msl-out] Incorrect behavior over workgroup memory due to possibly miscompiled barrier in Metal #4500

Open
armansito opened this issue Sep 15, 2023 · 3 comments
Labels
area: naga back-end Outputs of naga shader conversion external: driver-bug A driver is causing the bug, though we may still want to work around it lang: Metal Metal Shading Language naga Shader Translator type: bug Something isn't working

Comments

@armansito
Copy link
Contributor

armansito commented Sep 15, 2023

I believe I have identified a bug in the Metal compiler that violates the expected memory access and barrier ordering when a compute kernel uses dynamic threadgroup memory (i.e. when threadgroup shared memory is declared as an entry-point parameter rather than a fixed-size function-local variable). I have a very reliable reproduction (in Metal) of this bug implemented and detailed at https://github.com/armansito/metal-workgroup-memory-bug. A reproduction of the bug
using wgpu can be found here: https://github.com/armansito/naga-workgroup-memory-bug.

I ran into this while debugging a system-wide freeze due to one of the shaders in linebender/vello that uses the result of workgroupUniformLoad from shared memory to exit a compute stage. The miscompilation causes the load to be non-uniform which violates an important invariant. The bug is reproducible using wgpu but not when the same shaders are executed by Dawn. I believe this is because Naga implements workgroup buffers as dynamic (i.e. an entry-point parameter) while Tint does not. I think this is also possibly related to #3181.

A WGSL version of the Metal program from my test case looks like this:

@group(0) @binding(0) var<uniform> flag: u32;
@group(0) @binding(1) var<storage, read_write> output: atomic<u32>;

var<workgroup> shared_flag: u32;

@compute @workgroup_size(64)
fn main(@builtin(local_invocation_id) local_id: vec3<u32>) {
    if local_id.x == 0u {
        shared_flag = 0xffffffffu;
    }
    workgroupBarrier();

    if local_id.x == 0u {
        shared_flag = flag;
    }

    let abort = workgroupUniformLoad(&shared_flag);
    if abort != 0u {
        return;
    }

    atomicAdd(&output, 1u);
}

The test case sets up flag to always have a value of 0. The expected behavior is for all threads in the workgroup to add 1 to output (which is what I see with Dawn/Tint). The observed behavior is that seemingly only one SIMD group increments the atomic in all workgroups.

This is obviously not a bug in Naga but in the Metal driver but since it impacts correctness I propose that Naga take the same approach as Tint and convert workgroup variables to a function-local threadgroup variable instead of an entry-point parameter.

armansito referenced this issue in armansito/vello Sep 19, 2023
Zero-initializing workgroup memory works around a Metal driver bug
that can cause the tile_alloc and coarse stages to freeze up the system
when running natively on Metal due to their use of `workgroupUniformLoad`.
See gfx-rs/naga#2482 for more details about this bug.

Zero-initializing workgroup memory also makes the native Metal shaders
match the behavior of the current wgpu runner and the invariants of the
original WGSL that the shaders were authored in.
armansito referenced this issue in armansito/vello Sep 19, 2023
Zero-initializing workgroup memory works around a Metal driver bug
that can cause the tile_alloc and coarse stages to freeze up the system
when running natively on Metal due to their use of `workgroupUniformLoad`.
See gfx-rs/naga#2482 for more details about this bug.

Zero-initializing workgroup memory also makes the native Metal shaders
match the behavior of the current wgpu runner and the invariants of the
original WGSL that the shaders were authored in.
@teoxoy teoxoy added kind: bug lang: Metal Metal Shading Language area: naga back-end Outputs of naga shader conversion labels Sep 21, 2023
armansito referenced this issue in armansito/vello Sep 22, 2023
Zero-initializing workgroup memory works around a Metal driver bug
that can cause the tile_alloc and coarse stages to freeze up the system
when running natively on Metal due to their use of `workgroupUniformLoad`.
See gfx-rs/naga#2482 for more details about this bug.

Zero-initializing workgroup memory also makes the native Metal shaders
match the behavior of the current wgpu runner and the invariants of the
original WGSL that the shaders were authored in.
@armansito
Copy link
Contributor Author

armansito commented Oct 16, 2023

Hi there! I promised an update a while ago but I didn't get a chance to update the issue. I narrowed down the miscompilation to the dynamic workgroup memory declaration itself: the bug in wg-mem-as-arg.metal goes away if I change the argument declaration from a reference (i.e. threadgroup uint& shared_flag [[threadgroup(0)]]) to a pointer (i.e. threadgroup uint* shared_flag [[threadgroup(0)]]).

This seems to reliably fix the bug. This also happens to be what Tint does in certain cases where it generates a workgroup variable as an entry-point parameter to work around a different compiler issue (see crbug.com/tint/938). My proposal for Naga is to do the same thing and move from a reference to a pointer.

@teoxoy
Copy link
Member

teoxoy commented Oct 16, 2023

Thanks for further looking into this!

I checked the metal spec again and it doesn't mention "references" anywhere in its section about the threadgroup address space. But it does in the sections about the device and constant address spaces. Not sure if this is an omission or if it's not officially supported.

image

My proposal for Naga is to do the same thing and move from a reference to a pointer.

Sounds good! A PR would be welcome if you have time!

@cwfitzgerald cwfitzgerald transferred this issue from gfx-rs/naga Oct 25, 2023
@cwfitzgerald cwfitzgerald added naga Shader Translator type: bug Something isn't working and removed kind: bug labels Oct 25, 2023
@teoxoy teoxoy added this to the WebGPU Specification V1 milestone Nov 3, 2023
@teoxoy teoxoy added the external: driver-bug A driver is causing the bug, though we may still want to work around it label Feb 1, 2024
@cshenton-work
Copy link

I am also running into issues with workgroupUniformLoad on Metal. Simply replacing usage of it with each thread accessing the relevant value individually (as one would on ever other gfx API, but prevented by the WebGPU spec), fixes a totally broken compute shader on Metal.

However, removing those invocations means that same shader will not compile in the browser. Since having to write platform specific shader code sort of defeats the purpose of using a cross-platform gfx API in the first place, I would appreciate a fix!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: naga back-end Outputs of naga shader conversion external: driver-bug A driver is causing the bug, though we may still want to work around it lang: Metal Metal Shading Language naga Shader Translator type: bug Something isn't working
Projects
Status: No status
Development

No branches or pull requests

4 participants