[msl-out] Incorrect behavior over workgroup memory due to possibly miscompiled barrier in Metal #4500

armansito · 2023-09-15T22:23:45Z

I believe I have identified a bug in the Metal compiler that violates the expected memory access and barrier ordering when a compute kernel uses dynamic threadgroup memory (i.e. when threadgroup shared memory is declared as an entry-point parameter rather than a fixed-size function-local variable). I have a very reliable reproduction (in Metal) of this bug implemented and detailed at https://github.com/armansito/metal-workgroup-memory-bug. A reproduction of the bug
using wgpu can be found here: https://github.com/armansito/naga-workgroup-memory-bug.

I ran into this while debugging a system-wide freeze due to one of the shaders in linebender/vello that uses the result of workgroupUniformLoad from shared memory to exit a compute stage. The miscompilation causes the load to be non-uniform which violates an important invariant. The bug is reproducible using wgpu but not when the same shaders are executed by Dawn. I believe this is because Naga implements workgroup buffers as dynamic (i.e. an entry-point parameter) while Tint does not. I think this is also possibly related to #3181.

A WGSL version of the Metal program from my test case looks like this:

@group(0) @binding(0) var<uniform> flag: u32;
@group(0) @binding(1) var<storage, read_write> output: atomic<u32>;

var<workgroup> shared_flag: u32;

@compute @workgroup_size(64)
fn main(@builtin(local_invocation_id) local_id: vec3<u32>) {
    if local_id.x == 0u {
        shared_flag = 0xffffffffu;
    }
    workgroupBarrier();

    if local_id.x == 0u {
        shared_flag = flag;
    }

    let abort = workgroupUniformLoad(&shared_flag);
    if abort != 0u {
        return;
    }

    atomicAdd(&output, 1u);
}

The test case sets up flag to always have a value of 0. The expected behavior is for all threads in the workgroup to add 1 to output (which is what I see with Dawn/Tint). The observed behavior is that seemingly only one SIMD group increments the atomic in all workgroups.

This is obviously not a bug in Naga but in the Metal driver but since it impacts correctness I propose that Naga take the same approach as Tint and convert workgroup variables to a function-local threadgroup variable instead of an entry-point parameter.

The text was updated successfully, but these errors were encountered:

Zero-initializing workgroup memory works around a Metal driver bug that can cause the tile_alloc and coarse stages to freeze up the system when running natively on Metal due to their use of `workgroupUniformLoad`. See gfx-rs/naga#2482 for more details about this bug. Zero-initializing workgroup memory also makes the native Metal shaders match the behavior of the current wgpu runner and the invariants of the original WGSL that the shaders were authored in.

armansito · 2023-10-16T15:50:28Z

Hi there! I promised an update a while ago but I didn't get a chance to update the issue. I narrowed down the miscompilation to the dynamic workgroup memory declaration itself: the bug in wg-mem-as-arg.metal goes away if I change the argument declaration from a reference (i.e. threadgroup uint& shared_flag [[threadgroup(0)]]) to a pointer (i.e. threadgroup uint* shared_flag [[threadgroup(0)]]).

This seems to reliably fix the bug. This also happens to be what Tint does in certain cases where it generates a workgroup variable as an entry-point parameter to work around a different compiler issue (see crbug.com/tint/938). My proposal for Naga is to do the same thing and move from a reference to a pointer.

teoxoy · 2023-10-16T17:00:32Z

Thanks for further looking into this!

I checked the metal spec again and it doesn't mention "references" anywhere in its section about the threadgroup address space. But it does in the sections about the device and constant address spaces. Not sure if this is an omission or if it's not officially supported.

My proposal for Naga is to do the same thing and move from a reference to a pointer.

Sounds good! A PR would be welcome if you have time!

cshenton-work · 2024-03-12T21:06:15Z

I am also running into issues with workgroupUniformLoad on Metal. Simply replacing usage of it with each thread accessing the relevant value individually (as one would on ever other gfx API, but prevented by the WebGPU spec), fixes a totally broken compute shader on Metal.

However, removing those invocations means that same shader will not compile in the browser. Since having to write platform specific shader code sort of defeats the purpose of using a cross-platform gfx API in the first place, I would appreciate a fix!

armansito mentioned this issue Sep 19, 2023

[shaders][msl] Enable zero_initialize_workgroup_memory setting linebender/vello#363

Merged

teoxoy added kind: bug lang: Metal Metal Shading Language area: naga back-end Outputs of naga shader conversion labels Sep 21, 2023

cwfitzgerald transferred this issue from gfx-rs/naga Oct 25, 2023

cwfitzgerald added naga Shader Translator type: bug Something isn't working and removed kind: bug labels Oct 25, 2023

teoxoy added this to the WebGPU Specification V1 milestone Nov 3, 2023

teoxoy added this to WebGPU for Firefox Jan 18, 2024

teoxoy added the external: driver-bug A driver is causing the bug, though we may still want to work around it label Feb 1, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[msl-out] Incorrect behavior over workgroup memory due to possibly miscompiled barrier in Metal #4500

[msl-out] Incorrect behavior over workgroup memory due to possibly miscompiled barrier in Metal #4500

armansito commented Sep 15, 2023 •

edited

Loading

armansito commented Oct 16, 2023 •

edited

Loading

teoxoy commented Oct 16, 2023

cshenton-work commented Mar 12, 2024

[msl-out] Incorrect behavior over workgroup memory due to possibly miscompiled barrier in Metal #4500

[msl-out] Incorrect behavior over workgroup memory due to possibly miscompiled barrier in Metal #4500

Comments

armansito commented Sep 15, 2023 • edited Loading

armansito commented Oct 16, 2023 • edited Loading

teoxoy commented Oct 16, 2023

cshenton-work commented Mar 12, 2024

armansito commented Sep 15, 2023 •

edited

Loading

armansito commented Oct 16, 2023 •

edited

Loading