Documentation issues with helper lanes in Wave Intrinsics #280

Open
Trark opened this issue Nov 3, 2021 · 0 comments
The documentation update in January clarified the language around wave intrinsics ( https://github.com/microsoft/DirectXShaderCompiler/wiki/Wave-Intrinsics/f9ac5be748c4bcd694c91a44a30ba5e810015d3d )
As far as I can tell the DirectX Specs and GDK documentation both link that page so I'm assuming it's the authority on their expected behaviour.
I have been verifying the behaviour we get with wave intrinsics across various platforms and have found a few issues with the documentation as written.

Driver support for excluding helper lanes from inputs

The Wave Vote intrinsics and Wave Reduction intrinsics are now marked as explicitly excluding helper lanes from the input.
The semantics described here do seem to apply in the various AMD drivers I have tested locally. Shader model doesn't appear to make a difference, so this is all consistent with the new docs. However, I only had RDNA GPUs available for PC testing, so I can't be sure about GCN or much older drivers.

These new rules don't seem to apply to many NVIDIA drivers. Drivers tested from ~January 2021 and ~November 2019 (before the doc update) do include helper lanes as inputs to these intrinsics. Up-to-date drivers do exclude helper lanes. I think drivers that support SM6.6 all have the new behaviour.

If these are intended to be guaranteed behaviour after SM6.6 only, then the documentation shouldn't define the behaviour unconditionally.
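
To make the difference concrete, here is a toy Python model of the input side of a reduction. It is only an illustration, not driver or compiler behaviour: the `Lane` class, the `wave_active_sum` helper and the 4-lane wave are all hypothetical names and values made up for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Lane:
    value: int
    active: bool = True
    helper: bool = False


def wave_active_sum(lanes, include_helpers):
    """Toy model of which lanes contribute input to a WaveActiveSum-style reduction.

    include_helpers=False models the documented rule (helper lanes excluded);
    include_helpers=True models the behaviour observed on the older drivers.
    """
    return sum(
        lane.value
        for lane in lanes
        if lane.active and (include_helpers or not lane.helper)
    )


# Hypothetical 4-lane wave (one quad) in which two lanes are helper lanes.
wave = [Lane(1), Lane(2, helper=True), Lane(4), Lane(8, helper=True)]

documented = wave_active_sum(wave, include_helpers=False)    # 1 + 4
older_driver = wave_active_sum(wave, include_helpers=True)   # 1 + 2 + 4 + 8
print(documented, older_driver)  # prints "5 15"
```

The same shader would therefore see different results depending on the driver, which is why an unconditional statement in the docs is a problem.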

WaveReadLaneFirst return value uniformity

Before the documentation update, WaveReadLaneFirst & WaveReadLaneAt were declared as "The following routines enable all active lanes in the current wave to receive the value from the specified lane, effectively broadcasting it.".
After the documentation update, WaveReadLaneFirst & WaveReadLaneAt are now declared as "The following routines enable all active non-helper lanes in the current wave to receive the value(s) from the specified lane(s)."
In both versions WaveReadLaneFirst also claims "The resulting value is thus uniform across the wave"

If helper lanes do not receive the output from WaveReadLaneFirst, then the output would not be uniform.
While inspecting codegen on AMD GPUs, I always see the result of this function end up in a vector register instead of a scalar register like I would naturally expect. It is a little hard to be certain, as the driver compiler is within its rights to use vector instructions for uniform values if it can generate better code that way. The values do seem consistent between helper and non-helper lanes in simple tests, even if the generated code is slower than I would have liked.

So, assuming (hopefully) that the result is still guaranteed to be uniform, the wording shouldn’t exclude helper lanes from receiving the value (in the non-shuffle case).

Wave Reduction Intrinsics output uniformity

Before the documentation update, Wave Reduction Intrinsics were declared as "These intrinsics compute the specified operation across all active lanes in the wave and broadcast the final result to all active lanes.".
After the documentation update, Wave Reduction Intrinsics are now declared as "These intrinsics compute the specified operation across all active non-helper lanes in the wave and broadcast the final result to all active non-helper lanes.".
In both versions Wave Reduction Intrinsics also claim "Therefore, the final output is guaranteed uniform across the wave."

The old wording guarantees that the output on helper lanes is consistent with the output on non-helper lanes.
The new wording sort of does too, in the "Therefore, the final output is guaranteed uniform across the wave." line, but the change in wording in the first half implies it does not. All codegen I have seen so far does give scalar output.

Is this meant to say "These intrinsics compute the specified operation across all active non-helper lanes in the wave and broadcast the final result to all active lanes."?
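
The two readings can be compared with a small Python sketch. This is a toy model under stated assumptions, not actual hardware behaviour: `reduce_and_broadcast` and `is_uniform` are hypothetical helpers, every lane is assumed active, and `None` stands in for "no defined value received".

```python
def reduce_and_broadcast(values, is_helper, broadcast_to_helpers):
    """Sum over non-helper lanes, then hand the result back to each lane.

    Helper lanes never contribute input in either reading. They only
    receive the result when broadcast_to_helpers is True.
    """
    total = sum(v for v, h in zip(values, is_helper) if not h)
    return [total if (broadcast_to_helpers or not h) else None for h in is_helper]


def is_uniform(results):
    """True when every lane holds the same value."""
    return len(set(results)) == 1


# Hypothetical 4-lane wave, all lanes active, lanes 1 and 3 are helper lanes.
values = [1, 2, 4, 8]
is_helper = [False, True, False, True]

# Literal reading of the new text: broadcast to non-helper lanes only.
literal = reduce_and_broadcast(values, is_helper, broadcast_to_helpers=False)
# Suggested wording: reduce over non-helper lanes, broadcast to all active lanes.
suggested = reduce_and_broadcast(values, is_helper, broadcast_to_helpers=True)

print(literal, is_uniform(literal))      # prints "[5, None, 5, None] False"
print(suggested, is_uniform(suggested))  # prints "[5, 5, 5, 5] True"
```

Only the second reading is compatible with the "guaranteed uniform across the wave" sentence that both versions of the docs keep.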

@damyanp damyanp transferred this issue from microsoft/DirectXShaderCompiler Jul 23, 2024
@damyanp damyanp added the bug Something isn't working label Aug 8, 2024
@damyanp damyanp moved this to Triaged in HLSL Triage Aug 8, 2024
@damyanp damyanp added this to the Shader Model Backlog milestone Aug 8, 2024