This is related to #381. Most likely other CUDA warp/WG-level functions also assume the "or exited" semantics, which OpenCL does not recognize.
The legacy CUDA warp-level functions such as __shfl() can be called from divergent regions (if the programmer is careful to avoid undefined values in the non-active lanes). See Listing 13 here.
However, OpenCL specifies that subgroup functions behave like subgroup barrier calls with respect to convergence: they must be reached by all WIs in the subgroup.
This means that such code might end up in a barrier deadlock. We should probably add a compiler pass to detect these cases. Implementing the divergent-region semantics (like the new masked warp-level functions) is not trivial with subgroups/OpenCL and needs a further extension.
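For illustration, here is a minimal sketch of the kind of code in question. It is not taken from the original report: the kernel name `divergent_shuffle`, the even-lane mask, and the single 32-thread warp launch are all assumptions made for the example. Only the even lanes reach the shuffle, which CUDA permits as long as the source lane is active.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: only the even lanes of the warp reach the shuffle.
__global__ void divergent_shuffle(const int *in, int *out) {
    int lane = threadIdx.x % warpSize;
    int value = in[threadIdx.x];

    if (lane % 2 == 0) {
        // Legacy form (deprecated since CUDA 9): value = __shfl(value, 0);
        // Masked form: the mask names exactly the lanes that take this branch.
        const unsigned even_lanes = 0x55555555u;
        value = __shfl_sync(even_lanes, value, 0);  // read lane 0's value
    }
    out[threadIdx.x] = value;
}

int main() {
    const int n = 32;  // assumption: exactly one full warp
    int h_in[n], h_out[n];
    for (int i = 0; i < n; ++i) h_in[i] = 100 + i;

    int *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, n * sizeof(int));
    cudaMemcpy(d_in, h_in, n * sizeof(int), cudaMemcpyHostToDevice);

    divergent_shuffle<<<1, n>>>(d_in, d_out);
    cudaMemcpy(h_out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);

    // Even lanes now hold lane 0's input; odd lanes keep their own value.
    for (int i = 0; i < n; ++i) printf("lane %2d: %d\n", i, h_out[i]);

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

If the shuffle inside the `if` were mapped directly to an OpenCL subgroup function, the odd work-items would never reach it. That is the non-converged call a detection pass would need to flag.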