-
My initial thought is to use an opaque pointer model ("logical" pointers, in Vulkan parlance), where a pointer points to a value that can be backed by a memory location, a register value, scratchpad memory, or even a sparse memory location. We should not allow atomic and non-atomic modifying operations to be mixed on the same memory region within a parallel offload (the size and attributes of a region are yet to be discussed, but I suggest treating a leaf SNode as a region). We should allow non-atomic reads to be mixed with atomics, but in the worst case we do not guarantee any kind of visibility of operations; this could potentially speed up concurrent queue algorithms or fast software rasterizers. (This is perhaps in contrast to WebGPU, where this is not allowed and is viewed as unsafe.)
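To make the proposed rule concrete, here is a minimal sketch in plain Python of how a per-offload check could look. It is not Taichi code and not an existing API: `Access`, `check_offload`, and the region names `"x"` and `"y"` are hypothetical, and it assumes a region is identified by a leaf SNode name.

```python
from collections import defaultdict
from enum import Enum


class Access(Enum):
    """Hypothetical access kinds recorded for a region within one offload."""
    READ = "read"              # non-atomic load
    WRITE = "write"            # non-atomic store / modify
    ATOMIC_RMW = "atomic_rmw"  # atomic read-modify-write


def check_offload(accesses):
    """Return the sorted names of regions that violate the proposed rule.

    `accesses` is an iterable of (region_name, Access) pairs collected from
    one parallel offload. A region is in violation when it sees both an
    atomic and a non-atomic *modifying* operation. Non-atomic reads never
    cause a violation on their own.
    """
    kinds = defaultdict(set)
    for region, access in accesses:
        kinds[region].add(access)
    return sorted(
        region
        for region, seen in kinds.items()
        if Access.WRITE in seen and Access.ATOMIC_RMW in seen
    )


# Hypothetical offload touching two leaf-SNode regions, "x" and "y".
offload = [
    ("x", Access.ATOMIC_RMW),
    ("x", Access.READ),    # allowed: non-atomic read mixed with atomics
    ("y", Access.ATOMIC_RMW),
    ("y", Access.WRITE),   # rejected: non-atomic write mixed with atomics
]
violations = check_offload(offload)
print(violations)
```

The sketch only captures which combinations are accepted or rejected. It says nothing about visibility: under the proposal, the non-atomic read of `"x"` is legal but may observe a stale value.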
-
Requesting comments from the WebGPU / WGSL implementation's point of view, @AmesingFlank.
-
I feel like I'm missing some context here, so excuse me for throwing down some questions first. What is the main purpose of defining a memory model for Taichi? Is it purely for documentation/specification purposes and for guiding future implementations? Or does it involve actual engineering work that structurally improves our backends/codegen? More specifically, you mentioned in Slack that this could potentially allow us to build the runtime components in CHI IR. May I ask how?
-
The porting effort of Taichi has reached a stage where we are targeting support for almost all compute-capable graphics devices on earth. Ahead of the v1.0 launch, I now want to start work on a formal / semi-formal memory and execution model, so that we have something to reference and rely on when designing our back-ends, code-gen, and optimizations.
One valuable reference that we can build on and modify is the Vulkan Memory Model: https://github.com/KhronosGroup/Vulkan-MemoryModel . It is being validated on many devices and provides a common ground for consumer GPUs.
Another one is the C++ memory model: https://en.cppreference.com/w/cpp/language/memory_model . This has been the underpinning memory model that CUDA has been trying to fully support, and it is the basis of Apple's Metal memory model. By extension, WebGPU's memory model is also largely based on a combination of these two.
In this discussion, we wish to reach some common ground on the basic guarantees and limits of such a model, especially concerning memory semantics.