Minor Additions to Enable Tiling and Explicit Memory Movement Transformations (#1636)

I made some minor additions to make implementing some transformations easier for me. I will explain all three changes and why I needed them.

1. Add `gpu_force_syncthreads` to force a call to `__syncthreads` in a map (in `dace/codegen/targets/cuda.py` and `dace/sdfg/nodes.py`).
   - I prefer to tile the work maps of kernels (e.g., the K reduction in a sum-of-inner-products matrix multiplication) so that all new tiled maps are inside the scope of the thread block map. When this is combined with shared memory, a `__syncthreads` call is necessary within the thread block map, but none is generated for sequential maps inside a thread-block-scheduled map. I would like to be able to force this behavior.
2. Add the skew option to the map tiling transformation.
   - Having every map start from 0 makes writing my transformations simpler, so I wanted the map tiling transformation to start the inner map at 0. I could only achieve this behavior by copying over the skew parameter from the strip mine transformation. I would still prefer to use the map tiling transformation instead of strip mine, while having the skew parameter.
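To make the skew behavior and the barrier placement concrete, here is a minimal sketch in plain Python. It does not use the DaCe API: `tile_indices` and `tiled_dot` are hypothetical helper names, and the assumption is that "skew" means the inner index of each tile starts at 0 while the global index is recovered as tile start plus inner index. The comments mark where a `__syncthreads` barrier would be needed if the staged tiles lived in GPU shared memory.

```python
def tile_indices(start, stop, tile, skew):
    """Return (tile_start, inner_index, global_index) for a tiled 1D loop.

    skew=False: the inner index runs over global coordinates of the tile.
    skew=True:  the inner index runs from 0; global = tile_start + inner.
    (Assumed semantics, for illustration only.)
    """
    triples = []
    for t in range(start, stop, tile):
        extent = min(tile, stop - t)  # last tile may be partial
        for i in range(extent):
            inner = i if skew else t + i
            triples.append((t, inner, t + i))
    return triples


def tiled_dot(a, b, tile):
    """Dot product with the reduction dimension tiled and skewed."""
    acc = 0
    for t in range(0, len(a), tile):
        # Stage one tile of each operand; these slices stand in for
        # copies into shared memory.
        a_tile = a[t:t + tile]
        b_tile = b[t:t + tile]
        # On a GPU, a barrier (__syncthreads) is needed here so every
        # thread sees the fully written tiles before reading them.
        for i in range(len(a_tile)):  # skewed inner loop: starts at 0
            acc += a_tile[i] * b_tile[i]
        # A second barrier is needed here before the next iteration
        # overwrites the tiles.
    return acc
```

Because the skewed inner index starts at 0, it can index the staged tile directly without subtracting the tile offset, which is the convenience the skew option is meant to provide.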