Switch mma instruction shape to 1684 from current 1688 for 3xTF32 L2/cosine kernel (#1057)

- Switch the mma instruction shape from the current 1688 to 1684, because 1684 was faster for every input tried from DISTANCE_BENCH, for both L2 and cosine distances.
- The speedup is 1.37x in the best case and 1.05x at minimum.

Authors:
- Mahesh Doijade (https://github.com/mdoijade)

Approvers:
- Tamas Bela Feher (https://github.com/tfeher)

URL: #1057
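For context, "1684" and "1688" read as the M×N×K shape of the warp-level TF32 mma instruction: m16n8k4 multiplies a 16x4 A fragment by a 4x8 B fragment, and m16n8k8 uses a 16x8 by 8x8 pair. "3xTF32" refers to recovering near-fp32 accuracy from TF32 tensor cores by splitting each fp32 operand into a "big" and a "small" TF32 part and summing three products (big*big, big*small, small*big), dropping small*small. The following is a minimal NumPy emulation of that idea, assuming TF32 can be modeled by keeping only the top 10 mantissa bits. It truncates, whereas hardware conversion rounds to nearest. The names `to_tf32`, `split_3xtf32` and `mma_tile_3xtf32` are hypothetical; this is not the kernel changed by this commit.

```python
import numpy as np

# Assumption: model TF32 as fp32 with the low 13 of 23 mantissa bits cleared
# (truncation; real cvt instructions round to nearest instead).
TF32_MASK = np.uint32(0xFFFFE000)


def to_tf32(x):
    """Emulate conversion of fp32 values to TF32 by mantissa truncation."""
    x = np.asarray(x, dtype=np.float32)
    return (x.view(np.uint32) & TF32_MASK).view(np.float32)


def split_3xtf32(x):
    """Split fp32 x into big + small, both representable in (emulated) TF32."""
    x = np.asarray(x, dtype=np.float32)
    big = to_tf32(x)
    small = to_tf32(x - big)  # residual that the big part could not hold
    return big, small


def mma_tile_3xtf32(a, b, k_step=4):
    """Compute one 16x8 output tile C = A @ B in K-chunks of k_step.

    k_step=4 mimics issuing m16n8k4 instructions, k_step=8 mimics m16n8k8.
    Each chunk performs three TF32 products, accumulated in fp32.
    """
    m, k = a.shape
    k2, n = b.shape
    assert (m, n) == (16, 8) and k == k2 and k % k_step == 0
    c = np.zeros((m, n), dtype=np.float32)
    for k0 in range(0, k, k_step):
        a_big, a_small = split_3xtf32(a[:, k0:k0 + k_step])
        b_big, b_small = split_3xtf32(b[k0:k0 + k_step, :])
        # small * small is omitted; it is below fp32 precision
        c += a_small @ b_big
        c += a_big @ b_small
        c += a_big @ b_big
    return c


rng = np.random.default_rng(0)
a = rng.standard_normal((16, 32)).astype(np.float32)
b = rng.standard_normal((32, 8)).astype(np.float32)
ref = a.astype(np.float64) @ b.astype(np.float64)

err_1x = np.max(np.abs(to_tf32(a) @ to_tf32(b) - ref))  # plain TF32
err_k4 = np.max(np.abs(mma_tile_3xtf32(a, b, k_step=4) - ref))
err_k8 = np.max(np.abs(mma_tile_3xtf32(a, b, k_step=8) - ref))
print(err_1x, err_k4, err_k8)
```

In this emulation both K-steps give the same result up to fp32 accumulation order, and both are far closer to the fp64 reference than a single TF32 product. Which instruction shape is faster on real hardware depends on factors this sketch does not model (register usage, instruction count, scheduling), which is why the choice in this commit was made by benchmarking.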