Commit
switch mma instruction shape to 1684 from current 1688 for 3xTF32 L2/cosine kernel (#1057)

-- Switch the MMA instruction shape from 1688 (GemmShape<16, 8, 8>) to 1684 (GemmShape<16, 8, 4>). 1684 was faster on every input tried from DISTANCE_BENCH, for both L2 and cosine distances.
-- The speedup ranges from 1.05x at minimum to 1.37x in the best case.

Authors:
  - Mahesh Doijade (https://github.com/mdoijade)

Approvers:
  - Tamas Bela Feher (https://github.com/tfeher)

URL: #1057
mdoijade authored Dec 2, 2022
1 parent b77547c commit bee127a
Showing 1 changed file with 1 addition and 1 deletion.
cpp/include/raft/distance/detail/pairwise_distance_gemm.h
@@ -66,7 +66,7 @@ struct PairwiseDistanceGemm {
 /// Warp-level tile size (concept: GemmShape)
 // This code section describes the size of MMA op
 using InstructionShape =
-cutlass::gemm::GemmShape<16, 8, 8>; // <- MMA Op tile M = 16, N = 8, K = 8
+cutlass::gemm::GemmShape<16, 8, 4>; // <- MMA Op tile M = 16, N = 8, K = 4
 
 /// Operation performed by GEMM
 using Operator = cutlass::arch::OpMultiplyAddFastF32;
