Fused L2 1-NN based on cutlass 3xTF32 / DMMA (#1118)

- 3xTF32 and DMMA CUTLASS-based persistent FusedL2NN kernel, loosely based on grouped GEMM but customized for a single problem size.
- The performance benefit of this implementation grows with `k`: up to 1.3x for k == 64, up to 1.53x for k == 128, and up to 1.67x for k == 256.
- For all sizes of `k`, this kernel outperforms the previous implementation.
- Attached are the FusedL2NN benchmark results comparing the previous implementation with this CUTLASS version.

Authors:
- Mahesh Doijade (https://github.com/mdoijade)
- Corey J. Nolet (https://github.com/cjnolet)

Approvers:
- Corey J. Nolet (https://github.com/cjnolet)
- Tamas Bela Feher (https://github.com/tfeher)

URL: #1118
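
For readers unfamiliar with the operation, a fused L2 1-NN finds, for each row of one matrix, the closest row of a second matrix under L2 distance. The following is only a minimal CPU reference sketch of that computation, not the CUTLASS kernel added by this commit. It assumes row-major inputs, squared distances, and the expansion ||x||^2 + ||y||^2 - 2 x.y (whose dot-product term is what a GEMM would compute). The name `fused_l2_nn_reference` is hypothetical and is not part of RAFT.

```cpp
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical reference helper (not a RAFT API).
// x is m x k, y is n x k, both row-major. For each row of x, returns the
// index of the nearest row of y and the squared L2 distance to it.
std::vector<std::pair<int, float>> fused_l2_nn_reference(
  const std::vector<float>& x, const std::vector<float>& y,
  std::size_t m, std::size_t n, std::size_t k)
{
  // Squared row norms of x and y, computed once up front.
  std::vector<float> xn(m, 0.f), yn(n, 0.f);
  for (std::size_t i = 0; i < m; ++i)
    for (std::size_t p = 0; p < k; ++p) xn[i] += x[i * k + p] * x[i * k + p];
  for (std::size_t j = 0; j < n; ++j)
    for (std::size_t p = 0; p < k; ++p) yn[j] += y[j * k + p] * y[j * k + p];

  std::vector<std::pair<int, float>> out(
    m, std::make_pair(-1, std::numeric_limits<float>::max()));
  for (std::size_t i = 0; i < m; ++i) {
    for (std::size_t j = 0; j < n; ++j) {
      // Dot product of row i of x with row j of y (the GEMM-like part).
      float dot = 0.f;
      for (std::size_t p = 0; p < k; ++p) dot += x[i * k + p] * y[j * k + p];
      // Squared distance via the norm expansion, reduced to a running minimum
      // so the full m x n distance matrix is never stored.
      float d = xn[i] + yn[j] - 2.f * dot;
      if (d < out[i].second) out[i] = std::make_pair(static_cast<int>(j), d);
    }
  }
  return out;
}
```

The inner dot-product loop runs over `k`, so the share of work that is GEMM-like grows as `k` increases.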
rapidsai · May 16, 2023 · a1d1fd6
1 parent d891c00
commit a1d1fd6

Showing 15 changed files with 3,369 additions and 44 deletions.
diff --git a/cpp/cmake/thirdparty/get_cutlass.cmake b/cpp/cmake/thirdparty/get_cutlass.cmake
@@ -78,13 +78,13 @@ function(find_and_configure_cutlass)
 endfunction()
 
 if(NOT RAFT_CUTLASS_GIT_TAG)
-  set(RAFT_CUTLASS_GIT_TAG v2.9.1)
+  set(RAFT_CUTLASS_GIT_TAG v2.10.0)
 endif()
 
 if(NOT RAFT_CUTLASS_GIT_REPOSITORY)
   set(RAFT_CUTLASS_GIT_REPOSITORY https://github.com/NVIDIA/cutlass.git)
 endif()
 
 find_and_configure_cutlass(
-  VERSION 2.9.1 REPOSITORY ${RAFT_CUTLASS_GIT_REPOSITORY} PINNED_TAG ${RAFT_CUTLASS_GIT_TAG}
+  VERSION 2.10.0 REPOSITORY ${RAFT_CUTLASS_GIT_REPOSITORY} PINNED_TAG ${RAFT_CUTLASS_GIT_TAG}
 )