Fused L2 1-NN based on cutlass 3xTF32 / DMMA (#1118)
-- A persistent FusedL2NN kernel built on CUTLASS 3xTF32 (for `float`) and DMMA (for `double`). It is loosely based on CUTLASS grouped GEMM but customized for a single problem size.
-- The performance benefit of this implementation grows with `k`: up to 1.3x for k == 64, up to 1.53x for k == 128, and up to 1.67x for k == 256.
-- This kernel outperforms the previous implementation for all sizes of `k`.
-- FusedL2NN benchmark results comparing the previous implementation with this CUTLASS version are attached.
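The two ideas named in this commit can be illustrated with a minimal CPU-only C++ sketch. It is not RAFT's API or the CUTLASS kernel: `to_tf32_trunc`, `dot_3xtf32` and `l2_1nn` are hypothetical names, and converting fp32 to TF32 by truncating the 13 low mantissa bits (rather than rounding, as hardware does) is a simplifying assumption. It shows (a) the 3xTF32 trick of splitting each fp32 operand into a "big" and a "small" TF32 part and summing three of the four cross products, and (b) a fused 1-NN that uses ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y, so the cross term is a GEMM, and keeps a running argmin per row instead of materializing the m x n distance matrix.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical helper: approximate fp32 -> TF32 by clearing the 13 low mantissa
// bits (TF32 keeps 10 explicit mantissa bits). Hardware rounds instead.
inline float to_tf32_trunc(float x) {
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  bits &= 0xFFFFE000u;  // keep sign, exponent and the top 10 mantissa bits
  float out;
  std::memcpy(&out, &bits, sizeof out);
  return out;
}

// 3xTF32-style dot product: a = a_big + a_small, b = b_big + b_small, and the
// tiny a_small * b_small term is dropped, leaving three TF32 products.
inline float dot_3xtf32(const float* a, const float* b, std::size_t k) {
  float acc = 0.0f;
  for (std::size_t i = 0; i < k; ++i) {
    const float a_big = to_tf32_trunc(a[i]);
    const float a_small = to_tf32_trunc(a[i] - a_big);
    const float b_big = to_tf32_trunc(b[i]);
    const float b_small = to_tf32_trunc(b[i] - b_big);
    // Add the small correction terms first, then the dominant big * big term.
    acc += a_big * b_small + a_small * b_big + a_big * b_big;
  }
  return acc;
}

// Hypothetical reference 1-NN under squared L2 distance. x is m x k and y is
// n x k, both row-major. Returns (nearest index in y, squared distance) for
// each row of x. The argmin is updated on the fly, so no m x n matrix is stored.
inline std::vector<std::pair<std::size_t, float>> l2_1nn(
    const std::vector<float>& x, const std::vector<float>& y,
    std::size_t m, std::size_t n, std::size_t k) {
  std::vector<float> y_norm(n);
  for (std::size_t j = 0; j < n; ++j) {
    y_norm[j] = dot_3xtf32(&y[j * k], &y[j * k], k);
  }
  const std::pair<std::size_t, float> init{0, std::numeric_limits<float>::infinity()};
  std::vector<std::pair<std::size_t, float>> out(m, init);
  for (std::size_t i = 0; i < m; ++i) {
    const float* xi = &x[i * k];
    const float x_norm = dot_3xtf32(xi, xi, k);
    for (std::size_t j = 0; j < n; ++j) {
      // ||x||^2 + ||y||^2 - 2 x.y, with the cross term from the 3xTF32 dot.
      const float d = x_norm + y_norm[j] - 2.0f * dot_3xtf32(xi, &y[j * k], k);
      if (d < out[i].second) out[i] = {j, d};  // fused min reduction
    }
  }
  return out;
}
```

Dropping only the small * small product is what makes this three multiplications instead of four; that term is roughly on the order of fp32 rounding error relative to big * big, which is why 3xTF32 can approach fp32 accuracy while still running on TF32 tensor cores.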

Authors:
  - Mahesh Doijade (https://github.com/mdoijade)
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)
  - Tamas Bela Feher (https://github.com/tfeher)

URL: #1118
mdoijade authored May 16, 2023
1 parent d891c00 commit a1d1fd6
Showing 15 changed files with 3,369 additions and 44 deletions.
cpp/cmake/thirdparty/get_cutlass.cmake (4 changes: 2 additions, 2 deletions)
@@ -78,13 +78,13 @@ function(find_and_configure_cutlass)
 endfunction()

 if(NOT RAFT_CUTLASS_GIT_TAG)
-  set(RAFT_CUTLASS_GIT_TAG v2.9.1)
+  set(RAFT_CUTLASS_GIT_TAG v2.10.0)
 endif()

 if(NOT RAFT_CUTLASS_GIT_REPOSITORY)
   set(RAFT_CUTLASS_GIT_REPOSITORY https://github.com/NVIDIA/cutlass.git)
 endif()

 find_and_configure_cutlass(
-  VERSION 2.9.1 REPOSITORY ${RAFT_CUTLASS_GIT_REPOSITORY} PINNED_TAG ${RAFT_CUTLASS_GIT_TAG}
+  VERSION 2.10.0 REPOSITORY ${RAFT_CUTLASS_GIT_REPOSITORY} PINNED_TAG ${RAFT_CUTLASS_GIT_TAG}
 )