Improves 2d tiled matmulnbits by repeating A loads N times for each B load #23071
Azure Pipelines / Windows GPU CUDA CI Pipeline (cuda_dml build_x64_RelWithDebInfo)
succeeded
Dec 11, 2024 in 2h 11m 39s
cuda_dml build_x64_RelWithDebInfo succeeded