Improves 2D tiled MatMulNBits by repeating A loads N times for each B load #23071
Azure Pipelines / Windows CPU CI Pipeline (ort_training_apis_x64_release build_ort_training_apis_x64_release)
succeeded
Dec 11, 2024 in 1h 8m 20s