Improves 2d tiled matmulnbits by repeating A loads N times for each B load #12217
sca.yml
on: pull_request
Onnxruntime-SCA-training-CUDA: 1h 17m
Onnxruntime-SCA-win32-WebGPU-x64: 56m 50s
Onnxruntime-SCA-win32-WINML-x64: 58m 32s
Onnxruntime-SCA-win32-WINML-x86: 59m 37s