[AMDGPU] Add dispatch path for FP16 batched mul #488

pxl-th · 2023-04-13T22:15:01Z

rocBLAS supports FP16 batched GEMM, but NNlib was dispatching FP16 inputs to the generic implementation.
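To illustrate the kind of dispatch gap described here, the following is a minimal CPU-only sketch, not the code from this PR. The names `batched_mul_sketch!` and `FastFloats` are hypothetical, and plain `Array`s stand in for GPU arrays. It assumes the fast path is selected by an element-type `Union`, so a type missing from that `Union` silently lands on the generic method.

```julia
using LinearAlgebra

# Hypothetical element types routed to the fast path. If Float16 were
# missing from this Union, Float16 inputs would hit the generic method.
const FastFloats = Union{Float16, Float32, Float64}

# Generic fallback: multiplies slice by slice, allocating temporaries.
function batched_mul_sketch!(C::AbstractArray{T,3}, A::AbstractArray{T,3},
                             B::AbstractArray{T,3}) where {T}
    for k in axes(C, 3)
        C[:, :, k] = A[:, :, k] * B[:, :, k]
    end
    return :generic
end

# Specialised method, standing in for a vendor-library batched GEMM call.
function batched_mul_sketch!(C::Array{T,3}, A::Array{T,3},
                             B::Array{T,3}) where {T<:FastFloats}
    for k in axes(C, 3)
        @views mul!(C[:, :, k], A[:, :, k], B[:, :, k])
    end
    return :specialised
end

A = rand(Float16, 2, 3, 4)
B = rand(Float16, 3, 2, 4)
C = zeros(Float16, 2, 2, 4)
path = batched_mul_sketch!(C, A, B)  # Float16 is in FastFloats
```

The returned symbol only makes the chosen method visible in this sketch; a real implementation would return `C`.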

PR Checklist

Tests are added
Documentation, if applicable

ToucheSir · 2023-04-17T20:22:36Z

ext/NNlibAMDGPUExt/batched_mul.jl

+    C
+end
+
+function NNlib._batched_gemm!(


Now that we have two GPU backends and both of them override this function, I wonder if it should become a more formal part of the API. Something to think about at a later date though.
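For readers unfamiliar with the pattern under discussion, here is a rough sketch of two backends overriding an internal, underscore-prefixed function from outside its module. Everything in it is hypothetical: `MiniLib`, `_gemm_hook!`, `entry!`, `DeviceA`, and `DeviceB` are illustrative stand-ins, not NNlib, CUDA, or AMDGPU names.

```julia
using LinearAlgebra

module MiniLib
# Internal hook with a generic fallback. The leading underscore marks it
# as not part of the public API, yet backends still extend it.
function _gemm_hook!(C, A, B)
    C .= A * B
    return :fallback
end

# Public entry point that forwards to the hook.
entry!(C, A, B) = _gemm_hook!(C, A, B)
end

# Two hypothetical backend array wrappers.
struct DeviceA
    data::Matrix{Float32}
end

struct DeviceB
    data::Matrix{Float32}
end

# Each backend adds its own method to the internal hook.
function MiniLib._gemm_hook!(C::DeviceA, A::DeviceA, B::DeviceA)
    mul!(C.data, A.data, B.data)
    return :backend_a
end

function MiniLib._gemm_hook!(C::DeviceB, A::DeviceB, B::DeviceB)
    mul!(C.data, A.data, B.data)
    return :backend_b
end
```

Because both backends depend on the hook's name and signature, changing it breaks them, which is one argument for documenting such a function as a formal extension point.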

Add dispatch path for FP16 batched mul

8eebdcb

pxl-th changed the title from "Add dispatch path for FP16 batched mul" to "[AMDGPU] Add dispatch path for FP16 batched mul" on Apr 13, 2023

pxl-th mentioned this pull request Apr 17, 2023

Implement StableDiffusion JuliaNeuralGraphics/Diffusers.jl#22

Merged

10 tasks

ToucheSir approved these changes Apr 17, 2023

ToucheSir merged commit adbf444 into FluxML:master Apr 17, 2023
