Another case, like recurrence, where the cuDNN API doesn't match the semantics we use to define the layer. One thing the Metalhead PR made me consider was adding the ability to thread Parallel when the branches are expensive and roughly equal in computational cost.
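For a rough idea of what that could look like (the name `tparallel` is made up here, this is not how `Flux.Parallel` works today, and AD is ignored entirely):

```julia
using Flux

# Hypothetical sketch: run each branch of a Parallel-style layer on its own
# task so expensive, similarly sized branches execute concurrently.
function tparallel(connection, branches, x)
    tasks = [Threads.@spawn branch(x) for branch in branches]
    connection(map(fetch, tasks)...)
end

# Usage: two comparably expensive branches combined with +
m1 = Chain(Dense(128 => 256, relu), Dense(256 => 128))
m2 = Chain(Dense(128 => 256, relu), Dense(256 => 128))
x  = rand(Float32, 128, 32)
y  = tparallel(+, (m1, m2), x)
```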
That's an interesting idea. I think you could extend it to using separate CUDA streams for GPU tasks too. Making AD cooperate would be its own challenge, of course 😅
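Since CUDA.jl gives each Julia task its own stream, the stream side might fall out of plain task-based concurrency; a sketch (the `gpu_branches` helper is just illustrative):

```julia
using CUDA

# Each @async block runs in a new task, and CUDA.jl assigns a distinct
# stream per task, so independent branches can overlap on the GPU.
function gpu_branches(f, g, x::CuArray)
    t1 = @async f(x)   # new task ⇒ its own stream
    t2 = @async g(x)   # another task ⇒ another stream
    fetch(t1), fetch(t2)
end
```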
Now that we're seeing models with attention layers in more than one downstream package (e.g. FluxML/Metalhead.jl#105 and Transformers.jl), it may be time to consider pulling some building blocks into NNlib. CUDA.jl already wraps cuDNN's MHA too; see https://github.com/JuliaGPU/CUDA.jl/blob/27c87a6f261aa7964d797e8fe4bf33b46c1a185e/test/cudnn/multiheadattn.jl#L55.
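As a sketch of the kind of building block that could move down into NNlib (name, signature, and array layout here are made up for illustration, not an existing interface):

```julia
using NNlib  # softmax, batched_mul, batched_transpose

# Scaled dot-product attention over (features, length, batch) arrays.
function scaled_dot_attention(q, k, v)
    dk = size(k, 1)
    scores = batched_mul(batched_transpose(k), q) ./ sqrt(Float32(dk))  # (kv_len, q_len, batch)
    α = softmax(scores; dims = 1)                                       # normalise over keys
    batched_mul(v, α)                                                   # (features, q_len, batch)
end
```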