
Multi-head attention? #385

Closed
ToucheSir opened this issue Feb 8, 2022 · 2 comments · Fixed by #455

Comments

@ToucheSir
Member

Now that we're seeing models with attention layers in more than one downstream package (e.g. FluxML/Metalhead.jl#105 and Transformers.jl), it may be time to consider pulling some building blocks into NNlib. CUDA.jl already wraps cuDNN's MHA; see https://github.com/JuliaGPU/CUDA.jl/blob/27c87a6f261aa7964d797e8fe4bf33b46c1a185e/test/cudnn/multiheadattn.jl#L55.
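
For concreteness, a minimal sketch of what such a building block might look like. The name `dot_product_attention`, the (features, sequence) layout, and the signature are all assumptions for illustration, not NNlib's actual API:

```julia
using NNlib: softmax

# Sketch of a scaled dot-product attention primitive; name and layout are
# assumptions, not NNlib's actual API. A multi-head layer would split the
# feature dimension into heads, apply this per head, and concatenate.
function dot_product_attention(q::AbstractMatrix, k::AbstractMatrix, v::AbstractMatrix)
    d = size(q, 1)                            # feature dimension
    scores = (k' * q) ./ sqrt(eltype(q)(d))   # (seq_k, seq_q) similarities
    α = softmax(scores; dims=1)               # normalize over the keys
    return v * α                              # (features, seq_q) output
end
```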

@darsnack
Member

darsnack commented Feb 9, 2022

Another case, like recurrence, where the cuDNN API doesn't match the cute semantics that we use to define the layer. One thing the Metalhead PR made me consider was adding the ability to thread Parallel when the branches are expensive and roughly equal in computational cost.
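
A minimal sketch of that threading idea, assuming a hypothetical `ThreadedParallel` wrapper (not an actual Flux layer) that spawns each branch as its own task:

```julia
# Hypothetical ThreadedParallel: runs each branch on its own task so that
# expensive, similarly sized branches can execute concurrently.
struct ThreadedParallel{C,B<:Tuple}
    connection::C   # combines the branch outputs, e.g. +, vcat
    branches::B
end
ThreadedParallel(connection, branches...) = ThreadedParallel(connection, branches)

function (p::ThreadedParallel)(x)
    tasks = map(b -> Threads.@spawn(b(x)), p.branches)
    return p.connection(map(fetch, tasks)...)
end

# Usage: p = ThreadedParallel(vcat, x -> 2 .* x, x -> x .+ 1); p([1, 2])
```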

@ToucheSir
Member Author

That's an interesting idea. I think you could extend it to using separate CUDA streams for GPU tasks too. Making AD cooperate would be its own challenge, of course 😅
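
A rough sketch of how that could look, assuming CUDA.jl's per-task default streams (each Julia task gets its own stream, so independent branches can overlap on the device); `parallel_branches` is illustrative only, and AD is not handled here:

```julia
using CUDA

# Illustrative only: spawn each branch on its own task. CUDA.jl assigns each
# task its own default stream, so the branches' kernels can overlap.
function parallel_branches(branches, x::CuArray)
    tasks = map(branches) do branch
        Threads.@spawn begin
            y = branch(x)       # kernels launch on this task's stream
            CUDA.synchronize()  # wait for this branch's work to finish
            y
        end
    end
    return map(fetch, tasks)
end

# e.g. parallel_branches((x -> sin.(x), x -> cos.(x)), CUDA.rand(Float32, 1024))
```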
