Suboptimal GroupNorm Implementation on GPUs #10

Closed
avik-pal opened this issue Apr 24, 2022 · 3 comments · Fixed by #156

Comments

@avik-pal
Member

As observed in SciML/DeepEquilibriumNetworks.jl#45 (comment), we get a 2x speedup by moving from GroupNorm to BatchNorm, which uses cuDNN kernels.

@ToucheSir
Contributor

cuDNN should support this, so it's mostly a matter of hooking up NNlib + NNlibCUDA (unless you're fine with directly calling the CUDA.jl routines here)

@avik-pal
Member Author

Actually, I don't think cuDNN supports this (at least I couldn't figure it out from its documentation). PyTorch uses its own kernel: https://github.com/pytorch/pytorch/blob/35d4a805ebc3b6eca1bafb2d332dffa8d0c1fc54/aten/src/ATen/native/cuda/group_norm_kernel.cu

@ToucheSir
Contributor

I must've hallucinated a mention of groups in the docs for cudnnNormalizationForward* then. The PyTorch kernel is quite a beast, so unless someone's up to the task of translating it, I think we're stuck with the slower vectorized variant for now. Ideally, we would figure out why https://triton-lang.org/master/getting-started/tutorials/05-layer-norm.html is so fast, tweak it to run GroupNorm instead, and port it to KernelAbstractions or the like. @vchuravy, is KA sufficiently high-level to handle such a translation?
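
For reference, the computation any of these kernels has to reproduce can be sketched in a few lines. This is a minimal, language-neutral illustration in plain Python, not the Lux, NNlib, or PyTorch implementation. The function name `group_norm`, the single-sample nested-list layout, and the default `eps` are assumptions made for the example, and the learnable scale and bias are left out.

```python
import math


def group_norm(x, groups, eps=1e-5):
    """Group-normalize one sample (hypothetical helper, for illustration only).

    x is a list of channels; each channel is a flat list of floats
    (spatial positions flattened). Statistics are shared by all channels
    in a group, which is what distinguishes GroupNorm from BatchNorm.
    """
    channels = len(x)
    if channels % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = channels // groups

    out = []
    for g in range(groups):
        block = x[g * per_group:(g + 1) * per_group]
        # Reduce over every channel and position in this group.
        values = [v for channel in block for v in channel]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)  # biased variance
        inv_std = 1.0 / math.sqrt(var + eps)
        out.extend([[(v - mean) * inv_std for v in channel] for channel in block])
    return out


sample = [[1.0, 2.0], [3.0, 4.0], [10.0, 10.0], [10.0, 10.0]]
normalized = group_norm(sample, groups=2)
```

Each group needs a mean and variance reduction followed by an elementwise pass over the same data, so a fast GPU kernel would presumably aim to do both without extra round trips through memory.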
