Suboptimal GroupNorm Implementation on GPUs #10
Comments
cuDNN should support this, so it's mostly a matter of hooking up NNlib + NNlibCUDA (unless you're fine with directly calling the CUDA.jl routines here).
Actually, I don't think cuDNN supports this (at least I couldn't figure it out from its documentation). PyTorch uses its own kernel: https://github.com/pytorch/pytorch/blob/35d4a805ebc3b6eca1bafb2d332dffa8d0c1fc54/aten/src/ATen/native/cuda/group_norm_kernel.cu
I must've hallucinated a mention of groups in the docs for
As observed in SciML/DeepEquilibriumNetworks.jl#45 (comment), we get a 2x speedup by moving from GroupNorm to BatchNorm, which uses cuDNN kernels.
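For reference, here is a minimal sketch of the computation a dedicated GroupNorm kernel would have to perform. It is plain Python for illustration only, not the Lux, NNlib, or PyTorch implementation. The function name `group_norm`, the nested-list layout (sample, channel, spatial values), and the default `eps` are assumptions made for this example, and the learnable scale and bias are left out. The point it illustrates is that the mean and variance are reduced per sample and per channel group, whereas BatchNorm reduces over the batch for each channel.

```python
import math

def group_norm(x, num_groups, eps=1e-5):
    """Normalize x per sample and per channel group.

    x is a list of samples; each sample is a list of channels; each channel
    is a list of spatial values. Hypothetical layout chosen for this sketch.
    """
    out = []
    for sample in x:
        channels = len(sample)
        assert channels % num_groups == 0, "channels must divide evenly into groups"
        per_group = channels // num_groups
        normed = []
        for g in range(num_groups):
            group = sample[g * per_group:(g + 1) * per_group]
            # Statistics are shared by every value in this (sample, group) pair.
            values = [v for ch in group for v in ch]
            mean = sum(values) / len(values)
            var = sum((v - mean) ** 2 for v in values) / len(values)
            inv_std = 1.0 / math.sqrt(var + eps)
            normed.extend([[(v - mean) * inv_std for v in ch] for ch in group])
        out.append(normed)
    return out

# One sample, four channels, two spatial values each, split into two groups.
x = [[[1.0, 2.0], [3.0, 4.0], [10.0, 20.0], [30.0, 40.0]]]
y = group_norm(x, num_groups=2)
```

Each group in `y` ends up with approximately zero mean and unit variance regardless of the other group's scale, which is the behaviour a GPU kernel would need to reproduce with one reduction per (sample, group) pair.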