Recently I refactored the code base to use loops on CPU instead of broadcasting. This makes the code quite a bit faster but, more importantly, allows us to easily swap in LoopVectorization.
See commit history of LuxDL/LuxLib.jl#97 for more details.
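To illustrate the broadcasting-to-loops change described above, here is a minimal sketch of a bias-plus-activation op written both ways. The function names are hypothetical and not taken from LuxLib. To stay runnable without extra packages it uses `@inbounds @simd` on the inner loop; that is the spot where `LoopVectorization.@turbo` could be swapped in, assuming the activation is one `@turbo` can handle.

```julia
# Hypothetical helper names for illustration; not LuxLib's actual implementation.

# Broadcasting version: one fused broadcast expression.
bias_act_broadcast(σ, x::AbstractMatrix, b::AbstractVector) = σ.(x .+ b)

# Loop version: the same computation written as an explicit loop nest.
# The inner `@inbounds @simd for` is where `@turbo` could replace the annotation.
function bias_act_loop!(y::Matrix{T}, σ::F, x::Matrix{T}, b::Vector{T}) where {F, T}
    size(y) == size(x) || throw(DimensionMismatch("y and x must have the same size"))
    length(b) == size(x, 1) || throw(DimensionMismatch("b must match the first dim of x"))
    for j in axes(x, 2)
        @inbounds @simd for i in axes(x, 1)
            y[i, j] = σ(x[i, j] + b[i])
        end
    end
    return y
end

x = randn(Float32, 4, 3)
b = randn(Float32, 4)
y = bias_act_loop!(similar(x), tanh, x, b)
```

Both versions should agree up to floating-point tolerance; the loop form just makes the iteration structure explicit so a vectorizing macro has something to work on.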
Improvements over NNlib Functions
I don't think NNlib will accept LoopVectorization as a dependency, so we implement them here instead.
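As a rough illustration of what implementing such a replacement here could look like, taking the batched matmul case: check whether the inputs are plain CPU arrays of a loop-friendly element type, run a loop kernel if so, and otherwise forward to a fallback. Everything below is a sketch under assumptions: `can_loop_vectorize`, `batched_matmul_loop!`, and `fallback_batched_mul` are hypothetical names, the fallback is a stand-in for `NNlib.batched_mul` so the example runs without NNlib, and the `@inbounds @simd` loop marks where `@turbo` could go.

```julia
using LinearAlgebra: mul!

# Stand-in for `NNlib.batched_mul` (assumption: same shapes, no batch broadcasting).
function fallback_batched_mul(x::AbstractArray{<:Any, 3}, y::AbstractArray{<:Any, 3})
    z = similar(x, size(x, 1), size(y, 2), size(x, 3))
    for k in axes(x, 3)
        mul!(view(z, :, :, k), view(x, :, :, k), view(y, :, :, k))
    end
    return z
end

# Only plain `Array`s (CPU memory) of hardware floats take the loop path in this sketch.
can_loop_vectorize(x, y) = false
can_loop_vectorize(::Array{T, 3}, ::Array{T, 3}) where {T <: Union{Float32, Float64}} = true

# Loop kernel: z[:, :, k] = x[:, :, k] * y[:, :, k] for every batch index k.
function batched_matmul_loop!(z::Array{T, 3}, x::Array{T, 3}, y::Array{T, 3}) where {T}
    fill!(z, zero(T))
    for k in axes(x, 3), j in axes(y, 2), p in axes(x, 2)
        ypj = y[p, j, k]
        @inbounds @simd for i in axes(x, 1)   # candidate spot for `@turbo`
            z[i, j, k] += x[i, p, k] * ypj
        end
    end
    return z
end

function batched_matmul(x::AbstractArray{<:Any, 3}, y::AbstractArray{<:Any, 3})
    size(x, 2) == size(y, 1) && size(x, 3) == size(y, 3) ||
        throw(DimensionMismatch("incompatible shapes $(size(x)) and $(size(y))"))
    can_loop_vectorize(x, y) || return fallback_batched_mul(x, y)
    z = similar(x, size(x, 1), size(y, 2), size(x, 3))
    return batched_matmul_loop!(z, x, y)
end

x = randn(3, 4, 5)
y = randn(4, 2, 5)
z = batched_matmul(x, y)   # takes the loop path: both inputs are Array{Float64, 3}
```

Routing through a small trait-like function keeps the decision in one place, so GPU arrays, views, or unsupported element types simply fall through to the existing implementation.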
- `batched_mul` --> A `batched_matmul` that checks if the Array is on CPU and if it can be loop vectorized; if so we loop vectorize, else we forward to `NNlib.batched_mul`.
- `conv` --> Bypass the CPU conv routines in `fused_conv` with ones written using LoopVectorization.
- `pool`ing operations

Implementations where LoopVectorization will help
- Reductions: We have the infrastructure set up in `impl/fast_ops.jl`. For `LoopedArrayOp` we need to simply use `VectorizedStatistics.jl` (and maybe `VectorizedReductions.jl`).

Automatic Differentiation
We don't need to worry about ChainRules; it already has `rrule`s defined as of now. But Enzyme is really not happy with LoopVectorization. Use custom rules for the following: