aggregate_neighbors() is 100x slower than equivalent sparse matrix operation #124
Comments
Nice! The comparison is not totally fair because
I think we should introduce a fast path
What would those look like, and at what layer in the stack? Should we open some issues in GPUArrays or CUDA.jl, or start a discussion in the #arrayIR Slack channel?
The first step would be to integrate https://github.com/JuliaSparse/SuiteSparseGraphBLAS.jl here for fast, parallelized sparse matrix operations. Then there is a lot of work to be done on sparse matrix support in CUDA.jl; there are a few issues open over there. Finally, we could develop sparse arrays in arbitrary dimensions and the corresponding operations (see TACO).
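A fast path of the kind suggested above could be sketched in Julia by dispatching on the reduction operator: a generic method folds each node's incoming edge values with `op`, while a method specialized on `typeof(+)` computes `e * A` with a row vector of ones. This is a minimal sketch, not the fix that landed in NNlib; `reduce_neighbors` is a hypothetical name, and storing the feature of edge `i → j` in `A[i, j]` is an assumption.

```julia
using SparseArrays

# Hypothetical helper: reduce the stored edge values in each column j of A,
# assuming A[i, j] holds the (single) feature of edge i → j.

# Generic fallback: fold the stored entries of column j with `op`.
# Nodes with no incoming edges get zero(T) (an assumption for this sketch).
function reduce_neighbors(op, A::SparseMatrixCSC{T}) where {T}
    vals = nonzeros(A)
    out = zeros(T, size(A, 2))
    for j in axes(A, 2)
        r = nzrange(A, j)
        isempty(r) && continue
        acc = vals[first(r)]
        for k in (first(r) + 1):last(r)
            acc = op(acc, vals[k])
        end
        out[j] = acc
    end
    return out
end

# Fast path for `+`: summing over sources is e * A with e = ones(1, m),
# evaluated by SparseArrays as a dense-times-sparse product.
reduce_neighbors(::typeof(+), A::SparseMatrixCSC{T}) where {T} =
    vec(ones(T, 1, size(A, 1)) * A)

# Usage: 3 nodes, edges 1→2 (1.0), 3→2 (2.0), 2→3 (3.0), 1→3 (4.0).
A = sparse([1, 3, 2, 1], [2, 2, 3, 3], [1.0, 2.0, 3.0, 4.0], 3, 3)
reduce_neighbors(+, A)                 # fast path        → [0.0, 3.0, 7.0]
reduce_neighbors((a, b) -> a + b, A)   # generic fallback → [0.0, 3.0, 7.0]
reduce_neighbors(max, A)               # generic fallback → [0.0, 2.0, 4.0]
```

Dispatching on `typeof(+)` keeps the generic path for arbitrary operators while routing the common sum reduction to an optimized sparse product.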
Solving this issue was actually quite easy:
@btime aggregate_neighbors(g, +, g.edata.A)
# 3.773 ms (52848 allocations: 3.39 MiB)   # on NNlib master
# 59.660 μs (3 allocations: 8.23 KiB)      # with https://github.com/FluxML/NNlib.jl/pull/384
Closing this, as the NNlib PR is being merged.
Although #106 has been solved by fusion (#108), the slowness of the unfused implementation (`apply_edges` + `aggregate_neighbors`) was not clearly understood. Realistic GNN models contain mixed calls to message functions, reduce functions, and neural network layers, so they don't always have the form that #108 can exploit.

Profiling with ProfileSVG.jl shows that 60% of the time was spent in `aggregate_neighbors`.

Neighbor reduction with `+` is equivalent to either `e * A`, where `e` is a row vector of ones, or … Either way turns out to be more than 100x faster than `aggregate_neighbors`.

Reproducible example
This example uses only a single edge feature. Multiple edge features would correspond to a 3D sparse tensor, which SparseArrays.jl does not support; TACO could be used in that case.
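The note about multiple edge features can be illustrated with a minimal sketch that avoids a 3D sparse type by keeping one sparsity pattern and a dense feature matrix aligned with its stored entries. This is one possible workaround, not TACO and not code from this thread; `multi_feature_sum` is a hypothetical helper, and the convention (stored entry `(i, j)` is edge `i → j`, reduction over `i`) is an assumption.

```julia
using SparseArrays

# Assumed layout (hypothetical): the sparsity pattern of A encodes the edges
# (a stored entry at (i, j) means edge i → j), and aggregation with + sums
# over source nodes i for each target node j.

# Several features without a 3D sparse type: store them in a dense
# F × nnz(A) matrix X whose k-th column belongs to the k-th stored entry of A
# (CSC order, i.e. the order of nonzeros(A)).
function multi_feature_sum(A::SparseMatrixCSC, X::AbstractMatrix{T}) where {T}
    size(X, 2) == nnz(A) ||
        throw(DimensionMismatch("X must have one column per stored entry of A"))
    out = zeros(T, size(X, 1), size(A, 2))
    for j in axes(A, 2), k in nzrange(A, j)
        @views out[:, j] .+= X[:, k]  # add edge k's features to target node j
    end
    return out
end

# Usage: 3 nodes, edges 1→2, 3→2, 2→3, 1→3.
A = sparse([1, 3, 2, 1], [2, 2, 3, 3], [1.0, 2.0, 3.0, 4.0], 3, 3)
# Two features per edge; row 1 repeats nonzeros(A) = [1.0, 2.0, 4.0, 3.0].
X = vcat(nonzeros(A)', [10.0 20.0 30.0 40.0])
Y = multi_feature_sum(A, X)      # → [0.0 3.0 7.0; 0.0 30.0 70.0]
Y[1, :] == vec(ones(1, 3) * A)   # true: a single feature reduces to e * A
```

Each output column only touches the edges stored in the corresponding column of `A`, so the cost stays proportional to the number of edges times the number of features.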
Package version