COO FeaturedGraph #204

Closed · wants to merge 11 commits

Conversation

@CarloLucibello (Member) commented Jul 20, 2021

This is just an investigation of what it would entail to have a COO implementation for the FeaturedGraph.

I reimplemented FeaturedGraph inside this repo, as a subtype of LightGraphs.AbstractGraph. Tests aren't passing yet.

For the time being, this PR drops GraphSignals.jl as a dependency. If the investigation turns out to be successful, I will move the code to GraphSignals.

UPDATE:

I have done a large redesign of the library; the code is much simpler and overall performance should be much better (especially on GPU).

  • The source/target (COO) edge representation is a natural fit for message passing. Everything is handled very concisely by NNlib.gather and NNlib.scatter.
  • GPU support is back for all layers except ChebConv!
  • Using the COO format, adjacency_matrix and normalized_laplacian can be expressed in a GPU-friendly way.
  • GCNConv now has two implementations, both GPU-friendly: one based on message passing, one on multiplication by the normalized Laplacian. The second one is commented out, waiting for adjacency matrix storage support in FeaturedGraph
    (the case where it would make sense to use Laplacian algebra instead of message passing).
  • FeaturedGraph now admits the COO format as its only underlying graph representation. With COO
    we can guarantee efficient, GPU-friendly, flexible, and consistent implementations for every layer.
    • Also supporting LightGraphs types or adjacency lists would lead to a ton of inefficiencies and it's not worth doing.
    • We can gradually add back (possibly sparse) adjacency matrix support, but only for those layers for which we can provide a fast implementation. We don't want users to fall into performance traps.
  • I merged the GraphNet and MessagePassing types into MessagePassing. There seems to be no need (in terms of efficiency/flexibility/convenience) to have both.
  • The message and update functions now deal with batched node/edge features coming from gather and scatter. Performance-wise, this is much better than the previous implementation relying on mapreduce (a small sketch follows below).
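To illustrate the merged API and the batched message/update functions, here is a rough sketch of a custom layer (the hypothetical MyConv is for illustration only; the signatures follow the docstrings in this PR but should be treated as provisional):

```julia
# Rough sketch of a custom layer under the merged MessagePassing type (provisional API).
struct MyConv{W} <: MessagePassing
    weight::W
end

# `xi`, `xj` are batched node features gathered on the target/source side of each edge,
# both of size (num_features, num_edges).
message(l::MyConv, xi, xj) = l.weight * xj

# `M̄` are the messages aggregated over each node's neighborhood, size (out_features, num_nodes).
update(l::MyConv, M̄, x) = M̄

# Forward pass: run message passing over the FeaturedGraph `fg`, summing messages over neighbors.
# (E and u are not used by this layer.)
(l::MyConv)(fg::FeaturedGraph, X) = propagate(l, fg, nothing, X, nothing, +)
```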

Fix #185 fix #194 fix #195 fix #197 fix #200 fix #209

TODO list (this PR or future ones)

  • revisit the documentation
  • handle self-loops in GCNConv when using message passing
  • make sure gather/scatter nicely handle the case of isolated nodes
  • ChebConv with message passing
  • GPU-friendly scaled_laplacian
  • add FeaturedGraph support for an underlying (sparse) adjacency matrix representation
  • move the new FeaturedGraph implementation back to GraphSignals
  • handle deprecations as nicely as possible
  • help downstream packages move to the new API (SeaPearl.jl)
  • tag a new breaking release

@CarloLucibello CarloLucibello changed the title attempt at COO FeaturedGraph COO FeaturedGraph Jul 29, 2021
@CarloLucibello CarloLucibello marked this pull request as ready for review July 29, 2021 10:49
@yuehhua (Member) commented Jul 29, 2021

About graph networks, I suggest the paper linked here, especially chapter 3, "Graph networks".

@CarloLucibello (Member, Author) replied:

> About graph networks, I suggest the paper linked here, especially chapter 3, "Graph networks".

Yes, I know that paper; it is what the GraphNet type implemented. The implementation in this PR is slightly more general. There are 2 differences:

  • The edge message M is decoupled from the edge feature E. While all layers produce M during message passing, M doesn't have to be identified with E, which most layers don't need to produce and store. GraphNet behavior can simply be reproduced with
update_edge(l::Layer, M, E) = M
  • The update_global function now takes the new E and X instead of the corresponding aggregated quantities. This way we don't have to pass around two more aggregation operators and keep a few odd aggregate(::typeof(min), ...) definitions; users can define their own aggregations (see the sketch below).

I think these are 2 positive changes, but they are not particularly important, so I can revert them if you want.
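For example (just a sketch with a hypothetical layer MyLayer carrying a global MLP ϕu; this is not code from this PR), a user can pick their own aggregations inside update_global:

```julia
using Statistics: mean

# Hypothetical layer: the aggregations (here a mean over edges and a mean over nodes) are
# chosen by the user inside update_global, instead of being passed as extra operators to propagate.
update_global(l::MyLayer, E, X, u) =
    l.ϕu(vcat(vec(mean(E, dims=2)), vec(mean(X, dims=2)), u))
```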

edge_index
num_nodes::Int
num_edges::Int
# ndata::Dict{String, Any} # https://github.com/FluxML/Zygote.jl/issues/717
@yuehhua (Member) commented:

Is it possible to include various kinds of features in a FeaturedGraph? I had thought of this before, but I gave up and turned to MetaGraph, which has implemented a similar structure. Another problem is that we would have to provide a way for users to specify which node/edge/global features they want to train on.

Would it be better to use Symbol as the key?

@CarloLucibello (Member, Author) replied:

This requires some extra thought; I don't plan to implement it in this PR, I just wanted to write down a stub.

Both PyTorch Geometric and DGL support adding an arbitrary number of edge and node features. We should do the same at some point, since it is very convenient to have (e.g. for storing node labels, training masks, etc.).

MetaGraph.jl is not a good fit. Besides being scarcely maintained, it provides maps with single nodes/edges as keys, which is not good for vectorized operations; we want to store arrays.
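As a purely hypothetical sketch of the kind of storage this would need (nothing here is implemented in this PR; names are illustrative):

```julia
num_nodes = 10

# One array per named node-level feature, each indexed along the node dimension.
ndata = Dict{Symbol, AbstractArray}(
    :x          => rand(Float32, 16, num_nodes),  # trainable features, one column per node
    :label      => rand(1:3, num_nodes),          # node labels
    :train_mask => rand(Bool, num_nodes),         # boolean training mask
)
```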

src/featured_graph.jl — outdated review thread (resolved)
# l.σ.(l.weight * x * L̃ .+ l.bias)
# end

message(l::GCNConv, xi, xj) = xj
@yuehhua (Member) commented:

I am quite confused by GCNConv being implemented as a message-passing neural network. It doesn't leverage anything from the MPNN. As I said, all it needs are algebraic operations, e.g. l.weight * x .+ l.bias, instead of indexing by xi or xj. So I don't think GCNConv <: MessagePassing is good. Furthermore, people wouldn't classify GCNConv as a spatial-based graph convolution layer; it is a spectral-based graph convolution layer.

@CarloLucibello (Member, Author) replied:

> I am quite confused by GCNConv being implemented as a message-passing neural network. It doesn't leverage anything from the MPNN.

It is leveraging gather and scatter(+, ...), which basically implement a sparse matrix multiplication.
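For example, here is a small self-contained sketch of the equivalence (toy graph; the layer code in the PR is of course more involved):

```julia
using NNlib, SparseArrays

s = [1, 1, 2, 3]            # source node of each edge (COO)
t = [2, 3, 3, 1]            # target node of each edge (COO)
n = 3                       # number of nodes
X = rand(Float32, 4, n)     # node features, one column per node

# Message-passing form: gather source features per edge, then sum them into the targets.
Xj = NNlib.gather(X, s)                         # (4, num_edges)
Y  = NNlib.scatter(+, Xj, t; dstsize = (4, n))  # (4, n)

# Algebraic form: multiplication by the (sparse) adjacency matrix.
A = sparse(t, s, ones(Float32, length(s)), n, n)
@assert Y ≈ X * A'          # both sum the features of each node's in-neighbors
```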

> As I said, all it needs are algebraic operations, e.g. l.weight * x .+ l.bias, instead of indexing by xi or xj. So I don't think GCNConv <: MessagePassing is good. Furthermore, people wouldn't classify GCNConv as a spatial-based graph convolution layer; it is a spectral-based graph convolution layer.

Although it was introduced as a spectral-based operator, I think most people these days think about it in the general message-passing scheme. In any case, both interpretations are valid. The important thing is to do something that is well adapted to the underlying graph implementation. Calling adjacency_matrix(fg) is a huge performance hit when the underlying representation is COO or an adjacency list; we have to prohibit these performance traps.
I plan to expand FeaturedGraph support for sparse adjacency matrix representations, so that we can bring back the algebraic form of GCNConv. I would do that in a separate PR though, if you don't mind; this one is already quite huge.

@yuehhua (Member) commented:

> I think most people these days think about it in the general message-passing scheme.

Sorry, I can't accept this statement. It is a definition; please don't break definitions. If most people do, they were wrong from the beginning.

> The important thing is to do something that is well adapted to the underlying graph implementation.

I agree with this statement.

> Calling adjacency_matrix(fg) is a huge performance hit when the underlying representation is COO or an adjacency list; we have to prohibit these performance traps.

Given the definition, there is no way to use the performance issue as an excuse. If the performance hit is that large, then find a way to resolve the conversion penalty, rather than breaking definitions.

@yuehhua (Member) commented:

> I plan to expand FeaturedGraph support for sparse adjacency matrix representations

I agree with you on this point; this is the way I planned it a long time ago.

> I would do that in a separate PR though, if you don't mind; this one is already quite huge

Yeah, I have the same feeling. It's huge. Let's break it down.

@CarloLucibello (Member, Author) commented:

I think this is good to go; I don't want to overload a single PR, which is quite big already.
We could adopt the checklist in the first post as a TODO list for the next release.

Comment on lines +3 to +28
"""
MessagePassing

The abstract type from which all message passing layers are derived.

Related methods are [`propagate`](@ref), [`message`](@ref),
[`update`](@ref), [`update_edge`](@ref), and [`update_global`](@ref).
"""
abstract type MessagePassing end

"""
propagate(mp::MessagePassing, fg::FeaturedGraph, aggr)
propagate(mp::MessagePassing, fg::FeaturedGraph, E, X, u, aggr)

Perform the sequence of operation implementing the message-passing scheme
and updating node, edge, and global features `X`, `E`, and `u` respectively.

The computation involved is the following:

```julia
M = compute_batch_message(mp, fg, E, X, u)
E = update_edge(mp, M, E, u)
M̄ = aggregate_neighbors(mp, aggr, fg, M)
X = update(mp, M̄, X, u)
u = update_global(mp, E, X, u)
```
@yuehhua (Member) commented:

This is really annoying. It confuses the definitions of a message-passing network and a graph network. We shouldn't break the definitions. By the definition from the paper, there are two functions to be defined:

  • message: it processes information from the node's own feature and its neighbors' features, as well as the edge feature, and then outputs a message.
  • update: the aggregated message is passed to the update function. It takes the message and the node's own feature as input and outputs a new node feature for the next layer.

Between these, an aggregation function is applied to aggregate the messages.

But for a graph network this is not the case. A graph network is a more general scheme that includes the message-passing network. The paper defines a series of APIs and the algorithm for a graph network. It contains:

  • Update function:
    • For node
    • For edge
    • For global
  • Aggregate function:
    • For aggregating edge features into node features
    • For aggregating edge features into global features
    • For aggregating node features into global features

This architecture is flexible because the functions above are optional. They can be "turned on/off": if a definition is given by the user, it is "turned on". Thus, users can compose their own neural networks by providing definitions or not.

Thus, a message-passing network is a specialization of a graph network. It is specialized as follows:

  • message is the edge update function of the graph network.
  • the aggregation in message passing is the graph network's aggregation from edge features into node features.
  • update is the node update function of the graph network.

I think the engineering must satisfy the definitions above. If the behavior is different, we shouldn't call it by the same name: MessagePassing is for message-passing networks and GraphNet is for graph networks.
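To make the relation concrete, here is a rough pseudo-implementation (hypothetical function names, batched over all edges/nodes; this is not the API of either package):

```julia
# One graph-network step: all update/aggregate functions are supplied by the user
# and any of them can be left as the identity ("turned off").
function graphnet_step(g, E, X, u;
                       update_edge, update_node, update_global,
                       aggregate_e2n, aggregate_e2g, aggregate_n2g)
    E′ = update_edge(E, X, u)                   # edge update (the MPNN "message")
    Ē  = aggregate_e2n(g, E′)                   # edge features aggregated onto nodes
    X′ = update_node(Ē, X, u)                   # node update (the MPNN "update")
    u′ = update_global(aggregate_e2g(g, E′),    # global update from aggregated edges
                       aggregate_n2g(g, X′), u) # ...and aggregated nodes
    return E′, X′, u′
end

# A message-passing network is the special case that uses only the edge→node path:
#   update_edge   == message
#   aggregate_e2n == the message-passing aggregation
#   update_node   == update
# while edges and the global state are passed through unchanged.
```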

@CarloLucibello (Member, Author) commented:

I'm breaking this into 2 or 3 PRs. The first one is #215.

@yuehhua (Member) commented Aug 24, 2021

Closing this due to the merge of yuehhua/GraphSignals.jl#54.
