
add benchmark script #214

Merged: 4 commits into FluxML:master, Aug 6, 2021
Conversation

@CarloLucibello (Member) commented on Aug 2, 2021

This PR adds a script that runs some basic benchmarks, so that we can track performance changes as the library evolves.

I selected the GCNConv and GraphConv layers for testing, as representatives of the algebraic and the message passing approaches respectively. Edit: also added GATConv.
The benchmarks run on random graphs of small to moderate size (10 to 10k nodes).
Different graph storage types are tested (adjacency list, dense adjacency matrix, sparse adjacency matrix, LightGraphs.SimpleGraph).
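
For reference, this is roughly the pattern the script follows. A minimal sketch, assuming GeometricFlux's FeaturedGraph constructor and the GCNConv layer (the committed perf/perf.jl also times gradients and GPU runs):

using BenchmarkTools, GeometricFlux, LightGraphs

# Sketch of the benchmark loop: random graphs of increasing size,
# constant average degree ≈ 6, timing a single CPU forward pass.
for N in (10, 100, 1_000, 10_000)
    g  = erdos_renyi(N, 6 / (N - 1))
    fg = FeaturedGraph(g, nf = rand(Float32, 16, N))  # 16 features per node
    l  = GCNConv(16 => 16)
    t  = @benchmark $l($fg)
    println((N = N, time_cpu = minimum(t)))
end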
These are the results on a server, which I'm also committing as a reference:

julia> df = run_benchmarks();

julia> for g in groupby(df, :layer); println(g,"\n"); end
16×8 SubDataFrame
 Row │ N      c        layer    gtype       time_fg                    time_cpu                   time_gpu  gpu_to_cpu
     │ Int64? Float64? String?  Symbol?     Any                        Any                        Any       Float64?
─────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ 10     6.0      GATConv  adjlist     TrialEstimate(18.473 μs)   TrialEstimate(852.293 μs)  missing   Inf
   2 │ 10     6.0      GATConv  dense       TrialEstimate(3.054 μs)    TrialEstimate(831.063 μs)  missing   Inf
   3 │ 10     6.0      GATConv  lightgraph  TrialEstimate(56.061 ns)   TrialEstimate(871.419 μs)  missing   Inf
   4 │ 10     6.0      GATConv  sparse      TrialEstimate(4.438 μs)    TrialEstimate(1.128 ms)    missing   Inf
   5 │ 100    6.0      GATConv  adjlist     TrialEstimate(161.550 μs)  TrialEstimate(11.370 ms)   missing   Inf
   6 │ 100    6.0      GATConv  dense       TrialEstimate(76.516 μs)   TrialEstimate(11.984 ms)   missing   Inf
   7 │ 100    6.0      GATConv  lightgraph  TrialEstimate(55.989 ns)   TrialEstimate(11.083 ms)   missing   Inf
   8 │ 100    6.0      GATConv  sparse      TrialEstimate(70.337 μs)   TrialEstimate(14.045 ms)   missing   Inf
   9 │ 1000   6.0      GATConv  adjlist     TrialEstimate(1.864 ms)    TrialEstimate(447.102 ms)  missing   Inf
  10 │ 1000   6.0      GATConv  dense       TrialEstimate(9.331 ms)    TrialEstimate(465.438 ms)  missing   Inf
  11 │ 1000   6.0      GATConv  lightgraph  TrialEstimate(56.867 ns)   TrialEstimate(447.711 ms)  missing   Inf
  12 │ 1000   6.0      GATConv  sparse      TrialEstimate(6.740 ms)    TrialEstimate(471.377 ms)  missing   Inf
  13 │ 10000  6.0      GATConv  adjlist     TrialEstimate(22.646 ms)   TrialEstimate(3.384 s)     missing   Inf
  14 │ 10000  6.0      GATConv  dense       TrialEstimate(1.435 s)     TrialEstimate(6.987 s)     missing   Inf
  15 │ 10000  6.0      GATConv  lightgraph  TrialEstimate(57.248 ns)   TrialEstimate(3.227 s)     missing   Inf
  16 │ 10000  6.0      GATConv  sparse      TrialEstimate(1.002 s)     TrialEstimate(6.090 s)     missing   Inf

16×8 SubDataFrame
 Row │ N      c        layer    gtype       time_fg                    time_cpu                   time_gpu  gpu_to_cpu
     │ Int64? Float64? String?  Symbol?     Any                        Any                        Any       Float64?
─────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ 10     6.0      GCNConv  adjlist     TrialEstimate(18.570 μs)   missing                    missing   missing
   2 │ 10     6.0      GCNConv  dense       TrialEstimate(3.109 μs)    TrialEstimate(21.885 μs)   missing   Inf
   3 │ 10     6.0      GCNConv  lightgraph  TrialEstimate(47.979 ns)   TrialEstimate(21.825 μs)   missing   Inf
   4 │ 10     6.0      GCNConv  sparse      TrialEstimate(4.423 μs)    TrialEstimate(28.175 μs)   missing   Inf
   5 │ 100    6.0      GCNConv  adjlist     TrialEstimate(160.488 μs)  missing                    missing   missing
   6 │ 100    6.0      GCNConv  dense       TrialEstimate(77.154 μs)   TrialEstimate(250.699 μs)  missing   Inf
   7 │ 100    6.0      GCNConv  lightgraph  TrialEstimate(55.957 ns)   TrialEstimate(121.048 μs)  missing   Inf
   8 │ 100    6.0      GCNConv  sparse      TrialEstimate(71.684 μs)   TrialEstimate(195.570 μs)  missing   Inf
   9 │ 1000   6.0      GCNConv  adjlist     TrialEstimate(1.841 ms)    missing                    missing   missing
  10 │ 1000   6.0      GCNConv  dense       TrialEstimate(8.867 ms)    TrialEstimate(22.922 ms)   missing   Inf
  11 │ 1000   6.0      GCNConv  lightgraph  TrialEstimate(56.552 ns)   TrialEstimate(1.026 ms)    missing   Inf
  12 │ 1000   6.0      GCNConv  sparse      TrialEstimate(6.447 ms)    TrialEstimate(7.680 ms)    missing   Inf
  13 │ 10000  6.0      GCNConv  adjlist     TrialEstimate(22.145 ms)   missing                    missing   missing
  14 │ 10000  6.0      GCNConv  dense       TrialEstimate(1.450 s)     TrialEstimate(3.092 s)     missing   Inf
  15 │ 10000  6.0      GCNConv  lightgraph  TrialEstimate(57.708 ns)   TrialEstimate(9.887 ms)    missing   Inf
  16 │ 10000  6.0      GCNConv  sparse      TrialEstimate(996.760 ms)  TrialEstimate(1.012 s)     missing   Inf

16×8 SubDataFrame
 Row │ N      c        layer      gtype       time_fg                    time_cpu                   time_gpu  gpu_to_cpu
     │ Int64? Float64? String?    Symbol?     Any                        Any                        Any       Float64?
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ 10     6.0      GraphConv  adjlist     TrialEstimate(18.532 μs)   TrialEstimate(429.637 μs)  missing   Inf
   2 │ 10     6.0      GraphConv  dense       TrialEstimate(3.113 μs)    TrialEstimate(425.891 μs)  missing   Inf
   3 │ 10     6.0      GraphConv  lightgraph  TrialEstimate(46.217 ns)   TrialEstimate(395.829 μs)  missing   Inf
   4 │ 10     6.0      GraphConv  sparse      TrialEstimate(4.404 μs)    TrialEstimate(457.810 μs)  missing   Inf
   5 │ 100    6.0      GraphConv  adjlist     TrialEstimate(161.810 μs)  TrialEstimate(7.470 ms)    missing   Inf
   6 │ 100    6.0      GraphConv  dense       TrialEstimate(76.830 μs)   TrialEstimate(7.544 ms)    missing   Inf
   7 │ 100    6.0      GraphConv  lightgraph  TrialEstimate(55.750 ns)   TrialEstimate(7.296 ms)    missing   Inf
   8 │ 100    6.0      GraphConv  sparse      TrialEstimate(76.117 μs)   TrialEstimate(7.797 ms)    missing   Inf
   9 │ 1000   6.0      GraphConv  adjlist     TrialEstimate(1.849 ms)    missing                    missing   missing
  10 │ 1000   6.0      GraphConv  dense       TrialEstimate(9.295 ms)    missing                    missing   missing
  11 │ 1000   6.0      GraphConv  lightgraph  TrialEstimate(56.272 ns)   missing                    missing   missing
  12 │ 1000   6.0      GraphConv  sparse      TrialEstimate(6.479 ms)    missing                    missing   missing
  13 │ 10000  6.0      GraphConv  adjlist     TrialEstimate(23.406 ms)   missing                    missing   missing
  14 │ 10000  6.0      GraphConv  dense       TrialEstimate(1.438 s)     missing                    missing   missing
  15 │ 10000  6.0      GraphConv  lightgraph  TrialEstimate(57.641 ns)   missing                    missing   missing
  16 │ 10000  6.0      GraphConv  sparse      TrialEstimate(1.004 s)     missing                    missing   missing

A few notes:

  • CUDA support is broken for all graph types and all layers.
  • CPU support is broken for GCNConv with the adjacency list graph type.
  • GraphConv is more than 10x slower than GCNConv, at least at smaller sizes; ideally, performance should be comparable.

Quite surprisingly, the most performant graph type seems to be the LightGraphs one, performing several orders of magnitude better with GCNConv. Since GCNConv calls adjacency_matrix, that's a problem (for the other graph types) that should be easy to fix by providing specialized methods.
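
To illustrate the kind of specialization meant here, a hypothetical method that builds the adjacency matrix straight from an adjacency-list representation instead of going through a generic fallback (the name and dispatch are illustrative, not the package's actual signatures):

using SparseArrays

# Hypothetical specialized method: build a sparse adjacency matrix
# directly from an adjacency-list representation.
function adjacency_matrix(adj::AbstractVector{<:AbstractVector{<:Integer}})
    is, js = Int[], Int[]
    for (i, neighbors) in enumerate(adj), j in neighbors
        push!(is, i); push!(js, j)
    end
    n = length(adj)
    return sparse(is, js, ones(Int, length(is)), n, n)
end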
This is the comparison between lightgraph and the other storage types:

julia> dfcomp = compare(dfmaster, dfmaster);

julia> filter(r -> r.layer == "GCNConv" && r.in_type_master==:lightgraph, dfcomp)
16×6 DataFrame
 Row │ n      in_type_pr  in_type_master  layer    pr_to_master_cpu       pr_to_master_gpu
     │ Int64? Symbol?     Symbol?         String?  TrialRatio?            Missing
─────┼─────────────────────────────────────────────────────────────────────────────────────
   1 │ 10     adjlist     lightgraph      GCNConv  missing                missing
   2 │ 10     dense       lightgraph      GCNConv  TrialRatio(99.99%)     missing
   3 │ 10     lightgraph  lightgraph      GCNConv  TrialRatio(100.00%)    missing
   4 │ 10     sparse      lightgraph      GCNConv  TrialRatio(119.10%)    missing
   5 │ 100    adjlist     lightgraph      GCNConv  missing                missing
   6 │ 100    dense       lightgraph      GCNConv  TrialRatio(220.54%)    missing
   7 │ 100    lightgraph  lightgraph      GCNConv  TrialRatio(100.00%)    missing
   8 │ 100    sparse      lightgraph      GCNConv  TrialRatio(116.22%)    missing
   9 │ 1000   adjlist     lightgraph      GCNConv  missing                missing
  10 │ 1000   dense       lightgraph      GCNConv  TrialRatio(1774.30%)   missing
  11 │ 1000   lightgraph  lightgraph      GCNConv  TrialRatio(100.00%)    missing
  12 │ 1000   sparse      lightgraph      GCNConv  TrialRatio(493.16%)    missing
  13 │ 10000  adjlist     lightgraph      GCNConv  missing                missing
  14 │ 10000  dense       lightgraph      GCNConv  TrialRatio(20334.98%)  missing
  15 │ 10000  lightgraph  lightgraph      GCNConv  TrialRatio(100.00%)    missing
  16 │ 10000  sparse      lightgraph      GCNConv  TrialRatio(6174.97%)   missing

julia> filter(r -> r.layer == "GraphConv" && r.in_type_master==:lightgraph, dfcomp)
16×6 DataFrame
 Row │ n      in_type_pr  in_type_master  layer      pr_to_master_cpu     pr_to_master_gpu
     │ Int64? Symbol?     Symbol?         String?    TrialRatio?          Missing
─────┼─────────────────────────────────────────────────────────────────────────────────────
   1 │ 10     adjlist     lightgraph      GraphConv  TrialRatio(107.25%)  missing
   2 │ 10     dense       lightgraph      GraphConv  TrialRatio(102.36%)  missing
   3 │ 10     lightgraph  lightgraph      GraphConv  TrialRatio(100.00%)  missing
   4 │ 10     sparse      lightgraph      GraphConv  TrialRatio(90.26%)   missing
   5 │ 100    adjlist     lightgraph      GraphConv  TrialRatio(106.08%)  missing
   6 │ 100    dense       lightgraph      GraphConv  TrialRatio(105.67%)  missing
   7 │ 100    lightgraph  lightgraph      GraphConv  TrialRatio(100.00%)  missing
   8 │ 100    sparse      lightgraph      GraphConv  TrialRatio(90.70%)   missing
   9 │ 1000   adjlist     lightgraph      GraphConv  TrialRatio(98.73%)   missing
  10 │ 1000   dense       lightgraph      GraphConv  TrialRatio(109.35%)  missing
  11 │ 1000   lightgraph  lightgraph      GraphConv  TrialRatio(100.00%)  missing
  12 │ 1000   sparse      lightgraph      GraphConv  TrialRatio(132.64%)  missing
  13 │ 10000  adjlist     lightgraph      GraphConv  TrialRatio(99.90%)   missing
  14 │ 10000  dense       lightgraph      GraphConv  TrialRatio(279.31%)  missing
  15 │ 10000  lightgraph  lightgraph      GraphConv  TrialRatio(100.00%)  missing
  16 │ 10000  sparse      lightgraph      GraphConv  TrialRatio(228.48%)  missing

@CarloLucibello (Member, Author)

It would be nice to get this in quickly, to help with other PRs.

@yuehhua (Member) commented on Aug 3, 2021

Thank you for your work on benchmarking the graph convolutional layers. Not only the scale of the graph is important; the number of edges also matters, and real graphs usually contain far fewer edges than the maximum. For example, with N nodes the total number of possible edges is E = N(N+1)/2, but a graph may have 0.5E edges, or even as few as 0.1E. For the benchmark script, I think it would be good to have an edge_ratio keyword argument that sets the proportion of randomly connected edges.
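
A sketch of what such a keyword could look like, using LightGraphs' erdos_renyi(n, ne) method (random_graph is a hypothetical helper; note that for a simple graph without self-loops the number of possible edges is N(N-1)/2):

using LightGraphs

# Hypothetical edge_ratio keyword: sample a fixed fraction of all
# possible edges of a simple undirected graph.
function random_graph(N; edge_ratio = 0.1)
    n_possible = N * (N - 1) ÷ 2   # N(N+1)/2 if self-loops were allowed
    return erdos_renyi(N, round(Int, edge_ratio * n_possible))
end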

GraphConv is more than 10x slower than GCNConv, at least at smaller sizes.

Actually, I don't have much insight into this one.

Since GCNConv calls adjacency_matrix, that's a problem (for the other graph types) that should be easily fixed by providing better specialized methods.

Whether it is an adjacency matrix or another format, it has to be transformed into the normalized Laplacian matrix. The specialized methods could provide the normalized Laplacian directly for each format.
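
For concreteness, the computation such a specialized method performs on an adjacency matrix is L̂ = I - D^(-1/2) A D^(-1/2). A minimal sketch (illustrative; the actual function in the package may differ):

using LinearAlgebra, SparseArrays

# Normalized Laplacian L̂ = I - D^(-1/2) A D^(-1/2), computed directly
# from an adjacency matrix in any storage format.
function normalized_laplacian_sketch(A::AbstractMatrix)
    d = vec(sum(A, dims = 2))                                  # node degrees
    dinv = [x > 0 ? inv(sqrt(x)) : zero(float(x)) for x in d]  # guard isolated nodes
    D = Diagonal(dinv)
    return I - D * A * D
end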

@CarloLucibello (Member, Author)

For example, with N nodes the total number of possible edges is E = N(N+1)/2, but a graph may have 0.5E edges, or even as few as 0.1E. For the benchmark script, I think it would be good to have an edge_ratio keyword argument that sets the proportion of randomly connected edges.

Most natural networks are better modeled by an average degree that is constant or scales very weakly with N; an O(N) average degree would produce unrealistically dense graphs. I added the degree as a keyword argument to run_benchmarks so that one can experiment with different values.
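
The difference is easy to express in the graph generator. A minimal sketch (the actual kwarg plumbing in run_benchmarks may differ):

using LightGraphs

# Constant average degree c: edge probability p = c / (N - 1) keeps the
# expected degree fixed as N grows, instead of densifying as O(N).
random_graph_const_degree(N; degree = 6) = erdos_renyi(N, degree / (N - 1))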

(resolved review thread on perf/perf.jl)
@CarloLucibello (Member, Author)

Benchmarking on Erdős–Rényi graphs exposed another bug: GraphConv fails when the graph contains isolated nodes.

@CarloLucibello (Member, Author)

Can we merge this?

@yuehhua (Member) commented on Aug 6, 2021

I will merge this first, but I think it would be better to also benchmark the conversion function laplacian_matrix and a pure propagate function for each graph structure. Pure means that the message and update functions return their arguments directly, e.g. message(x_i, x_j) = x_j and update(x, m) = m. These micro-benchmarks would expose the cost of converting from the original structure and the indexing performance of each structure.
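
A sketch of the conversion part of such a micro-benchmark, using the LightGraphs constructors (timing a pure propagate with identity message/update would additionally need the package's propagate entry point, whose signature is not assumed here):

using BenchmarkTools, LightGraphs

g = erdos_renyi(1_000, 6 / 999)      # average degree ≈ 6

# Isolate the cost of the structure conversions themselves.
@btime adjacency_matrix($g)          # sparse adjacency
@btime laplacian_matrix($g)          # sparse Laplacian
@btime Matrix(adjacency_matrix($g))  # dense detour, for comparison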

@yuehhua merged commit b86521d into FluxML:master on Aug 6, 2021