Typical graphs to test on? #411

Open · gdalle opened this issue Nov 21, 2024 · 3 comments

Comments

gdalle (Member) commented Nov 21, 2024

Are there any standard sets of benchmark graphs on the web that we could use for testing? We would want them to be

  • diverse: cover several structures and densities
  • scalable: go from small to large
  • easy to generate or download
simonschoelly (Member) commented

I have wanted to write some guidelines on benchmarking for a very long time but have not gotten around to it. Let me write down some of my unstructured thoughts here:

  • Graphs for benchmarking need to depend on the problem:
    • The size should be adapted to the problem. Some NP-hard problems should probably only be run on graphs with around 100 vertices, while typical traversal-based algorithms should run on graphs on the order of 10^5 to 10^7 vertices.
    • The structure needs to be adapted to the problem. For example, it does not make sense to run an algorithm that computes the transitive closure on a strongly connected graph, since that closure is trivially the complete graph.
    • Parameters that might depend on the problem are
      • Density (i.e. number of edges)
      • Degree and diameter for traversal-based algorithms
      • The number of strong/weak components
      • Regularity and symmetry for problems related to isomorphism
    • Random graphs, especially Erdős-Rényi graphs, can be used for benchmarking but should not be the only ones used, as they usually have very concentrated parameters as the number of vertices increases (see the first sketch after this list).
    • Good graphs often come from geometry or social networks.
    • Some graphs ideal for benchmarking can be found in https://github.com/JuliaGraphs/SNAPDatasets.jl.
    • Small graphs with some specific properties can be found at https://houseofgraphs.org/. One can also use the program geng, which is included with nauty, to generate them. Someone has also created an online version.
  • If someone runs benchmarks for a PR, they should put the code as a comment into that PR so that reviewers can get an idea of whether the benchmark makes sense.
  • It is also important to specify the environment used to run that benchmark:
    • Which -O flag is used to start Julia. The default is -O2, but we might want to run benchmarks with -O3.
    • For parallel problems, the number of physical/logical cores and how many of them Julia is allowed to use.
    • If you run the benchmark on a very weak laptop, please also specify how much RAM you have (the second sketch below shows one way to report the environment).
  • The key to good benchmarks is to reduce the amount of noise (variance).
    • Use BenchmarkTools.jl or Chairmarks.jl to run the problem multiple times.
    • Report not only the mean but also the minimum/median/maximum time - the minimum and median in particular tend to have much less variance than the mean and maximum (the third sketch below shows this).
    • If your problem makes use of caching, we might need a way to either clear the cache manually, or we should report the times of the first and second runs. But we need to be sure that we don't measure compilation time here.
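
To make the "diverse and scalable" point concrete, here is a minimal sketch of how such a suite could be assembled with the Graphs.jl generators and SNAPDatasets.jl. The sizes, average degrees, and the :facebook_combined dataset name are illustrative choices on my part, not a recommendation:

```julia
# Sketch: a small benchmark suite mixing random and structured graphs.
# All concrete sizes and dataset names below are examples.
using Graphs
using SNAPDatasets

suite = Dict{String,SimpleGraph}()

# Erdős-Rényi graphs: vary size and density independently.
for n in (10^3, 10^5), k in (4, 32)    # n vertices, average degree ≈ k
    p = k / (n - 1)                    # edge probability giving that degree
    suite["er_n$(n)_k$(k)"] = erdos_renyi(n, p)
end

# A graph from real social-network data, for structural contrast.
suite["facebook"] = loadsnap(:facebook_combined)
```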
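
For the environment bullet points, something like the following (standard library only, assuming I remember `Base.JLOptions` correctly) could be pasted alongside the numbers:

```julia
# Sketch: report the benchmarking environment (standard library only).
using InteractiveUtils: versioninfo

versioninfo()                                  # Julia version, OS, CPU
println("Optimization:  -O", Base.JLOptions().opt_level)
println("Julia threads: ", Threads.nthreads())
println("Logical cores: ", Sys.CPU_THREADS)
println("Total RAM:     ", round(Sys.total_memory() / 2^30; digits = 1), " GiB")
```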
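
And for the statistics, a sketch with BenchmarkTools.jl over the suite from the first snippet, using gdistances (BFS distances from one vertex) as a stand-in workload; BenchmarkTools extends minimum/median/mean/maximum for its Trial type:

```julia
# Sketch: run one example workload over the suite and report robust statistics.
# `suite` comes from the generation sketch above; gdistances is just a stand-in.
using BenchmarkTools
using Graphs
using Statistics  # brings `median` and `mean` into scope

for (name, g) in sort!(collect(suite); by = first)
    t = @benchmark gdistances($g, 1)   # BFS distances from vertex 1
    println(rpad(name, 14),
            " min: ", minimum(t),
            "  median: ", median(t),
            "  mean: ", mean(t),
            "  max: ", maximum(t))
end
```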

jd-foster (Contributor) commented

Some here? GunnarFarneback/FeedbackArcSets.jl#2

gdalle (Member, Author) commented Nov 26, 2024

I was thinking of mentoring a GSoC project on graph storage, downloads and benchmark sets. How would that sound?
