Typical graphs to test on? #411

Open · gdalle opened this issue Nov 21, 2024 · 3 comments

Comments

gdalle (Member) commented Nov 21, 2024

Are there any standard sets of benchmark graphs on the web that we could use for testing? We would want them to be

  • diverse: cover several structures and densities
  • scalable: go from small to large
  • easy to generate or download
simonschoelly (Member) commented

I have wanted to write some guidelines on benchmarking for a very long time but have not gotten around to it. Let me write down some of my unstructured thoughts here:

  • Graphs for benchmarking need to depend on the problem:
    • The size should be adapted to the problem. Some NP-hard problems should probably only be run on graphs with around 100 vertices, while typical traversal-based algorithms should run on graphs on the order of 10^5 to 10^7 vertices.
    • The structure needs to be adapted to the problem. For example, it does not make sense to run an algorithm that computes the transitive closure on a strongly connected graph, since that closure is trivially the complete graph.
    • Parameters that might depend on the problem are
      • Density (i.e. number of edges)
      • Degree and diameter for traversal-based algorithms
      • The number of strong/weak components
      • Regularity and symmetry for problems related to isomorphism
    • Random graphs, especially Erdős-Rényi graphs, can be used for benchmarking but should not be the only ones used, as they usually have very concentrated parameters as the number of vertices increases (see the first sketch after this list).
    • Good graphs often come from geometry or social networks.
    • Some graphs ideal for benchmarking can be found in https://github.com/JuliaGraphs/SNAPDatasets.jl.
    • Small graphs with some specific properties can be found at https://houseofgraphs.org/. One can also use the program geng, which is included with nauty, to generate them. Someone has also created an online version.
  • If someone runs benchmarks for a PR, they should put the code as a comment into that PR so that reviewers can get an idea of whether the benchmark makes sense.
  • It is also important to specify the environment used to run that benchmark:
    • Which -O flag is used to start Julia. The default is -O2, but we might want to run benchmarks with -O3.
    • For parallel problems, the number of physical/logical cores and how many of them Julia is allowed to use.
    • If you run the benchmark on a very weak laptop, please also specify how much RAM you have (the second sketch below shows one way to report the environment).
  • The key to good benchmarks is to reduce the amount of noise (variance).
    • Use BenchmarkTools.jl or Chairmarks.jl to run the problem multiple times.
    • Report not only the mean but also the minimum/median/maximum time - the minimum and median in particular tend to have much less variance than the mean and maximum (the third sketch below shows this).
    • If your problem makes use of caching, we might need a way to either clear the cache manually, or we should report the times of the first and second runs. But we need to be sure that we don't measure compilation time here.
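
To make the "diverse and scalable" point concrete, here is a minimal sketch of how such a suite could be assembled with the Graphs.jl generators and SNAPDatasets.jl. The sizes, average degrees, and the :facebook_combined dataset name are illustrative choices on my part, not a recommendation:

```julia
# Sketch: a small benchmark suite mixing random and structured graphs.
# All concrete sizes and dataset names below are examples.
using Graphs
using SNAPDatasets

suite = Dict{String,SimpleGraph}()

# Erdős-Rényi graphs: vary size and density independently.
for n in (10^3, 10^5), k in (4, 32)    # n vertices, average degree ≈ k
    p = k / (n - 1)                    # edge probability giving that degree
    suite["er_n$(n)_k$(k)"] = erdos_renyi(n, p)
end

# A graph from real social-network data, for structural contrast.
suite["facebook"] = loadsnap(:facebook_combined)
```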
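
For the environment bullet points, something like the following (standard library only, assuming I remember `Base.JLOptions` correctly) could be pasted alongside the numbers:

```julia
# Sketch: report the benchmarking environment (standard library only).
using InteractiveUtils: versioninfo

versioninfo()                                  # Julia version, OS, CPU
println("Optimization:  -O", Base.JLOptions().opt_level)
println("Julia threads: ", Threads.nthreads())
println("Logical cores: ", Sys.CPU_THREADS)
println("Total RAM:     ", round(Sys.total_memory() / 2^30; digits = 1), " GiB")
```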
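
And for the statistics, a sketch with BenchmarkTools.jl over the suite from the first snippet, using gdistances (BFS distances from one vertex) as a stand-in workload; BenchmarkTools extends minimum/median/mean/maximum for its Trial type:

```julia
# Sketch: run one example workload over the suite and report robust statistics.
# `suite` comes from the generation sketch above; gdistances is just a stand-in.
using BenchmarkTools
using Graphs
using Statistics  # brings `median` and `mean` into scope

for (name, g) in sort!(collect(suite); by = first)
    t = @benchmark gdistances($g, 1)   # BFS distances from vertex 1
    println(rpad(name, 14),
            " min: ", minimum(t),
            "  median: ", median(t),
            "  mean: ", mean(t),
            "  max: ", maximum(t))
end
```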

jd-foster (Contributor) commented

Some here? GunnarFarneback/FeedbackArcSets.jl#2

gdalle (Member, Author) commented Nov 26, 2024

I was thinking of mentoring a GSoC project on graph storage, downloads and benchmark sets. How would that sound?
