Drops duplicate edges in non-MultiGraph PLC `SGGraph` instances (#4658)
Graph input with duplicate edges intended for `Graph`/`DiGraph` instances resulted in internal PLC `SGGraph` instances with duplicate edges, which were effectively treated as MultiGraphs and caused incorrect results from algorithms like `pagerank`.

This PR sets the `drop_multi_edges` PLC `SGGraph` ctor option to have PLC remove duplicate edges on `SGGraph` creation.

The overhead to drop duplicate edges for non-MultiGraphs is negligible, and in the case of a large test graph (wikipedia data, 37.5M nodes, 464.5M edges) resulted in <s>an overall _speedup_ for pagerank going from 12.2 seconds to 10.7 seconds on my workstation, likely due to fewer edges to process</s> a minor slowdown from 10.5s to 10.7s.
_edit: after several re-runs, the pagerank runtime before the change settled to 10.5, and the runtime after the change was typically 10.7._

A test was added that uses pagerank to ensure Graphs vs. MultiGraphs are handled correctly and duplicate edges are dropped as needed. The results when run without `drop_multi_edges` set:
```
>       assert actual_pr_for_G == approx(expected_pr_for_G)
E       assert {0: 0.0875795...7955580949783} == approx({0: 0....32 ± 1.8e-07})
E
E         comparison failed. Mismatched elements: 4 / 4:
E         Max absolute difference: 0.08785887916592061
E         Max relative difference: 0.5007959662968462
E         Index | Obtained            | Expected
E         0     | 0.08757955580949783 | 0.17543839772251532 ± 1.8e-07
E         1     | 0.41242048144340515 | 0.32456160227748454 ± 3.2e-07
E         2     | 0.41242048144340515 | 0.32456160227748454 ± 3.2e-07
E         3     | 0.08757955580949783 | 0.17543839772251532 ± 1.8e-07
```
The same test passes when run with the changes in this PR to set `drop_multi_edges`.

Authors:
  - Rick Ratzel (https://github.com/rlratzel)

Approvers:
  - Erik Welch (https://github.com/eriknw)

URL: #4658
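
To illustrate why duplicate edges change PageRank, here is a minimal pure-Python sketch. It does not use PLC, and it is not the test added in this PR: the `pagerank` and `drop_duplicate_edges` helpers and the 4-node path graph are hypothetical, chosen only for illustration. The `pagerank` helper counts parallel edges with multiplicity, the way a multigraph would, so a repeated edge pulls rank toward its endpoints. Deduplicating the edge list first restores the simple-graph result. For the clean path graph, that is about 0.1754 for the end nodes and 0.3246 for the middle nodes.

```python
from collections import Counter


def pagerank(edges, alpha=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration PageRank over a directed edge list.

    Parallel edges are counted with multiplicity, i.e. the edge list is
    treated as a multigraph. Assumes every node has at least one out-edge.
    """
    nodes = sorted({n for edge in edges for n in edge})
    out_degree = Counter(src for src, _ in edges)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(max_iter):
        # Every node starts from the teleport share, then collects rank
        # from each in-edge (once per copy of the edge).
        new_rank = {n: (1.0 - alpha) / len(nodes) for n in nodes}
        for src, dst in edges:
            new_rank[dst] += alpha * rank[src] / out_degree[src]
        if sum(abs(new_rank[n] - rank[n]) for n in nodes) < tol:
            return new_rank
        rank = new_rank
    return rank


def drop_duplicate_edges(edges):
    """Keep the first occurrence of each (src, dst) pair, preserving order."""
    return list(dict.fromkeys(edges))


# Hypothetical input: the undirected path 0-1-2-3, stored as directed
# edges in both directions, plus one extra copy of the 1-2 edge.
clean_edges = [(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2)]
dup_edges = clean_edges + [(1, 2), (2, 1)]

pr_clean = pagerank(clean_edges)
pr_dup = pagerank(dup_edges)  # duplicates skew rank toward nodes 1 and 2
pr_dedup = pagerank(drop_duplicate_edges(dup_edges))  # same as pr_clean
```

In this sketch the duplicated 1-2 edge lowers the rank of nodes 0 and 3 and raises the rank of nodes 1 and 2, the same direction of error as in the failure output above.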