Performance improvements #253

gkronber · 2024-10-10T13:36:46Z

Follow up to #239.

The MT code deviates slightly from the most recent egg code since the changes from egraphs-good/egg#291, which mentions: "There are also memory usage/performance advantages since EClass.parents, EGraph.pending, and EGraph.analysis_pending, no longer need to store nodes."

This PR is exploratory. We should split things up into smaller, self-contained PRs later.

codecov-commenter · 2024-10-10T13:44:56Z

Codecov Report

Attention: Patch coverage is 76.10619% with 27 lines in your changes missing coverage. Please review.

Project coverage is 78.68%. Comparing base (6814104) to head (0d203a8).

Files with missing lines	Patch %	Lines
src/EGraphs/egraph.jl	71.42%	26 Missing ⚠️
src/optbuffer.jl	0.00%	1 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           ale/3.0     #253      +/-   ##
===========================================
- Coverage    81.12%   78.68%   -2.44%     
===========================================
  Files           18       18              
  Lines         1499     1539      +40     
===========================================
- Hits          1216     1211       -5     
- Misses         283      328      +45

github-actions · 2024-10-10T13:48:19Z

Benchmark Results

	egg-sym	egg-cust	MT@0d203a82bab...	MT@6814104fbe1...	egg-sym/MT@0d2...	egg-cust/MT@0d...	MT@6814104fbe1...
egraph_addexpr	1.47 ms		4.79 ms	5.32 ms	0.308		1.11
basic_maths_simpl2	13.7 ms	5.67 ms	22.7 ms	20.9 ms	0.604	0.25	0.922
prop_logic_freges_theorem	2.58 ms	1.56 ms	1.08 ms	1.05 ms	2.4	1.45	0.977
calc_logic_demorgan	60.8 μs	35.6 μs	73.9 μs	76.2 μs	0.823	0.482	1.03
calc_logic_freges_theorem	23.9 ms	11.5 ms	51.1 ms	43.9 ms	0.468	0.226	0.859
basic_maths_simpl1	6.45 ms	3.12 ms	5.25 ms	4.74 ms	1.23	0.594	0.904
egraph_constructor	0.0858 μs		0.0833 μs	0.0937 μs	1.03		1.12
prop_logic_prove1	36.8 ms	13.7 ms	49.7 ms	43.5 ms	0.741	0.276	0.876
prop_logic_demorgan	81.9 μs	45.9 μs	88.7 μs	94.5 μs	0.923	0.518	1.07
while_superinterpreter_while_10			31 ms	18.5 ms			0.595
prop_logic_rewrite			122 μs	121 μs			0.998
time_to_load			117 ms	119 ms			1.02

Benchmark Plots

A plot of the benchmark results has been uploaded as an artifact to the workflow run for this PR.
Go to "Actions"->"Benchmark a pull request"->[the most recent run]->"Artifacts" (at the bottom).
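Reading the ratio columns in the benchmark table: they appear to be plain time quotients. For example, the last column matches the base (6814104) time divided by the head (0d203a8) time, so values above 1 mean this PR is faster and values below 1 mean it is slower (e.g. while_superinterpreter_while_10). A small check on three rows, using the rounded timings as reported, so the last digit can differ from the table:

```julia
# Timings in ms copied from the table: name => (base time, head time).
timings = Dict(
    "egraph_addexpr"                  => (5.32, 4.79),
    "calc_logic_freges_theorem"       => (43.9, 51.1),
    "while_superinterpreter_while_10" => (18.5, 31.0),
)

# Ratio as in the last column: base time / head time.
# > 1 means the PR head is faster, < 1 means it is slower.
speedup(base, head) = base / head

for (name, (base, head)) in sort(collect(timings))
    r = speedup(base, head)
    verdict = r > 1 ? "faster" : "slower"
    println(rpad(name, 34), round(r; digits = 3), "  (head ", verdict, ")")
end
```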

0x0f0f0f · 2024-10-11T10:51:57Z

src/EGraphs/egraph.jl

@@ -119,13 +115,15 @@ mutable struct EGraph{ExpressionType,Analysis}
  uf::UnionFind
  "map from eclass id to eclasses"
  classes::Dict{IdKey,EClass{Analysis}}
+  "vector of the original e-nodes"
+  nodes::Vector{VecExpr}


I wonder if later on we can figure out something more efficient than vectors

For this the vector is quite natural.
New enodes are only pushed at the end, and we can simply index enodes by their non-canonical index.
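A minimal sketch of this layout, in the spirit of egg#291. It assumes a simplified e-graph in which an e-node is a plain `Vector{Int}` of the form `[op, child eclass ids...]` (a stand-in for `VecExpr`). `ToyNodeGraph`, `add_node!` and `parent_nodes` are hypothetical names, not Metatheory.jl API. E-nodes are appended to one vector, the push position serves as the non-canonical e-node id, and parent lists store only those ids:

```julia
# Hypothetical toy model, not Metatheory.jl's actual EGraph.
# An e-node is [op, child eclass ids...] (simplified stand-in for VecExpr).
struct ToyNodeGraph
    nodes::Vector{Vector{Int}}        # original e-nodes, append-only
    parents::Dict{Int,Vector{Int}}    # eclass id => ids of parent e-nodes
end

ToyNodeGraph() = ToyNodeGraph(Vector{Int}[], Dict{Int,Vector{Int}}())

# Append an e-node; its position in `nodes` is its (non-canonical) e-node id.
function add_node!(g::ToyNodeGraph, node::Vector{Int})
    push!(g.nodes, node)
    enode_id = length(g.nodes)
    for child in node[2:end]
        # Parent lists hold only the integer id, never a copy of the node.
        push!(get!(Vector{Int}, g.parents, child), enode_id)
    end
    return enode_id
end

# Reading a parent back costs one extra lookup through `g.nodes`.
parent_nodes(g::ToyNodeGraph, class_id::Int) =
    [g.nodes[i] for i in get(g.parents, class_id, Int[])]

g = ToyNodeGraph()
leaf = add_node!(g, [10])        # op 10, no children
app  = add_node!(g, [20, 1])     # op 20 applied to eclass 1
```

The indirection in `parent_nodes` is the same kind of `g.nodes[...]` lookup that comes up again in the `EClass.nodes` discussion further down.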

0x0f0f0f · 2024-10-11T10:53:39Z

src/EGraphs/egraph.jl

+  vec = get(g.classes_by_op, key, nothing)
+  if isnothing(vec)
+    vec = Id[eclass_id]
+    g.classes_by_op[key] = vec
+  end


Why not vec = get!(g.classes_by_op, key, Id[eclass_id])?

The idea was that when Id[eclass_id] is supplied as an argument, we always allocate a new vector, which is immediately discarded when the key already exists.
Probably it would be best to use

get!(Vector{Id}, g.classes_by_op, key)
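A self-contained illustration of the difference. The three-argument `get!(dict, key, default)` receives an already-evaluated default, so `Id[eclass_id]` is built on every call. `get!(f, dict, key)` only calls `f` (here the constructor `Vector{...}`) when the key is missing. The id type `UInt64`, the `Symbol` keys and the helper `register_class!` are assumptions for this sketch only. It also assumes the existing-key branch of the original code pushes `eclass_id`, so "create empty, then push" ends in the same state as starting with `Id[eclass_id]`:

```julia
const ClassId = UInt64   # assumption: id type chosen only for this sketch

# Record that eclass `id` contains an e-node with operator `key`.
# `get!(Vector{ClassId}, d, key)` calls the constructor `Vector{ClassId}()`
# only when `key` is missing, so a hit allocates no throwaway vector.
function register_class!(classes_by_op::Dict{Symbol,Vector{ClassId}}, key::Symbol, id::ClassId)
    vec = get!(Vector{ClassId}, classes_by_op, key)
    push!(vec, id)
    return vec
end

# Contrast: `get!(classes_by_op, key, ClassId[id])` evaluates `ClassId[id]`
# before the call on every lookup, even when `key` is already present.

classes_by_op = Dict{Symbol,Vector{ClassId}}()
register_class!(classes_by_op, :+, ClassId(1))
register_class!(classes_by_op, :+, ClassId(4))
register_class!(classes_by_op, :*, ClassId(2))
```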

0x0f0f0f · 2024-10-11T10:55:00Z

src/EGraphs/egraph.jl

-      (node::VecExpr, eclass_id::Id) = pop!(g.pending)
-      node = copy(node)
+      enode_id = pop!(g.pending)
+      node = copy(g.nodes[enode_id])


Is the copy necessary here?

Probably not. This will still undergo some changes.
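Whether the `copy` can be dropped depends on whether canonicalization mutates the node in place and whether anything later reads the uncanonicalized node. Here is a minimal sketch. It assumes nodes are plain `Vector{Int}` of the form `[op, child ids...]` and uses a hypothetical in-place `canonicalize!` that rewrites child ids to their union-find roots. Without the copy, the vector stored in `nodes` is changed too, because Julia arrays are shared by reference:

```julia
# Hypothetical: `root` maps each id to its union-find representative.
function canonicalize!(node::Vector{Int}, root::Vector{Int})
    for i in 2:length(node)          # position 1 holds the operation
        node[i] = root[node[i]]
    end
    return node
end

root  = [1, 1, 3]            # ids 1 and 2 were merged into 1
nodes = [[7, 2, 3]]          # one stored e-node: op 7 with children 2 and 3

aliased = nodes[1]           # no copy: the same array object as nodes[1]
canonicalize!(aliased, root)
# nodes[1] has changed as well: the stored original was overwritten.

nodes2 = [[7, 2, 3]]
fresh  = canonicalize!(copy(nodes2[1]), root)
# `fresh` is canonical while nodes2[1] keeps its original children.
```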

olynch · 2024-10-11T12:43:33Z

src/EGraphs/egraph.jl

@@ -42,7 +42,7 @@ they represent. The [`EGraph`](@ref) itself comes with pretty printing for human
 struct EClass{D}
  id::Id
  nodes::Vector{VecExpr}


Why not also have this be a vector of Id, if we now store VecExprs in nodes?

Yes, this could be better. I already tested this and reverted it again, because it introduces an awkward indirection, g.nodes[class.nodes[i]], in several places, most notably the e-matching code. Local tests showed no performance improvement from storing only the ids here. I'm not yet decided.

My current approach was to store the same nodes objects (VecExpr) in the nodes vectors of eclasses and the egraph. As far as I understand, we do not need to keep the original (uncanonicalized) enodes.

0x0f0f0f · 2024-10-22T11:46:26Z

src/EGraphs/egraph.jl

-    Id[],
-    UniqueQueue{Id}(),
+    Vector{Id}(),
+    Vector{Id}(),


Why no more UniqueQueue?

Not needed anymore.

Reasoning:

We need the nodes and eclass ids in the pending / analysis_pending vectors for rebuilding.

Nodes may be canonicalized while they are contained in the pending collections.

UniqueQueue detects duplicates only when entries are added; it does not remove nodes that become duplicates after canonicalization.

The rewrite of the rebuilding code instead removes duplicates from the pending and analysis_pending vectors after each iteration.
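The effect described above can be sketched with toy data. Assumptions: pending entries are e-nodes represented as `Vector{Int}` of the form `[op, child ids...]`, and a naive `find` runs over a parent vector. `find_root` and `canonicalize_pending!` are hypothetical names, not the PR's code. Two entries that were distinct when enqueued become equal once their children are canonicalized, so an insert-time duplicate check would keep both. Canonicalizing first and then deduplicating removes the copy:

```julia
# Toy union-find: parent[i] == i means i is a root.
function find_root(parent::Vector{Int}, i::Int)
    while parent[i] != i
        i = parent[i]
    end
    return i
end

# Canonicalize every pending node in place, then drop duplicates.
function canonicalize_pending!(pending::Vector{Vector{Int}}, parent::Vector{Int})
    for node in pending
        for k in 2:length(node)             # node[1] is the operation
            node[k] = find_root(parent, node[k])
        end
    end
    # Remove entries equal to an earlier one, keeping the first occurrence.
    unique!(pending)
    return pending
end

parent  = [1, 1, 3]                         # eclass 2 was merged into eclass 1
pending = [[5, 1, 3], [5, 2, 3], [6, 3]]    # all distinct when enqueued
canonicalize_pending!(pending, parent)
```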

0x0f0f0f · 2024-10-22T11:47:56Z

src/EGraphs/egraph.jl

@@ -297,14 +295,15 @@ function preprocess(e::Expr)
 end
 preprocess(x) = x

+addexpr!(::EGraph, se::EClass) = se.id # TODO: why do we need this?


It's for dynamic rules like f(a,b) => a. This way they also work when a is an EClass.
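A toy illustration of why a dedicated method helps. It uses hypothetical types `ToyClass` and `ToyGraph` as stand-ins for `EClass` and `EGraph`; this is not Metatheory.jl's implementation. The right-hand side of a dynamic rule such as f(a,b) => a may return the matched `a` itself, which can already be an e-class. Dispatching on that type lets the caller add any result uniformly, by returning the existing id instead of inserting something new:

```julia
struct ToyClass
    id::Int
end

mutable struct ToyGraph
    n_classes::Int
end

# Adding a plain term creates a new class (greatly simplified: no hashconsing).
function toy_addexpr!(g::ToyGraph, term)
    g.n_classes += 1
    return g.n_classes
end

# Adding something that is already an e-class: nothing to insert, reuse its id.
toy_addexpr!(::ToyGraph, c::ToyClass) = c.id

# A dynamic rule body for f(a, b) => a simply returns its bound `a`,
# which may be a ToyClass when `a` was matched against an existing e-class.
rule_rhs(a, b) = a

g = ToyGraph(3)
id_from_class = toy_addexpr!(g, rule_rhs(ToyClass(2), ToyClass(3)))
id_from_term  = toy_addexpr!(g, rule_rhs(:x, :y))
```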

gkronber · 2024-10-22T12:06:38Z

@0x0f0f0f this PR is more exploratory and not yet ready for review. I'll prepare a new, clean PR that contains only the necessary changes. However, it would be good if we could base this new PR on #249, which should be merged first. Could we please try to get #249 ready to merge first? Then we can continue with this branch.

gkronber added 6 commits October 10, 2024 14:28

Output parent lists and check parents after rebuilding

2e1e46d

Path splitting procedure to shorten path length with find call.

1301422

Merge branch 'fix_broken_cas_tests' into performance_improvements

0704eb1

Store original e-nodes in egraph and keep only e-node ids in parent lists.

500e25e

Revert changes to pretty_dict output.

ac4829c

Change lookup in classes_by_op dictionary to prevent allocation of a new vector for each lookup

6341b3b
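The path-splitting commit in this group refers to a standard union-find optimization. During `find`, every node on the path to the root is re-pointed to its grandparent. This roughly halves the path length on each traversal without needing a second pass. Below is a generic sketch over a plain parent vector; it is not the code from that commit, and `find_split!` is a hypothetical name:

```julia
# Union-find `find` with path splitting.
# parent[i] == i marks a root; every visited node is re-pointed to its
# grandparent, so later finds along this path take fewer steps.
function find_split!(parent::Vector{Int}, i::Int)
    while parent[i] != i
        next = parent[i]
        parent[i] = parent[next]   # skip one level: point to the grandparent
        i = next
    end
    return i
end

parent = [1, 1, 2, 3, 4]   # a chain 5 -> 4 -> 3 -> 2 -> 1
root = find_split!(parent, 5)
```

After this call, node 5 points to 3 and node 4 points to 2, so the next `find` from 5 needs two hops instead of four.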

gkronber added 3 commits October 10, 2024 16:45

Fixed implementation of iterate for optbuffer (currently only affected debug output).

9fb097c

Fix compile error.

80a4696

Find of eclass_id for enode_id is not necessary here

68d40d1

0x0f0f0f reviewed Oct 11, 2024

olynch reviewed Oct 11, 2024

gkronber mentioned this pull request Oct 11, 2024

Don't copy if should_copy is false #251

Closed

gkronber added 14 commits October 12, 2024 12:28

Add some test assertions for internal datastructures used for egraph rebuilding.

013253e

Merge branch 'ale/3.0' into performance_improvements

4e6bc9f

Set root to allow debugging (requires extraction)

9db374d

isless for VecExpr to allow sorting.

3ac0718

Check most specific constants first.

90719d0

Comment and removed unnecessary parentheses.

6ddffa4

Fixes for constant matching from different PR

3c07d51

Allow to set SaturationParams for simplify for testing, and mark two broken tests.

97c272b

Small fixes.

917f3fe

Correct test cases for rebuilding

e6f582c

Complete overhaul of rebuilding mechanism.

50d6dfd

Fixes, moving forward...

ee1862e

Bugfix in analysis rebuilding.

348dd62

Minor changes.

4d26e3c

Remove nodes vector from egraph and clean-up code.

0d203a8

0x0f0f0f reviewed Oct 22, 2024
