Performance improvements #253

Draft
wants to merge 24 commits into
base: ale/3.0

gkronber (Collaborator) commented Oct 10, 2024

Follow up to #239.

The MT code has deviated slightly from the most recent egg code since the changes in egraphs-good/egg#291, which mentions: "There are also memory usage/performance advantages since EClass.parents, EGraph.pending, and EGraph.analysis_pending, no longer need to store nodes."

This PR is exploratory. We should split it up into smaller, self-contained PRs later.

codecov-commenter commented Oct 10, 2024

Codecov Report

Attention: Patch coverage is 76.10619% with 27 lines in your changes missing coverage. Please review.

Project coverage is 78.68%. Comparing base (6814104) to head (0d203a8).

| Files with missing lines | Patch % | Lines |
|:---|---:|:---|
| src/EGraphs/egraph.jl | 71.42% | 26 Missing ⚠️ |
| src/optbuffer.jl | 0.00% | 1 Missing ⚠️ |

Additional details and impacted files
@@             Coverage Diff             @@
##           ale/3.0     #253      +/-   ##
===========================================
- Coverage    81.12%   78.68%   -2.44%     
===========================================
  Files           18       18              
  Lines         1499     1539      +40     
===========================================
- Hits          1216     1211       -5     
- Misses         283      328      +45     

github-actions bot commented Oct 10, 2024

Benchmark Results

| | egg-sym | egg-cust | MT@0d203a82bab... | MT@6814104fbe1... | egg-sym/MT@0d2... | egg-cust/MT@0d... | MT@6814104fbe1... |
|:---|---:|---:|---:|---:|---:|---:|---:|
| egraph_addexpr | 1.47 ms | | 4.79 ms | 5.32 ms | 0.308 | | 1.11 |
| basic_maths_simpl2 | 13.7 ms | 5.67 ms | 22.7 ms | 20.9 ms | 0.604 | 0.25 | 0.922 |
| prop_logic_freges_theorem | 2.58 ms | 1.56 ms | 1.08 ms | 1.05 ms | 2.4 | 1.45 | 0.977 |
| calc_logic_demorgan | 60.8 μs | 35.6 μs | 73.9 μs | 76.2 μs | 0.823 | 0.482 | 1.03 |
| calc_logic_freges_theorem | 23.9 ms | 11.5 ms | 51.1 ms | 43.9 ms | 0.468 | 0.226 | 0.859 |
| basic_maths_simpl1 | 6.45 ms | 3.12 ms | 5.25 ms | 4.74 ms | 1.23 | 0.594 | 0.904 |
| egraph_constructor | 0.0858 μs | | 0.0833 μs | 0.0937 μs | 1.03 | | 1.12 |
| prop_logic_prove1 | 36.8 ms | 13.7 ms | 49.7 ms | 43.5 ms | 0.741 | 0.276 | 0.876 |
| prop_logic_demorgan | 81.9 μs | 45.9 μs | 88.7 μs | 94.5 μs | 0.923 | 0.518 | 1.07 |
| while_superinterpreter_while_10 | | | 31 ms | 18.5 ms | | | 0.595 |
| prop_logic_rewrite | | | 122 μs | 121 μs | | | 0.998 |
| time_to_load | | | 117 ms | 119 ms | | | 1.02 |

Benchmark Plots

A plot of the benchmark results has been uploaded as an artifact to the workflow run for this PR.
Go to "Actions"->"Benchmark a pull request"->[the most recent run]->"Artifacts" (at the bottom).

@@ -119,13 +115,15 @@ mutable struct EGraph{ExpressionType,Analysis}
uf::UnionFind
"map from eclass id to eclasses"
classes::Dict{IdKey,EClass{Analysis}}
"vector of the original e-nodes"
nodes::Vector{VecExpr}
Member

I wonder if later on we can figure out something more efficient than vectors

Collaborator Author

For this the vector is quite natural.
New enodes are only pushed at the end, and we can simply index enodes by their non-canonical index.
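
The append-only indexing described here can be illustrated with a minimal sketch. It is not the PR's code: `ENode` and `add_node!` are hypothetical names, and `Id` is assumed to be an unsigned integer alias. The point is only that a node's id is its position in the vector, so a lookup is a plain index operation.

```julia
# Assumption: `Id` is an unsigned integer alias (illustrative only).
const Id = UInt64

# Hypothetical, simplified e-node: an operation symbol plus child ids.
struct ENode
    op::Symbol
    children::Vector{Id}
end

# Append-only store: the id of a node is its (1-based) position in `nodes`.
function add_node!(nodes::Vector{ENode}, n::ENode)
    push!(nodes, n)
    return Id(length(nodes))
end

nodes = ENode[]
a = add_node!(nodes, ENode(:a, Id[]))
b = add_node!(nodes, ENode(:b, Id[]))
plus = add_node!(nodes, ENode(:+, [a, b]))

# Lookup by id is a plain vector index.
root = nodes[plus]
```

Because entries are never removed or reordered, ids handed out earlier stay valid for the lifetime of the vector.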

Comment on lines 247 to 251
vec = get(g.classes_by_op, key, nothing)
if isnothing(vec)
vec = Id[eclass_id]
g.classes_by_op[key] = vec
end
Member

Why not `vec = get!(g.classes_by_op, key, Id[eclass_id])`?

Collaborator Author

The idea was to avoid an allocation: when `Id[eclass_id]` is passed as the default argument, a new vector is always created and then immediately discarded if the key already exists.
It would probably be best to use

get!(Vector{Id}, g.classes_by_op, key)
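
A minimal sketch of how the function form of `get!` could be used, assuming `Id` is an unsigned integer alias. `register_class!` is a hypothetical helper and the dictionary merely stands in for `g.classes_by_op`. The constructor `Vector{Id}` is only called when the key is absent, so no throwaway vector is allocated for existing keys.

```julia
# Assumption: `Id` is an unsigned integer alias (illustrative only).
const Id = UInt64

# Hypothetical helper: record `eclass_id` under `key`.
function register_class!(d::Dict{K,Vector{Id}}, key::K, eclass_id::Id) where {K}
    # get!(f, d, key) calls f() only if `key` is missing; here f is the
    # type `Vector{Id}`, so a fresh empty vector is created lazily.
    vec = get!(Vector{Id}, d, key)
    push!(vec, eclass_id)
    return vec
end

classes_by_op = Dict{Symbol,Vector{Id}}()   # stand-in for g.classes_by_op
register_class!(classes_by_op, :+, Id(1))
v = register_class!(classes_by_op, :+, Id(2))
```

Note that with this form the id has to be pushed explicitly, since the lazily created vector starts out empty.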

- (node::VecExpr, eclass_id::Id) = pop!(g.pending)
- node = copy(node)
+ enode_id = pop!(g.pending)
+ node = copy(g.nodes[enode_id])
Member

Is the copy necessary here?

Collaborator Author

@gkronber gkronber Oct 11, 2024

Probably not. This will still undergo some changes.

@@ -42,7 +42,7 @@ they represent. The [`EGraph`](@ref) itself comes with pretty printing for human
struct EClass{D}
id::Id
nodes::Vector{VecExpr}
Collaborator

Why not also have this be a vector of Id, if we now store VecExprs in nodes?

Collaborator Author

Yes, this could be better. I already tested this and reverted it because it introduces an awkward indirection, `g.nodes[class.nodes[i]]`, in several places, most notably in the e-matching code. Local tests showed no performance improvement from storing only the ids here. I'm still undecided.

My current approach is to store the same node objects (`VecExpr`) in the `nodes` vectors of both the eclasses and the egraph. As far as I understand, we do not need to keep the original (uncanonicalized) enodes.

Id[],
UniqueQueue{Id}(),
Vector{Id}(),
Vector{Id}(),
Member

Why no more UniqueQueue?

Collaborator Author

Not needed anymore.

Reasoning:

  • We need the nodes and eclass ids in the pending / analysis_pending vectors for rebuilding.
  • Nodes may be canonicalized while they are contained in the pending collections.
  • UniqueQueue detects duplicates only when new entries are added, so it would not remove nodes that become duplicates after canonicalization.
  • The rewritten rebuilding code instead removes duplicates from the pending and analysis_pending vectors after each iteration.
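
The reasoning above can be illustrated with a small sketch. It is not the rebuilding code from this PR: `find_root` and `dedup_pending!` are hypothetical names, the union-find is a bare parent vector, and `Id` is assumed to be an unsigned integer alias. It shows entries that were distinct when queued becoming equal after canonicalization, which an insert-time uniqueness check cannot catch, and one pass that canonicalizes and then deduplicates.

```julia
# Assumption: `Id` is an unsigned integer alias (illustrative only).
const Id = UInt64

# Minimal union-find lookup: parents[i] == i marks a root.
function find_root(parents::Vector{Id}, i::Id)
    while parents[i] != i
        i = parents[i]
    end
    return i
end

# Canonicalize every pending id, then drop duplicates in place.
function dedup_pending!(pending::Vector{Id}, parents::Vector{Id})
    map!(id -> find_root(parents, id), pending, pending)
    unique!(pending)
    return pending
end

# Ids 1..4 were all distinct when queued. Afterwards class 2 was merged
# into 1 and class 4 into 3, so the queue now holds hidden duplicates.
pending = Id[2, 1, 4, 3]
parents = Id[1, 1, 3, 3]
dedup_pending!(pending, parents)
```

Running the deduplication once per rebuild iteration keeps the vectors small without paying for a set lookup on every insertion.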

@@ -297,14 +295,15 @@ function preprocess(e::Expr)
end
preprocess(x) = x

addexpr!(::EGraph, se::EClass) = se.id # TODO: why do we need this?
Member

It's for dynamic rules like `f(a,b) => a`. With this method, the rule also works when `a` is an `EClass`.

@gkronber
Collaborator Author

@0x0f0f0f this PR is exploratory and not yet ready for review. I'll prepare a new, clean PR that contains only the necessary changes. However, it would be good to base that new PR on #249, which should be merged first. Could we please get #249 ready and merge it first, and then continue with this branch?
