Add nodes field to EGraph #291

dewert99 · 2024-01-03T20:12:16Z

I decided to try implementing the changes I described in #290 for adding a nodes field to EGraph. The main downside is that EClass::parents now only returns non-canonical Ids rather than the nodes themselves, although the nodes are still available via EGraph::id_to_node. The main user-facing advantage is that the EGraph::id_to_X methods, as well as EGraph::copy_without_unions, no longer require explanations to be enabled. There are also memory usage and performance advantages, since EClass.parents, EGraph.pending, and EGraph.analysis_pending no longer need to store nodes. The nodes passed to Analysis::make are slightly different than before, but they are the same after canonicalization.
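As a rough illustration of the layout described above, here is a minimal sketch in which every node is stored once and the parent and pending lists hold only ids. `ToyEGraph`, `Node`, and `parent_nodes` are hypothetical names for this sketch, not egg's actual types, and hash-consing and unions are left out.

```rust
/// Toy e-node: an operator name plus child ids (a stand-in for a real `Language` type).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct Node {
    op: &'static str,
    children: Vec<usize>,
}

/// Hypothetical, stripped-down e-graph that owns every node exactly once.
#[derive(Default)]
struct ToyEGraph {
    /// `nodes[id]` is the original (un-canonicalized) node that created `id`.
    nodes: Vec<Node>,
    /// For each id, the ids of the nodes that mention it as a child.
    parents: Vec<Vec<usize>>,
    /// Work list of ids whose analysis data still has to be computed.
    analysis_pending: Vec<usize>,
}

impl ToyEGraph {
    /// Adds a node and returns its fresh id (no hash-consing in this sketch).
    /// Children must refer to ids that already exist.
    fn add(&mut self, node: Node) -> usize {
        let id = self.nodes.len();
        for &child in &node.children {
            // Store the parent's id, not a clone of the parent node.
            self.parents[child].push(id);
        }
        self.nodes.push(node);
        self.parents.push(Vec::new());
        self.analysis_pending.push(id);
        id
    }

    /// Looks a node up from its id; no explanation machinery is involved.
    fn id_to_node(&self, id: usize) -> &Node {
        &self.nodes[id]
    }

    /// Recovers the parent nodes of `id` on demand by going through `nodes`.
    fn parent_nodes(&self, id: usize) -> Vec<&Node> {
        self.parents[id].iter().map(|&p| &self.nodes[p]).collect()
    }
}
```

The trade-off is one extra indirection whenever a parent node is needed, in exchange for not cloning nodes into every list.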

I also eliminated the node field of ExplainNode to reduce memory usage when explanations are enabled, but this did add some complexity. If this complexity isn't considered to be worth it, I could also revert the changes to explain.rs while preserving the rest of the changes.

dewert99 · 2024-01-04T03:00:14Z

I noticed that since EGraph::add_uncanonical is optimized to avoid creating new nodes when an equivalent node is available, EGraph::copy_without_unions is incorrect when explanations aren't enabled, and the EGraph::id_to_X methods need to be clarified for the case where explanations are not enabled.

dewert99 · 2024-01-08T20:11:44Z

In order to get more stable benchmarks I tried using a modified version of iai-callgrind:
egraph_nodes-vs-main.txt

I also compared main with itself to get a sense of stability:

main-vs-main.txt

mwillsey · 2024-01-08T23:28:21Z

I like the idea here. @oflatt will know best about the interactions with proof stuff. Should we worry if the nodes Vec will grow too large (without proofs, in a long running example)? I guess we never compact the unionfind either, so at worst linear overhead I think.

dewert99 · 2024-01-08T23:42:53Z

Currently, this decreases memory usage even when not using explanations (at least in the benchmarks I ran) since the memory used by nodes is offset by not storing nodes in the parents list. This may change if we start compacting the parents lists (#113), but I think you're right about the linear overhead and I don't think the unionfind could be compacted. I guess the only problematic case is if the users' Language implementation is very large, they don't use explanations, and we do implement parent list compaction.

mwillsey · 2024-01-08T23:48:11Z

This all kind of begs the question: why not only store the e-nodes a single time? E-classes could refer to their nodes by id, and the memo could probably even get replaced with an IndexMap.

dewert99 · 2024-01-09T00:15:20Z

Replacing memo would be tricky since it uses canonicalized nodes and the EGraph nodes field uses un-canonicalized nodes. We could add a separate list of canonicalized nodes, but I don't see how that would be any better than keeping memo the way it is. The nodes field of EClass also gets canonicalized as part of rebuilding, although that may not actually be necessary (it is just nice for deduplication). We would also need to experiment with the cost of the extra indirection when searching, since currently the nodes field seems to be hit quite hard. Another idea I had that might help with this is to make classes_by_op into a HashMap<L::Discriminant, HashMap<Id, u32>> that records the index of the first element of a class's nodes field with a particular discriminant, so we don't need to search from the beginning or binary search.
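A minimal sketch of that last idea, assuming each class keeps its nodes sorted so that nodes with the same operator are contiguous. The `Node` type, the `&'static str` operator standing in for `L::Discriminant`, and both function names are assumptions made for illustration only.

```rust
use std::collections::HashMap;

/// Toy e-node; `op` stands in for `L::Discriminant`.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Node {
    op: &'static str,
    children: Vec<u32>,
}

/// Builds the hypothetical index: op -> (class id -> position of the first node with that op).
/// Each class's node list is assumed to be sorted so that equal ops are contiguous.
fn build_first_index(
    classes: &HashMap<u32, Vec<Node>>,
) -> HashMap<&'static str, HashMap<u32, u32>> {
    let mut index: HashMap<&'static str, HashMap<u32, u32>> = HashMap::new();
    for (&class_id, nodes) in classes {
        for (pos, node) in nodes.iter().enumerate() {
            // `or_insert` keeps the first position seen for this (op, class) pair.
            index
                .entry(node.op)
                .or_default()
                .entry(class_id)
                .or_insert(pos as u32);
        }
    }
    index
}

/// Returns the nodes of `class_id` whose operator is `op`, starting at the
/// indexed position instead of scanning or binary searching the whole list.
fn nodes_with_op<'a>(
    classes: &'a HashMap<u32, Vec<Node>>,
    index: &HashMap<&'static str, HashMap<u32, u32>>,
    class_id: u32,
    op: &str,
) -> &'a [Node] {
    let Some(&start) = index.get(op).and_then(|m| m.get(&class_id)) else {
        return &[];
    };
    let nodes = &classes[&class_id];
    let start = start as usize;
    let len = nodes[start..].iter().take_while(|n| n.op == op).count();
    &nodes[start..start + len]
}
```

The index would have to be refreshed whenever a class's node list is re-sorted during rebuilding, which this sketch does not attempt to model.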

dewert99 · 2024-01-09T00:29:20Z

Another idea would be to combine EGraph.nodes and Explain.uncanon_memo into an IndexSet<L>. This would reduce the number of copies of nodes being used when explanations are enabled, but add the overhead of the RawTable when explanations are not enabled.
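To make the shape of that idea concrete, here is a small std-only stand-in for an insertion-ordered set. `NodeSet` is a hypothetical type for this sketch; unlike `indexmap::IndexSet`, it keeps a second copy of each node as the hash map key, so it only illustrates the interface (id to node, and node to id), not the memory savings.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Minimal insertion-ordered set: a stand-in for `indexmap::IndexSet<L>`.
struct NodeSet<L> {
    /// id -> node, in insertion order (plays the role of `EGraph.nodes`).
    nodes: Vec<L>,
    /// node -> id (plays the role of `Explain.uncanon_memo`).
    ids: HashMap<L, usize>,
}

impl<L: Clone + Eq + Hash> NodeSet<L> {
    fn new() -> Self {
        NodeSet { nodes: Vec::new(), ids: HashMap::new() }
    }

    /// Returns `(id, true)` if `node` was newly inserted, or the existing id
    /// and `false` if an identical node was already present.
    fn insert_full(&mut self, node: L) -> (usize, bool) {
        if let Some(&id) = self.ids.get(&node) {
            return (id, false);
        }
        let id = self.nodes.len();
        self.nodes.push(node.clone());
        self.ids.insert(node, id);
        (id, true)
    }

    /// Looks a node up by id.
    fn get_index(&self, id: usize) -> Option<&L> {
        self.nodes.get(id)
    }
}
```

The boolean returned by `insert_full` is what would let the caller notice an attempt to add an exact duplicate of an existing node.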

dewert99 · 2024-01-26T05:27:53Z

I was playing a bit with the idea of combining EGraph.nodes and Explain.uncanon_memo into an IndexSet<L> and I noticed that interestingly it is possible to have two ids that correspond to the exact same node (via id_to_node).

A somewhat minimal example of this is

#[test]
fn dup_node() {
    use SymbolLang as S;

    crate::init_logger();
    let mut egraph = EGraph::<S, ()>::default().with_explanations_enabled();

    let y = egraph.add_uncanonical(S::leaf("y"));
    let y2 = egraph.add_uncanonical(S::leaf("y2"));
    egraph.add_uncanonical(S::new("f", vec![y]));
    egraph.add_uncanonical(S::new("g", vec![y2]));
    egraph.union(y, y2);
    egraph.add_uncanonical(S::new("f", vec![y]));
    egraph.add_uncanonical(S::new("g", vec![y2])); // adds the duplicate
    for i in 0..egraph.total_number_of_nodes() {
        let id = Id::from(i);
        println!("{id}: {:?} (root: {})", egraph.id_to_node(id), egraph.find(id))
    }
}

which prints:

0: SymbolLang { op: "y", children: [] } (root: 0)
1: SymbolLang { op: "y2", children: [] } (root: 0)
2: SymbolLang { op: "f", children: [0] } (root: 2)
3: SymbolLang { op: "g", children: [1] } (root: 3)
4: SymbolLang { op: "g", children: [1] } (root: 4)

This works before and after this PR (and after this PR it also works when explanations aren't enabled).

Basically, what happens is that when lookup_internal is called inside the second call to egraph.add_uncanonical(S::new("g", vec![y2])), y2 gets canonicalized to y. Since the entry in memo hasn't been updated yet, it is still S::new("g", vec![y2]), so lookup_internal returns None. That forces a call to make_new_eclass, which adds the duplicate node.
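The mechanism can be reproduced outside egg with a toy model. The sketch below is not egg's implementation: `Toy`, its fields, and its simplified `rebuild` are assumptions chosen to mirror the description, with a memo keyed by nodes as canonicalized at insertion time and a `union` that leaves the memo stale.

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct Node {
    op: &'static str,
    children: Vec<usize>,
}

fn node(op: &'static str, children: Vec<usize>) -> Node {
    Node { op, children }
}

#[derive(Default)]
struct Toy {
    parent: Vec<usize>,         // union-find: parent[i] == i for roots
    nodes: Vec<Node>,           // id -> original node
    memo: HashMap<Node, usize>, // node as canonicalized when inserted -> id
}

impl Toy {
    fn find(&self, mut id: usize) -> usize {
        while self.parent[id] != id {
            id = self.parent[id];
        }
        id
    }

    fn add_uncanonical(&mut self, node: Node) -> usize {
        // Canonicalize the children, then consult the (possibly stale) memo.
        let mut canon = node.clone();
        for c in &mut canon.children {
            *c = self.find(*c);
        }
        if let Some(&id) = self.memo.get(&canon) {
            return id;
        }
        let id = self.nodes.len();
        self.parent.push(id);
        self.nodes.push(node);
        self.memo.insert(canon, id);
        id
    }

    fn union(&mut self, a: usize, b: usize) {
        let (a, b) = (self.find(a), self.find(b));
        if a != b {
            self.parent[b] = a; // memo keys are deliberately not updated here
        }
    }

    /// Re-canonicalizes the memo and unions entries that collide.
    fn rebuild(&mut self) {
        loop {
            let old = std::mem::take(&mut self.memo);
            let mut changed = false;
            for (mut node, id) in old {
                for c in &mut node.children {
                    *c = self.find(*c);
                }
                let id = self.find(id);
                if let Some(&other) = self.memo.get(&node) {
                    if self.find(other) != id {
                        self.union(other, id);
                        changed = true;
                    }
                } else {
                    self.memo.insert(node, id);
                }
            }
            if !changed {
                break;
            }
        }
    }
}

/// Replays the sequence of calls from the test above on the toy model.
fn build_duplicate() -> (Toy, usize) {
    let mut g = Toy::default();
    let y = g.add_uncanonical(node("y", vec![]));
    let y2 = g.add_uncanonical(node("y2", vec![]));
    g.add_uncanonical(node("f", vec![y]));
    g.add_uncanonical(node("g", vec![y2]));
    g.union(y, y2);
    g.add_uncanonical(node("f", vec![y])); // memo hit: key f[0] is still valid
    let dup = g.add_uncanonical(node("g", vec![y2])); // memo miss: key is g[1], lookup is g[0]
    (g, dup)
}
```

In this model the second `g` node gets a fresh id because the memo still holds the key with child `1` while the lookup uses child `0`; calling `rebuild` afterwards merges the two ids into one class.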

This isn't a bug since calling rebuild causes the duplicate nodes to get unioned.

This does give the idea of combining EGraph.nodes and Explain.uncanon_memo into an IndexSet<L> an additional benefit that we can use it to double check that we don't add any duplicate nodes.

dewert99 · 2024-03-21T22:49:14Z

@mwillsey, I noticed you recently merged #306, so I updated this PR to handle EGraph mapping; it actually simplifies it somewhat. Are this PR and #296 and #300 still things you are interested in? I could also try to update #296 and #300 to handle EGraph mapping if you would be interested in merging them.

oflatt

Really great work! I'm in favor of this change. It cleans up the implementation nicely.
Did you ever get performance numbers? It's probably as performant as before.
Sorry I didn't review this sooner, and great job on this high-quality PR.

oflatt · 2024-03-22T15:52:14Z

src/explain.rs

-        for i in 0..self.explainfind.len() {
-            let node = &self.explainfind[i];
-            egraph.add(node.node.clone());
+    pub(crate) fn with_nodes<'a>(&'a mut self, nodes: &'a [L]) -> ExplainNodes<'a, L> {


with_nodes is a little awkward.
Another way to implement it would be to keep the nodes in the Explain struct from the start. I think that might be cleaner?

dewert99 · 2024-03-22

The nodes need to be available when explanations are disabled, so I couldn't only have them in the Explain struct like they were before. I didn't want to duplicate all of the nodes by having them in both places, since that would use extra memory unnecessarily. Since Rust doesn't support self-referential structs, I thought this was the easiest way to avoid changing the explain implementation too much.
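A minimal sketch of the borrowing pattern being discussed, under the assumption that the e-graph owns the node vector and the explanation state borrows it only for the duration of a call. Apart from the `with_nodes` and `ExplainNodes` names taken from the diff, every field and method here (`explainfind` as a plain parent-pointer vector, `path_to_root`, `explain_path`) is simplified or hypothetical.

```rust
/// Explanation state that no longer owns the nodes (simplified fields).
struct Explain {
    /// One entry per node id; here just a parent pointer, with roots pointing to themselves.
    explainfind: Vec<usize>,
}

/// Short-lived view pairing the explanation state with the e-graph's node slice.
struct ExplainNodes<'a, L> {
    explain: &'a mut Explain,
    nodes: &'a [L],
}

impl Explain {
    fn with_nodes<'a, L>(&'a mut self, nodes: &'a [L]) -> ExplainNodes<'a, L> {
        ExplainNodes { explain: self, nodes }
    }
}

impl<'a, L: Clone> ExplainNodes<'a, L> {
    /// Mutates explanation state through the view.
    fn set_parent(&mut self, id: usize, parent: usize) {
        self.explain.explainfind[id] = parent;
    }

    /// Reads nodes through the view while walking the explanation pointers.
    fn path_to_root(&self, mut id: usize) -> Vec<L> {
        let mut out = vec![self.nodes[id].clone()];
        while self.explain.explainfind[id] != id {
            id = self.explain.explainfind[id];
            out.push(self.nodes[id].clone());
        }
        out
    }
}

/// The e-graph owns the nodes whether or not explanations are enabled.
struct EGraph<L> {
    nodes: Vec<L>,
    explain: Option<Explain>,
}

impl<L: Clone> EGraph<L> {
    fn explain_path(&mut self, id: usize) -> Option<Vec<L>> {
        let explain = self.explain.as_mut()?; // borrows only `self.explain`
        let view = explain.with_nodes(&self.nodes); // borrows only `self.nodes`
        Some(view.path_to_root(id))
    }
}
```

Because the two borrows touch disjoint fields of `EGraph`, the view can be built on demand without storing a reference to `nodes` inside `Explain`, which is what a self-referential struct would otherwise require.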

oflatt · 2024-03-22

Hmm, perhaps the Explain struct should be around whether explanations are disabled or enabled? Or I suppose the other fields of Explain could be wrapped in an Option<OtherExplanationFields>.

dewert99 · 2024-03-22

Do you mean have

struct Explain { nodes: Vec<L>, rest: Option<OtherExplanationFields>, }

I don't think this would simplify things very much, and the nodes are used outside of explanations as well, so I don't see why they should be part of the Explain struct.
Some of the explain methods also take in the unionfind and classes, so I could also thread the nodes through as an extra parameter if you prefer.

oflatt · 2024-03-25

Yeah, I see what you mean.
I don't have strong feelings; the solution you have is not too bad.

dewert99 · 2024-03-25

If you're fine with it, I think I'll just leave it as it is. Is there anything you would like me to change before I merge this?

oflatt · 2024-03-25

Nope, everything looks good to me.

dewert99 · 2024-03-25

Great, thanks.

mwillsey · 2024-04-02T22:30:09Z

src/egraph.rs

@@ -1324,14 +1400,15 @@ impl<L: Language, N: Analysis<L>> EGraph<L, N> {
                }
            }

-            while let Some((node, class_id)) = self.analysis_pending.pop() {
+            while let Some(class_id) = self.analysis_pending.pop() {
+                let node = self.nodes[usize::from(class_id)].clone();
                let class_id = self.find_mut(class_id);
                let node_data = N::make(self, &node);


Does this mean that make can be called with un-canonical e-class ids? If so, we should at least say so in the make docs.

dewert99 · 2024-04-03

> Does this mean that make can be called with un-canonical e-class ids?

Yes, but I think this was also true before this PR. I don't think we ever canonicalized the nodes in analysis_pending or parents.

> If so we should at least say so in the make docs.

If you want, I can still add that.
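As an illustration of why this matters to analysis authors, here is a toy constant-folding style `make` that canonicalizes each child id before reading per-class data. It is a sketch on a hypothetical `Toy` struct with a plain union-find and a data map keyed by canonical class id; it does not use egg's `Analysis` trait or claim anything about how egg resolves ids internally.

```rust
use std::collections::HashMap;

struct Node {
    op: &'static str,
    children: Vec<usize>,
}

struct Toy {
    /// Union-find: parent[i] == i for roots.
    parent: Vec<usize>,
    /// Analysis data (a known constant), keyed by canonical class id only.
    data: HashMap<usize, i64>,
}

impl Toy {
    fn find(&self, mut id: usize) -> usize {
        while self.parent[id] != id {
            id = self.parent[id];
        }
        id
    }

    /// Computes the constant for `node`, if every child has one.
    /// The child ids may be non-canonical, so each one goes through `find` first.
    fn make(&self, node: &Node) -> Option<i64> {
        let vals: Option<Vec<i64>> = node
            .children
            .iter()
            .map(|&c| self.data.get(&self.find(c)).copied())
            .collect();
        let vals = vals?;
        match (node.op, vals.as_slice()) {
            ("+", [a, b]) => Some(a + b),
            _ => None,
        }
    }
}
```

In this toy, looking up `data` with the raw child id would miss the entry whenever the child had been merged into another class, which is the situation the documentation note is meant to warn about.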

mwillsey · 2024-04-03T01:41:48Z

Yes, please. Then I'm good to merge!

dewert99 added 4 commits January 3, 2024 10:21

Added nodes field to EGraph to avoid storing nodes in analysis …

1f838c6

…and `analysis_pending`

eliminated node field of ExplainNode (used EGraph.nodes instead)

3145a30

serde

c075cbf

serde

3187e36

dewert99 marked this pull request as draft January 4, 2024 02:56

Clarify id_to_expr and prevent copy_with_unions when explanations…

4d4c52d

… are disabled

dewert99 marked this pull request as ready for review January 4, 2024 16:40

mwillsey requested a review from oflatt January 8, 2024 23:25

dewert99 mentioned this pull request Jan 12, 2024

Generic analysis #293

Closed

dewert99 mentioned this pull request Feb 6, 2024

Extracted out RawEGraph type #296

Draft

dewert99 mentioned this pull request Mar 20, 2024

Egraph nodes dewert99/plat-egg#1

Merged

dewert99 force-pushed the egraph_nodes branch 3 times, most recently from eb5b548 to 4d4c52d Compare March 21, 2024 22:17

Merge remote-tracking branch 'origin-dewert/main' into egraph_nodes

66773fe

oflatt approved these changes Mar 22, 2024

View reviewed changes

mwillsey reviewed Apr 2, 2024

View reviewed changes

Added note that enode in make may not be canonical

959a98f

mwillsey merged commit 3231b86 into egraphs-good:main Apr 3, 2024
2 checks passed

gkronber mentioned this pull request Aug 31, 2024

Fix hashing and memoization of enodes (VecExpr) JuliaSymbolics/Metatheory.jl#239

Merged

gkronber mentioned this pull request Oct 10, 2024

Performance improvements JuliaSymbolics/Metatheory.jl#253

Draft
