Add nodes field to EGraph #291

Merged Apr 3, 2024 (7 commits)

dewert99
Contributor

@dewert99 dewert99 commented Jan 3, 2024

I decided to try implementing the changes I described in #290 for adding a nodes field to EGraph. The main downside to my changes is that EClass::parents only returns non-canonical Ids and not the nodes themselves, although the nodes are available via EGraph::id_to_node. The main user-facing advantage is that the EGraph::id_to_X methods, as well as EGraph::copy_without_unions, do not require explanations to be enabled. There are also memory usage/performance advantages, since EClass.parents, EGraph.pending, and EGraph.analysis_pending no longer need to store nodes. The nodes passed to Analysis::make are slightly different than before, but they are the same after canonicalization.
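
As a rough illustration of the layout described above, here is a minimal sketch, assuming a toy e-graph rather than egg's actual types. `ToyEGraph`, `Node`, and `parent_nodes` are hypothetical names; only the idea (one node per Id in a flat list, parents stored as Ids) comes from the description.

```rust
// Toy model, not egg's real data structures.
type Id = usize;

#[derive(Clone, Debug, PartialEq)]
struct Node {
    op: &'static str,
    children: Vec<Id>,
}

#[derive(Default)]
struct ToyEGraph {
    // One entry per Id, stored as added (never canonicalized).
    nodes: Vec<Node>,
    // parents[c] holds the Ids of nodes that list c as a child.
    parents: Vec<Vec<Id>>,
}

impl ToyEGraph {
    // Adds a node and returns its Id. Children must already exist.
    fn add(&mut self, op: &'static str, children: Vec<Id>) -> Id {
        let id = self.nodes.len();
        for &c in &children {
            self.parents[c].push(id); // store only the Id, not a node copy
        }
        self.nodes.push(Node { op, children });
        self.parents.push(Vec::new());
        id
    }

    // The node is recovered from its Id on demand.
    fn id_to_node(&self, id: Id) -> &Node {
        &self.nodes[id]
    }

    // Callers that want parent nodes pay one extra lookup per parent.
    fn parent_nodes(&self, class: Id) -> Vec<&Node> {
        self.parents[class].iter().map(|&p| self.id_to_node(p)).collect()
    }
}
```

In this sketch the parent lists hold plain integers, which is where the memory saving would come from; the cost is the extra indirection in `parent_nodes`.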

I also eliminated the node field of ExplainNode to reduce memory usage when explanations are enabled, but this did add some complexity. If this complexity isn't considered to be worth it, I could also revert the changes to explain.rs while preserving the rest of the changes.

@dewert99 dewert99 marked this pull request as draft January 4, 2024 02:56
@dewert99
Contributor Author

dewert99 commented Jan 4, 2024

I noticed that since EGraph::add_uncanonical is optimized to avoid creating new nodes when an equivalent node is available, EGraph::copy_without_unions is incorrect when explanations aren't enabled, and the EGraph::id_to_X methods need to be clarified for the case where explanations are not enabled.

@dewert99 dewert99 marked this pull request as ready for review January 4, 2024 16:40
@dewert99
Contributor Author

dewert99 commented Jan 8, 2024

In order to get more stable benchmarks I tried using a modified version of iai-callgrind:
egraph_nodes-vs-main.txt

I also compared main with itself to get a sense of stability:

main-vs-main.txt

@mwillsey mwillsey requested a review from oflatt January 8, 2024 23:25
@mwillsey
Member

mwillsey commented Jan 8, 2024

I like the idea here. @oflatt will know best about the interactions with proof stuff. Should we worry if the nodes Vec will grow too large (without proofs, in a long running example)? I guess we never compact the unionfind either, so at worst linear overhead I think.

@dewert99
Contributor Author

dewert99 commented Jan 8, 2024

Currently, this decreases memory usage even when not using explanations (at least in the benchmarks I ran) since the memory used by nodes is offset by not storing nodes in the parents list. This may change if we start compacting the parents lists (#113), but I think you're right about the linear overhead and I don't think the unionfind could be compacted. I guess the only problematic case is if the users' Language implementation is very large, they don't use explanations, and we do implement parent list compaction.

@mwillsey
Member

mwillsey commented Jan 8, 2024

This all kind of begs the question: why not store the e-nodes only a single time? E-classes could refer to their nodes by id, and the memo could probably even get replaced with an IndexMap.

@dewert99
Contributor Author

dewert99 commented Jan 9, 2024

Replacing memo would be tricky since it uses canonicalized nodes and the EGraph nodes field uses un-canonicalized nodes. We could add a separate list of canonicalized nodes, but I don't see how that would be any better than keeping memo the way it is. The nodes field of EClass also gets canonicalized as part of rebuilding, although that may not actually be necessary (it's just nice for deduplication). We would also need to experiment with the cost of the extra indirection when searching, since currently the nodes field seems to be hit quite hard. Another idea I had that might help with this is to make classes_by_op into a HashMap<L::Discriminant, HashMap<Id, u32>> that indicates the first element of a class's nodes field that has a particular discriminant, so we don't need to search from the beginning or binary search.
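
A minimal sketch of the classes_by_op idea, assuming operator names stand in for `L::Discriminant` and that each class keeps its nodes sorted so equal operators are adjacent. `index_class` and `run_for_op` are hypothetical helper names, not part of egg.

```rust
use std::collections::HashMap;

type Id = usize;
// Stand-in for `L::Discriminant`: the operator name of a node.
type Op = &'static str;

/// Record, for one class, the index of the first node carrying each operator.
/// Assumes `sorted_ops` lists the operators of the class's nodes in sorted order.
fn index_class(by_op: &mut HashMap<Op, HashMap<Id, u32>>, class: Id, sorted_ops: &[Op]) {
    for (i, &op) in sorted_ops.iter().enumerate() {
        // `or_insert` keeps the first index seen and ignores later ones.
        by_op.entry(op).or_default().entry(class).or_insert(i as u32);
    }
}

/// Return the run of entries whose operator is `op`, jumping straight to the
/// recorded offset instead of scanning from index 0 or binary searching.
fn run_for_op<'a>(
    by_op: &HashMap<Op, HashMap<Id, u32>>,
    class: Id,
    sorted_ops: &'a [Op],
    op: &str,
) -> &'a [Op] {
    let start = match by_op.get(op).and_then(|per_class| per_class.get(&class)) {
        Some(&s) => s as usize,
        None => return &[], // this class has no node with that operator
    };
    let len = sorted_ops[start..].iter().take_while(|&&o| o == op).count();
    &sorted_ops[start..start + len]
}
```

Whether the extra map pays for itself would depend on how often searches hit classes with many nodes, which the thread leaves open.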

@dewert99
Contributor Author

dewert99 commented Jan 9, 2024

Another idea would be to combine EGraph.nodes and Explain.uncanon_memo into an IndexSet<L>. This would reduce the number of copies of nodes being used when explanations are enabled, but add the overhead of the RawTable when explanations are not enabled.

@dewert99 dewert99 mentioned this pull request Jan 12, 2024
@dewert99
Contributor Author

I was playing a bit with the idea of combining EGraph.nodes and Explain.uncanon_memo into an IndexSet<L> and I noticed that interestingly it is possible to have two ids that correspond to the exact same node (via id_to_node).

A somewhat minimal example of this is

#[test]
fn dup_node() {
    use SymbolLang as S;

    crate::init_logger();
    let mut egraph = EGraph::<S, ()>::default().with_explanations_enabled();

    let y = egraph.add_uncanonical(S::leaf("y"));
    let y2 = egraph.add_uncanonical(S::leaf("y2"));
    egraph.add_uncanonical(S::new("f", vec![y]));
    egraph.add_uncanonical(S::new("g", vec![y2]));
    egraph.union(y, y2);
    egraph.add_uncanonical(S::new("f", vec![y]));
    egraph.add_uncanonical(S::new("g", vec![y2])); // adds the duplicate
    for i in 0..egraph.total_number_of_nodes() {
        let id = Id::from(i);
        println!("{id}: {:?} (root: {})", egraph.id_to_node(id), egraph.find(id))
    }
}

which prints:

0: SymbolLang { op: "y", children: [] } (root: 0)
1: SymbolLang { op: "y2", children: [] } (root: 0)
2: SymbolLang { op: "f", children: [0] } (root: 2)
3: SymbolLang { op: "g", children: [1] } (root: 3)
4: SymbolLang { op: "g", children: [1] } (root: 4)

This works before and after this PR (and after this PR it also works when explanations aren't enabled)

Basically, what happens is that when lookup_internal is called inside the second call to egraph.add_uncanonical(S::new("g", vec![y2])), y2 gets canonicalized to y, but since the entry in memo hasn't been updated yet, it is still S::new("g", vec![y2]). So lookup_internal returns None, which forces a call to make_new_eclass, which adds the duplicate node.

This isn't a bug since calling rebuild causes the duplicate nodes to get unioned.
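
The mechanism can be reproduced in a toy model. The sketch below is an assumption-laden simplification, not egg's code: `Toy`, its `memo` keyed by the node as it looked at insertion time, and the single-pass `rebuild` are all hypothetical.

```rust
use std::collections::HashMap;

type Id = usize;
type Node = (&'static str, Vec<Id>); // (operator, children)

#[derive(Default)]
struct Toy {
    uf: Vec<Id>,             // uf[i] = parent of i; roots point to themselves
    memo: HashMap<Node, Id>, // keys are only re-canonicalized in `rebuild`
}

impl Toy {
    fn find(&self, mut id: Id) -> Id {
        while self.uf[id] != id {
            id = self.uf[id];
        }
        id
    }

    fn canon(&self, node: &Node) -> Node {
        (node.0, node.1.iter().map(|&k| self.find(k)).collect())
    }

    fn add(&mut self, node: Node) -> Id {
        let key = self.canon(&node);
        if let Some(&id) = self.memo.get(&key) {
            return id; // hit: an equal canonical key is already present
        }
        let id = self.uf.len();
        self.uf.push(id);
        self.memo.insert(key, id);
        id
    }

    fn union(&mut self, a: Id, b: Id) {
        let (a, b) = (self.find(a), self.find(b));
        if a != b {
            self.uf[b] = a; // memo keys mentioning b are now stale
        }
    }

    // Single pass that re-keys the memo and merges colliding entries.
    // Enough for this example; a real rebuild repeats until nothing changes.
    fn rebuild(&mut self) {
        let old: Vec<(Node, Id)> = self.memo.drain().collect();
        for (node, id) in old {
            let key = self.canon(&node);
            let root = self.find(id);
            match self.memo.get(&key).copied() {
                Some(other) => self.union(other, root),
                None => {
                    self.memo.insert(key, root);
                }
            }
        }
    }
}
```

Replaying the steps of the test above (add y, y2, f(y), g(y2), union y with y2, then add g(y2) again) produces a second id for g in this model, and the two ids only end up in the same class once `rebuild` has run.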

This does give the idea of combining EGraph.nodes and Explain.uncanon_memo into an IndexSet<L> an additional benefit: we could use it to double-check that we don't add any duplicate nodes.
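
A minimal sketch of that duplicate check, using only the standard library. `NodeSet` is a hypothetical stand-in that imitates the `(index, was_new)` result of `IndexSet::insert_full` from the indexmap crate; unlike a real IndexSet it keeps two copies of each node, so it illustrates the check rather than the memory saving.

```rust
use std::collections::HashMap;
use std::hash::Hash;

// Hypothetical stand-in for IndexSet<L>: ids are insertion indices.
struct NodeSet<L> {
    nodes: Vec<L>,            // id -> node
    index: HashMap<L, usize>, // node -> id
}

impl<L: Clone + Eq + Hash> NodeSet<L> {
    fn new() -> Self {
        NodeSet { nodes: Vec::new(), index: HashMap::new() }
    }

    // Returns the node's id and whether it was newly inserted.
    // A `false` flag is the signal that a duplicate was about to be added.
    fn insert_full(&mut self, node: L) -> (usize, bool) {
        if let Some(&id) = self.index.get(&node) {
            return (id, false);
        }
        let id = self.nodes.len();
        self.nodes.push(node.clone());
        self.index.insert(node, id);
        (id, true)
    }

    fn get(&self, id: usize) -> Option<&L> {
        self.nodes.get(id)
    }

    fn len(&self) -> usize {
        self.nodes.len()
    }
}
```

With a structure like this, the second `g` node from the example would map back to the existing id instead of receiving a new one, or could trip a debug assertion on the returned flag.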

@dewert99 dewert99 force-pushed the egraph_nodes branch 3 times, most recently from eb5b548 to 4d4c52d Compare March 21, 2024 22:17
@dewert99
Contributor Author

@mwillsey, I noticed you recently merged #306, so I updated this PR to handle EGraph mapping; it actually simplifies it somewhat. Are this PR and #296 and #300 still things you are interested in? I could also try to update #296 and #300 to handle EGraph mapping if you would be interested in merging them.

Member

@oflatt oflatt left a comment

Really great work! I'm in favor of this change. It cleans up the implementation nicely.
Did you ever get performance numbers? It's probably as performant as before.
Sorry I didn't review this sooner, and great job on this high-quality PR.

for i in 0..self.explainfind.len() {
let node = &self.explainfind[i];
egraph.add(node.node.clone());
pub(crate) fn with_nodes<'a>(&'a mut self, nodes: &'a [L]) -> ExplainNodes<'a, L> {
Member

with_nodes is a little awkward.
Another way to implement this would be to keep the nodes in the Explain struct from the start. I think that might be cleaner?

Contributor Author

The nodes need to be available when explanations are disabled, so I couldn't keep them only in the Explain struct as before. I didn't want to duplicate all of the nodes by storing them in both places, since that would use extra memory unnecessarily. Since Rust doesn't support self-referential structs, I thought this was the easiest way to avoid changing the explain implementation too much.
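
The borrowing pattern being described can be sketched as follows. This is a simplified illustration under stated assumptions: only the `with_nodes` signature shape and the `ExplainNodes` name appear in the diff above, while the `parent` field, `ToyEGraph`, `explain_view`, and the two view methods are hypothetical.

```rust
// Hypothetical, heavily simplified stand-ins for the types under discussion.
struct Explain {
    // parent[i] = parent of node i in a toy explanation forest
    parent: Vec<usize>,
}

// Short-lived view that bundles the explanation state with the node list
// it needs, without either struct owning the other.
struct ExplainNodes<'a, L> {
    explain: &'a mut Explain,
    nodes: &'a [L],
}

impl Explain {
    fn with_nodes<'a, L>(&'a mut self, nodes: &'a [L]) -> ExplainNodes<'a, L> {
        ExplainNodes { explain: self, nodes }
    }
}

impl<'a, L> ExplainNodes<'a, L> {
    fn set_parent(&mut self, child: usize, parent: usize) {
        self.explain.parent[child] = parent;
    }

    fn node_and_parent(&self, id: usize) -> (&L, usize) {
        (&self.nodes[id], self.explain.parent[id])
    }
}

struct ToyEGraph<L> {
    nodes: Vec<L>,            // always present
    explain: Option<Explain>, // only when explanations are enabled
}

impl<L> ToyEGraph<L> {
    fn explain_view(&mut self) -> Option<ExplainNodes<'_, L>> {
        let nodes = &self.nodes; // shared borrow of one field
        // Mutable borrow of a different field; the borrow checker accepts
        // this because the two fields are disjoint.
        self.explain.as_mut().map(|e| e.with_nodes(nodes))
    }
}
```

The nodes stay usable when `explain` is `None`, and nothing is stored twice; the price is that explanation code has to go through the temporary view.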

Member

@oflatt oflatt Mar 22, 2024

Hmm, perhaps the Explain struct should be around whether explanations are disabled or enabled? Or I suppose the other fields of Explain could be wrapped in an Option<OtherExplanationFields>

Contributor Author

Do you mean having something like

struct Explain<L> {
  nodes: Vec<L>,
  rest: Option<OtherExplanationFields>,
}

I don't think this would simplify things very much, and the nodes are used outside of explanations as well, so I don't see why they should be part of the Explain struct.
Some of the explain methods also take in the unionfind and classes, so I could also thread the nodes through as an extra parameter if you prefer.

Member

Yeah, I see what you mean.
I don't have strong feelings, the solution you have is not too bad

Contributor Author

If you're fine with it, I think I'll just leave it as it is. Is there anything you would like me to change before I merge this?

Member

Nope, everything looks good to me

Contributor Author

Great, Thanks

src/egraph.rs
@@ -1324,14 +1400,15 @@ impl<L: Language, N: Analysis<L>> EGraph<L, N> {
}
}

-        while let Some((node, class_id)) = self.analysis_pending.pop() {
+        while let Some(class_id) = self.analysis_pending.pop() {
+            let node = self.nodes[usize::from(class_id)].clone();
             let class_id = self.find_mut(class_id);
             let node_data = N::make(self, &node);
Member

Does this mean that make can be called with un-canonical e-class ids? If so we should at least say so in the make docs.

Contributor Author

Does this mean that make can be called with un-canonical e-class ids?

Yes, but I think this was also true before this PR. I don't think we ever canonicalized the nodes in analysis_pending or parents.

If so we should at least say so in the make docs.

If you want I can still add that
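
To make the concern concrete, here is a minimal sketch of why a make-style function should tolerate un-canonical child ids. It is a toy model under stated assumptions: `Toy`, `data_of`, and the "term size" analysis are hypothetical, and analysis data is assumed to be stored under canonical ids only.

```rust
use std::collections::HashMap;

type Id = usize;

struct Toy {
    uf: Vec<Id>,            // uf[i] = parent of i; roots point to themselves
    data: HashMap<Id, i64>, // analysis data, keyed by canonical ids only
}

impl Toy {
    fn find(&self, mut id: Id) -> Id {
        while self.uf[id] != id {
            id = self.uf[id];
        }
        id
    }

    // Resolve the id first, so canonical and un-canonical ids both work.
    fn data_of(&self, id: Id) -> Option<i64> {
        self.data.get(&self.find(id)).copied()
    }

    // Toy `make`: term size = 1 + sum of the children's sizes.
    // Returns None if any child has no data.
    fn make(&self, children: &[Id]) -> Option<i64> {
        let kids: Option<i64> = children.iter().map(|&c| self.data_of(c)).sum();
        kids.map(|s| s + 1)
    }
}
```

In this model, a direct `data.get(&child)` on an un-canonical child finds nothing, while the lookup through `find` succeeds, which is the kind of detail a note in the make docs could spell out.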

@mwillsey
Copy link
Member

mwillsey commented Apr 3, 2024 via email

@mwillsey mwillsey merged commit 3231b86 into egraphs-good:main Apr 3, 2024
2 checks passed
3 participants