Improve UF implementation #798

bclement-ocp · 2023-09-08T08:27:22Z

The Union-Find module is implemented with a mapping from semantic values to their representative. This means that accessing the representative is an O(log n) operation, where n is the number of live semantic values (roughly, number of distinct terms).

Benchmarking a few problems reveals that Alt-Ergo can easily spend 5-10% of its time just looking for representatives, and on long-running problems where we discover many terms, this can get upwards of 20%. Using a more efficient implementation of the Union-Find structure could bring tangible performance benefits: for instance, a properly implemented persistent version of Tarjan's Union-Find data structure would bring down the time to access the representatives to (amortized) O(α(n)) i.e. almost constant time.

(Note that these numbers include both the term → semantic value and semantic value → representative mappings, and are thus slightly inflated — in reality, they are probably closer to 3-8% and 15%. An improved union-find representation would only affect the semantic value → representative mapping, not the term → semantic value mapping. However, the term → semantic value mapping is independent from the environment (it always maps t to X.make t, and may no longer be needed at all now that we have the global cache on X.make).

After trying such an implementation, there are a few caveats:

First, this cannot be done in UF only. To have an efficient persistent implementation, the best way is to use an imperative implementation with checkpointing. But checkpointing is not free, and doing checkpoints before each UF operation is slow. Ideally, we want the checkpoints to be in sync with the SAT checkpoints, which requires lifting the checkpoints to the Ccx and Theory modules.
Second, we need to be extra careful when merging classes due to AC(X) requirements. In fact, AC(X) forces the class representative to be the lowest element in the class for the appropriate order, which seems to be causing subtle issues in relation to congruence closure.

The text was updated successfully, but these errors were encountered:

bclement-ocp · 2023-10-06T13:40:12Z

I have spent some more time implementing a better version of this, but hit another blocker. The crux of it is that Alt-Ergo has many copies of the union-find data structure (up to 8 when using Tableaux-CDCL, although at most 5 are used at once), which is mostly not an issue (although it explains some weird behaviors I was seeing), except for one specific use.

This use is gamma_finite in Theory, which is used for case splits. The way Alt-Ergo uses case splits is as follows.

The theory maintains two environments: gamma, that only sees propagations and decisions from the SAT solver, and gamma_finite.
gamma_finite is not used by default, but sometimes the SAT decides to perform a case split. In this case, we replay all the propagations and decisions from the SAT that have been applied to gamma into gamma_finite, then we ask the theories to choose values for some of their variables, which are applied only to gamma_finite. If there is a contradiction with one of the value proposed, we assert the disequality instead of the inequality, and explore the product of possible combinations this way.
If we reach a contradiction in gamma_finite, and the contradiction does not involve any of the previous choices, we propagate the contradiction to the SAT.
If we reach a contradiction in gamma_finite, and the contradiction does involve some of the previous choices, there is a local backtracking: we keep a trail of model choices, reinitialize gamma_finite with gamma, and replay theappropriate choices that in the new gamma_finite, which may lead to either a contradiction or to new choices being made instead.

The issue with this approach is the part where we copy gamma into gamma_finite, because this is hard to support efficiently using a persistent store since there are two "versions" that must be in use at the same time.

The other issue with this approach has nothing to do with persistent union-finds: indeed, it means that most of the theory work that Alt-Ergo does is performed twice — once in gamma, and once in gamma_finite, in similar-but-different environments. It would probably make more sense to integrate the value assignments to the SAT trail, rather than keeping these two copies of the environment, making the whole loop closer to mcSAT (although we currently do now produce precise conflict explanations at the level of theories).

This poses the question of when to make boolean decisions vs theory decisions, and another host of subtle points to consider, so this becomes a more complex task than just changing the union-find implementation. There is one "quick fix" which could be to locally replay the choices when performing a case split and which I will investigate if it is acceptable performance-wise on the current code, which would allow to proceed — although in any case moving these decisions towards the SAT in some way seems like a good idea.

bclement-ocp mentioned this issue Sep 8, 2023

Test the shostak cache #779

Closed

bclement-ocp mentioned this issue Oct 10, 2023

The formulae top and bot are constant #868

Merged

Halbaroth assigned bclement-ocp Oct 18, 2023

Halbaroth added clean-up performance labels Oct 18, 2023

bclement-ocp mentioned this issue Nov 27, 2023

Timeouts are broken #946

Closed

bclement-ocp added the backlog label Jul 11, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve UF implementation #798

Improve UF implementation #798

bclement-ocp commented Sep 8, 2023

bclement-ocp commented Oct 6, 2023

Improve UF implementation #798

Improve UF implementation #798

Comments

bclement-ocp commented Sep 8, 2023

bclement-ocp commented Oct 6, 2023