Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eliminate Unsatisfiable Boolean Expressions #1716

Open
tustvold opened this issue Jan 31, 2022 · 10 comments
Open

Eliminate Unsatisfiable Boolean Expressions #1716

tustvold opened this issue Jan 31, 2022 · 10 comments
Labels
enhancement New feature or request

Comments

@tustvold
Copy link
Contributor

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

I noticed Datafusion is not able to simplify the expression

A ^ !A

Whilst it is unlikely for a user to write such an expression, it is not uncommon for boolean expressions to reduce to something unsatisfiable. Where this is the case, substituting in false may allow for further simplification.

Describe the solution you'd like

It would be awesome if Datafusion was able to detect and eliminate unsatisfiable boolean expressions. I believe this is an NP-hard problem in the general case, but should be tractable for a low number of variables.

@tustvold tustvold added the enhancement New feature or request label Jan 31, 2022
@tustvold
Copy link
Contributor Author

On a related note it is unable to simplify expressions of the form

A ^ (!A v B)

@Dandandan
Copy link
Contributor

Dandandan commented Jan 31, 2022

Related to this, I think the PoC
Tokomak optimizer I created is especially suited for these kind of optimizations. https://github.com/Dandandan/datafusion-tokomak

@pjmore extended this optimizer to be able to optimize both expressions and logical plan nodes.
See here for more:
#1485

@tustvold
Copy link
Contributor Author

I'm not at all familiar with e-graphs, but I would perhaps naively expect something specialised to boolean algebra to significantly out-perform a method based on arbitrary expression rewriting? I guess I'm thinking something like BDDs, or even just recursively applying shannon expansion and relying on const propagation to fix the mess...

This isn't an area I'm hugely familiar with, but I can't help feeling it should just be the case of picking an off-the-shelf boolean expression minimizer and plugging it in 😅

FWIW a quick google search found this - https://docs.rs/boolean_expression/latest/boolean_expression/index.html

@alamb
Copy link
Contributor

alamb commented Jan 31, 2022

I do wonder if https://docs.rs/boolean_expression/latest/boolean_expression/index.html handles Nulls 🤔 which is an important niche case for database / sql

@alamb
Copy link
Contributor

alamb commented Jan 31, 2022

FWIW (A ^ !A) is NOT always false because of .... 🥁 NULL

alamb=# select null AND (not null);
 ?column? 
----------
 
(1 row)

@alamb
Copy link
Contributor

alamb commented Jan 31, 2022

We can do this rewrite

A ^ !A = false (iff A is not nullable)

@tustvold
Copy link
Contributor Author

Does something similar also hold for things like A AND false? Just wondering if the existing logic handles this correctly... 🤔

@alamb
Copy link
Contributor

alamb commented Jan 31, 2022

Does something similar also hold for things like A AND false? Just wondering if the existing logic handles this correctly... 🤔

Yes. In fact @NGA-TRAN fixed a bug related to that in #1245 -- I think we have it sufficiently tested now.

See some of the rules here
https://github.com/apache/arrow-datafusion/blob/master/datafusion/src/optimizer/simplify_expressions.rs#L568

It actually turn out that A AND false is always false (even if A is NULL)

However, A OR false could be NULL or false

🤯

@tustvold
Copy link
Contributor Author

Aah yes, the SQL interpretation of NULL as "unknown" vs perhaps more intuitive notions of it... 😅

null == null --> null will never cease to make no sense to me...

@jackwener
Copy link
Member

It's depend on #2254

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants