-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: volatile expressions should not be target of common subexpt elimination #8520
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @viirya -- this is about the fastest turnaround I have ever seen for a issue filed --> PR created 🙏
I think we should be checking volatility slightly differently, and I left comments about why and how inline.
Thanks again for such a fast PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! I find a document from Postgres which describes the same behavior:
Any function with side-effects must be labeled VOLATILE, so that calls to it cannot be optimized away. Even a function with no side-effects needs to be labeled VOLATILE if its value can change within a single query; some examples are random(), currval(), timeofday().
datafusion/expr/src/expr.rs
Outdated
ScalarFunctionDefinition::UDF(udf) => { | ||
udf.signature().volatility == crate::Volatility::Volatile | ||
} | ||
ScalarFunctionDefinition::Name(_) => false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Under which condition does the optimizer see an unresolved function name? And: if we don't know how the function behaves (because there's only a name) shouldn't we assume the worst case (i.e. that it is volatile)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "resolve" pass should probably run before other optimizers then, right? So maybe it's even safe to error out if we hit an unresolved function here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes sense to get error on unresolved scalar function here. Missing that it is a unresolved one. (maybe renaming it to Unresolved
🤔 ).
Yes, I agree this is a good idea. The resolution should run before the optimizers (we have it as part of the analysis pass at the moment) |
Should I remove the resolution from the analyser and put it outside /
before ?
…On Wed, 13 Dec 2023 at 10:56, Andrew Lamb ***@***.***> wrote:
The "resolve" pass should probably run before other optimizers then,
right? So maybe it's even safe to error out if we hit an unresolved
function here.
Yes, I agree this is a good idea. The resolution should run before the
optimizers (we have it as part of the analysis pass at the moment)
—
Reply to this email directly, view it on GitHub
<#8520 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAGGOKKNGLCK7LWLUMPPTPDYJH27LAVCNFSM6AAAAABAR5MY36VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNJUGUZTOMJYHE>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much @viirya for your quick and thorough work on this PR
Ok(udf.signature().volatility == crate::Volatility::Volatile) | ||
} | ||
ScalarFunctionDefinition::Name(func) => { | ||
internal_err!( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
|
||
# Verify that multiple calls to volatile functions like `random()` are not combined / optimized away | ||
query B | ||
SELECT r FROM (SELECT r1 == r2 r, r1, r2 FROM (SELECT random() r1, random() r2) WHERE r1 > 0 AND r2 > 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
This test fails on main:
❯ SELECT r FROM (SELECT r1 == r2 r, r1, r2 FROM (SELECT random() r1, random() r2) WHERE r1 > 0 AND r2 > 0)
;
+------+
| r |
+------+
| true |
+------+
1 row in set. Query took 0.037 seconds.
Thanks for review @alamb @crepererum @waynexia |
…ination (apache#8520) * fix: volatile expressions should not be target of common subexpt elimination * Fix clippy * For review * Return error for unresolved scalar function * Improve error message
…ination (apache#8520) * fix: volatile expressions should not be target of common subexpt elimination * Fix clippy * For review * Return error for unresolved scalar function * Improve error message
…ination (apache#8520) * fix: volatile expressions should not be target of common subexpt elimination * Fix clippy * For review * Return error for unresolved scalar function * Improve error message
…ination (apache#8520) * fix: volatile expressions should not be target of common subexpt elimination * Fix clippy * For review * Return error for unresolved scalar function * Improve error message
…ination (apache#8520) * fix: volatile expressions should not be target of common subexpt elimination * Fix clippy * For review * Return error for unresolved scalar function * Improve error message
Which issue does this PR close?
Closes #8518.
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?