-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Correlated subqueries #683
Correlated subqueries #683
Conversation
Fixes example in #320
|
Codecov Report
@@ Coverage Diff @@
## datafusion-sql-planner #683 +/- ##
=========================================================
Coverage ? 66.95%
=========================================================
Files ? 73
Lines ? 3640
Branches ? 753
=========================================================
Hits ? 2437
Misses ? 1057
Partials ? 146 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for opening @sarahyurick! While running through the issue repro you shared, I notice we also get some DataFusion warnings / errors:
Skipping optimizer rule decorrelate_scalar_subquery due to unexpected error: scalar subqueries must have a filter to be correlated at /home/nfs/charlesb/.cargo/git/checko
uts/arrow-datafusion-71ae82d9dec9a01c/6c32098/datafusion/optimizer/src/decorrelate_scalar_subquery.rs:177
caused by
Error during planning: Could not coerce into Filter! at /home/nfs/charlesb/.cargo/git/checkouts/arrow-datafusion-71ae82d9dec9a01c/6c32098/datafusion/expr/src/logical_plan
/plan.rs:1127
cc @andygrove in case you have some thoughts on this
except ValueError: | ||
return reduce( | ||
partial(self.operation, **kwargs), | ||
(operands[0], float(operands[1][operands[1].columns[0]].loc[0].compute())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd imagine we'd want to generalize the typecast here to handle other potential aggregating functions, but ATM can't think of an immediate way to do this.
I filed an issue against DataFusion to add support for this type of query: apache/datafusion#3266 |
Looks like this was resolved on the DataFusion side with apache/datafusion#3287 ! |
Previously, we would get a
ValueError: Not all divisions are known, can't align partitions. Please use set_index to set the index.
for something like:Not sure if this is the way we should go about this (not generalizable enough?), but here is an initial quick fix for that example. The general idea is that since we are comparing
listed_price
to a 1x1 table containingAVG(listed_price)
, the latter has to be converted to a single value by callingcompute()
and with casting.