-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Upgrade to DataFusion 17.0.0 #998
Conversation
current failures:
|
…s other aggregates (columns followed by agg columns)
match &self.frame_bound { | ||
WindowFrameBound::Preceding(val) | WindowFrameBound::Following(val) => match val { | ||
x if x.is_null() => Ok(None), | ||
ScalarValue::UInt64(v) => Ok(*v), | ||
ScalarValue::Int64(v) => Ok(v.map(|n| n as u64)), | ||
ref x => Err(DaskPlannerError::Internal(format!( | ||
"Unexpected window frame bound: {:?}", | ||
x | ||
)) | ||
.into()), | ||
}, | ||
// The below is only safe because window bounds cannot be negative |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's probably a better way of writing the same logic win rust. cc @andygrove if you have suggestions
… dnf based filter for some cases
class RexAliasPlugin(BaseRexPlugin): | ||
""" | ||
A RexAliasPlugin is an expression, which references a Subquery. | ||
This plugin is thin on logic, however keeping with previous patterns | ||
we use the plugin approach instead of placing the logic inline | ||
""" | ||
|
||
class_name = "RexAlias" | ||
|
||
def convert( | ||
self, | ||
rel: "LogicalPlan", | ||
rex: "Expression", | ||
dc: DataContainer, | ||
context: "dask_sql.Context", | ||
) -> Union[dd.Series, Any]: | ||
# extract the operands; there should only be a single underlying Expression | ||
operands = rex.getOperands() | ||
assert len(operands) == 1 | ||
|
||
sub_rex = operands[0] | ||
|
||
value = RexConverter.convert(rel, sub_rex, dc, context=context) | ||
|
||
if isinstance(value, DataContainer): | ||
return value.df | ||
|
||
return value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added this to resolve the query failures coming from our lack of handling for rex aliases; for now I've kept this separate from the relatively similar RexScalarSubqueryPlugin
, not sure if there's a case to have these merged into one plugin that then conditionally grabs and converts the sub-rel/rex
Codecov Report
📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more @@ Coverage Diff @@
## main #998 +/- ##
==========================================
+ Coverage 81.09% 81.16% +0.06%
==========================================
Files 77 78 +1
Lines 4338 4358 +20
Branches 779 782 +3
==========================================
+ Hits 3518 3537 +19
+ Misses 648 643 -5
- Partials 172 178 +6
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. This was a long haul of a PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @andygrove and @jdye64 for taking the time to unblock CI 🎉 couple of comments/questions but things mostly look good:
[ | ||
[("a", "==", 1), ("b", "<", 10)], | ||
[("a", "==", 1), ("b", ">", 5)], | ||
[("b", ">", 5), ("b", "<", 10)], | ||
[("a", "==", 1)], | ||
], | ||
[[("b", ">", 5), ("b", "<", 10)], [("a", "==", 1)]], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just checking; are these param changes related to disabling FilterColumnsPostJoin
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know why those changed? I don't think it was me that changed them unless I just don't remember. It should not be related to FilterColumnsPostJoin
however.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will say the new change seems more correct than the original way anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can comment a bit more on this. This isn't related to FilterColumnsPostJoin
, but how datafusion optimizes predicates for filters/table_scan operations. Starting df14, they started using cnf (conjunctive normal form) for the expressions being passed as filters. (See #903 (comment)).
Since this particular test already had filters in dnf, the conversion of dnf->cnf (by df) -> dnf (by dask-sql since that's what arrow expects) resulted in an overly complex filter.
df16/17 seems to simplify some of these expressions and doesn't convert everything to cnf.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally lgtm! Just minor comments/questions that should be a quick fix
optimizer: Optimizer, | ||
} | ||
|
||
impl DaskSqlOptimizer { | ||
/// Creates a new instance of the DaskSqlOptimizer with all the DataFusion desired | ||
/// optimizers as well as any custom `OptimizerRule` trait impls that might be desired. | ||
pub fn new(skip_failing_rules: bool) -> Self { | ||
pub fn new() -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious why we removed this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That configuration value is no longer exposed via datafusion therefore this was "dead_code" and the compiler issuing warnings and cargo clippy
was failing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah they might have renamed that one to skip_failed_rules
. https://docs.rs/datafusion-common/17.0.0/datafusion_common/config/struct.OptimizerOptions.html
We can update in a followup if needed.
@ayushdg how do the changes suit you? |
Changes in DataFusion that are causing tests to fail:
Column
but can be anyExpr