Upgrade to DataFusion 14.0.0 #903

Merged on Nov 15, 2022 (21 commits)
9a61598
upgrade to latest datafusion rev
andygrove Nov 3, 2022
c2f2d1c
panic on unexpected value
andygrove Nov 3, 2022
4e9339d
remove panic
andygrove Nov 3, 2022
1acfa69
fix regression with window functions
andygrove Nov 3, 2022
193b25d
fix regression
andygrove Nov 3, 2022
b3667ed
use official release of DataFusion
andygrove Nov 8, 2022
cf69b86
Merge branch 'main' into datafusion-14
andygrove Nov 8, 2022
60551a9
update optimizer rules list
andygrove Nov 8, 2022
ce2db78
Merge branch 'datafusion-14' of github.com:andygrove/dask-sql into da…
andygrove Nov 8, 2022
0a4733e
add filter_push_down rule from DataFusion 13
andygrove Nov 8, 2022
355a385
Merge remote-tracking branch 'upstream/main' into datafusion-14
andygrove Nov 8, 2022
0b06335
fix
andygrove Nov 8, 2022
2d1f8a4
add expr simplifier rule but without optimization for rewriting small…
andygrove Nov 10, 2022
07e171e
remove unused imports
andygrove Nov 10, 2022
97a1ea6
Disable EliminateFilter optimization to unblock regressions
charlesbluca Nov 14, 2022
3ad4d7a
Use upstream SimplifyExpressions, catch associated KeyError
charlesbluca Nov 14, 2022
459edf1
Forbid auto-index setting in attempt_predicate_pushdown
charlesbluca Nov 14, 2022
3a2e68c
Ignore index in test_predicate_pushdown
charlesbluca Nov 14, 2022
63b4e5d
Add dask version check to predicate pushdown tests
charlesbluca Nov 15, 2022
65c5669
Merge remote-tracking branch 'origin/main' into datafusion-14
charlesbluca Nov 15, 2022
b9dfc08
Add TODO for index specification
charlesbluca Nov 15, 2022
11 changes: 2 additions & 9 deletions dask_sql/physical/utils/filter.py
@@ -91,7 +91,7 @@ def attempt_predicate_pushdown(ddf: dd.DataFrame) -> dd.DataFrame:
     try:
         return dsk.layers[name]._regenerate_collection(
             dsk,
-            new_kwargs={io_layer: {"filters": filters}},
+            new_kwargs={io_layer: {"filters": filters, "index": False}},
Collaborator

Looks like the predicate pushdown issues stemmed from read_parquet automatically setting an index by default, which this kwarg override should prevent.

Chatting with @rjzamora, we agreed that this shouldn't be the default behavior, so we may be able to remove this override later on when changes are made upstream.

         )
     except ValueError as err:
         # Most-likely failed to apply filters in read_parquet.
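
To make the changed line concrete, here is a minimal sketch of the override dictionary it passes as `new_kwargs`. The helper name `build_pushdown_kwargs` and the layer name `"read-parquet-abc123"` are hypothetical and exist only for illustration; the filter is written in the list-of-tuples form that `read_parquet` accepts.

```python
from typing import Any, Dict, List, Tuple


# Hypothetical helper, not part of dask-sql: it only mirrors the shape of
# the `new_kwargs` argument built in the changed line of the diff.
def build_pushdown_kwargs(
    io_layer: str, filters: List[Tuple[str, str, Any]]
) -> Dict[str, Dict[str, Any]]:
    # "index": False requests that no index be set automatically when the
    # IO layer is regenerated with the new filters.
    return {io_layer: {"filters": filters, "index": False}}


# Assumed example values: a made-up layer name and a single predicate.
filters = [("a", ">", 1)]
overrides = build_pushdown_kwargs("read-parquet-abc123", filters)
print(overrides)
```

The overrides are keyed by the IO layer's name, so only that layer's creation kwargs are affected when the collection is regenerated.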
@@ -245,14 +245,7 @@ def _regenerate_collection(
         regen_kwargs = self.creation_info.get("kwargs", {}).copy()
         regen_kwargs = {k: v for k, v in self.creation_info.get("kwargs", {}).items()}
         regen_kwargs.update((new_kwargs or {}).get(self.layer.output, {}))
-        try:
-            result = func(*inputs, *regen_args, **regen_kwargs)
-        # FIXME: not immediately obvious what is causing this KeyError for some predicate pushdowns
-        except KeyError:
-            raise ValueError(
-                "`_regenerate_collection` failed. "
-                "Not all HLG layers are regenerable."
-            )
+        result = func(*inputs, *regen_args, **regen_kwargs)
         _regen_cache[self.layer.output] = result
         return result
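
The kwarg handling in this hunk can be sketched in isolation. This is a simplified stand-in under stated assumptions, not the dask-sql implementation: `regenerate`, `fake_read_parquet`, and `failing_reader` are hypothetical names, and positional inputs and the regeneration cache are omitted. It shows that per-layer overrides replace the original creation kwargs, and that with the `except KeyError` wrapper removed, a `KeyError` raised by the creation function surfaces unchanged.

```python
def regenerate(func, creation_kwargs, new_kwargs, output):
    """Simplified stand-in for the kwarg merge in `_regenerate_collection`."""
    # Start from a copy of the kwargs the layer was originally created with.
    regen_kwargs = dict(creation_kwargs)
    # Apply overrides registered for this layer's output name, if any.
    regen_kwargs.update((new_kwargs or {}).get(output, {}))
    # No try/except here: exceptions from `func` propagate to the caller.
    return func(**regen_kwargs)


def fake_read_parquet(path, filters=None, index=None):
    # Hypothetical reader that just echoes the arguments it received.
    return {"path": path, "filters": filters, "index": index}


def failing_reader(**kwargs):
    # Hypothetical reader that fails the way the removed FIXME described.
    raise KeyError("missing")


result = regenerate(
    fake_read_parquet,
    creation_kwargs={"path": "data.parquet", "index": None},
    new_kwargs={"io-layer": {"filters": [("a", ">", 1)], "index": False}},
    output="io-layer",
)

try:
    regenerate(failing_reader, {}, None, "io-layer")
    propagated = None
except KeyError as err:
    propagated = type(err).__name__
```

Because the override is applied with `dict.update`, the `"index": False` entry wins over whatever `index` value the layer was created with.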
