DataFusion Configuration Consolidation #4349
Comments
However, one question, raised by @avantgardnerio in apache/datafusion-ballista#479, is how to structure the interactions with the configuration code. It turns out that …
There is also ExecutionProps, which is some subset of TaskProperties 🤔
I think consolidating SessionConfig/ConfigOptions is definitely the right direction. The debate is how to consolidate them: keep either SessionConfig or ConfigOptions. I do not have a clear preference on this either; maybe we can have a vote.
We should also consolidate ExecutionProps and TaskProperties.
I'm OK with the change in this PR. Ballista should still work after this PR.
Here is my next contribution to clean up configuration: #4427 (slowly consolidating the configurations).
If a TaskContext represents a query/task, then could …
I think it might be more precise to describe
Correct, at the point in planning where
FWIW this is what IOx does. Edit: I wrote some docs in #2655, the last time I got thoroughly confused by this system, and I think they are still up to date. I see we have grown a few more config structs since then, though. Edit Edit: It may also be the case that a single query is broken into multiple …
Thank you @tustvold for the clarification; in that case I will follow …
I think this is largely complete and we don't have any additional work planned here, so closing.
Related
apache/datafusion-ballista#479
#3885
TLDR Recommendations
This is a complicated issue and I don't have a magic answer. However, I have some concrete suggestions.
Some suggested steps:
Make ConfigOptions easier to work with (#3886)
I think consolidating SessionConfig/ConfigOptions is likely to be the most controversial and cause the most churn, but I think it will provide immense benefits (like runtime visibility into the current settings).
Then we can further improve from there.
Introduction
"Configuration" in DataFusion has a few usecases:
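Setting values at runtime with set XX = YY (in datafusion-cli)
Reading values from environment variables with ConfigOptions::from_env
Viewing the current values with SHOW (in datafusion-cli)
There are also two overlapping "levels" of configuration that are needed: session level and statement level (per task / per query).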
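The use cases above (SET, environment variables, SHOW) can in principle all be served by one store. The following is a minimal, std-only Rust sketch under stated assumptions: `ConfigStore` and its methods are hypothetical (not DataFusion's API), the keys and default values are illustrative, and the rule mapping a key such as `datafusion.execution.batch_size` to the environment variable `DATAFUSION_EXECUTION_BATCH_SIZE` is an assumption about how an environment lookup might work.

```rust
use std::collections::BTreeMap;

/// Hypothetical string-keyed configuration store (not DataFusion's API).
/// A BTreeMap keeps keys sorted so the `show` output is deterministic.
#[derive(Debug, Default, Clone)]
pub struct ConfigStore {
    values: BTreeMap<String, String>,
}

impl ConfigStore {
    /// Register a key together with its default value.
    pub fn with_default(mut self, key: &str, default: &str) -> Self {
        self.values.insert(key.to_string(), default.to_string());
        self
    }

    /// `set XX = YY`: only keys registered up front are accepted.
    pub fn set(&mut self, key: &str, value: &str) -> Result<(), String> {
        match self.values.get_mut(key) {
            Some(slot) => {
                *slot = value.to_string();
                Ok(())
            }
            None => Err(format!("unknown configuration key: {key}")),
        }
    }

    /// Environment overrides for known keys.
    /// ASSUMED naming rule: uppercase the key and replace '.' with '_'.
    pub fn apply_env<I>(&mut self, vars: I)
    where
        I: IntoIterator<Item = (String, String)>,
    {
        let vars: BTreeMap<String, String> = vars.into_iter().collect();
        for (key, value) in self.values.iter_mut() {
            let env_name = key.to_uppercase().replace('.', "_");
            if let Some(v) = vars.get(&env_name) {
                *value = v.clone();
            }
        }
    }

    /// `SHOW`: programmatic visibility into every current setting.
    pub fn show(&self) -> Vec<(String, String)> {
        self.values.iter().map(|(k, v)| (k.clone(), v.clone())).collect()
    }
}

fn main() {
    // Illustrative keys and defaults only.
    let mut config = ConfigStore::default()
        .with_default("datafusion.execution.batch_size", "8192")
        .with_default("datafusion.execution.target_partitions", "4");

    // In a real program this could be `config.apply_env(std::env::vars())`.
    config.apply_env(vec![(
        "DATAFUSION_EXECUTION_TARGET_PARTITIONS".to_string(),
        "16".to_string(),
    )]);
    config.set("datafusion.execution.batch_size", "1024").unwrap();
    assert!(config.set("datafusion.no_such_key", "1").is_err());

    for (key, value) in config.show() {
        println!("{key} = {value}");
    }
}
```

Rejecting unknown keys in `set` is one possible design choice: it catches typos early, but every key has to be registered before it can be set.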
Current state of configuration in DataFusion
The current state is... inconsistent, to put it mildly.
The core structure is SessionContext, which is the final glue and entry point for interacting with DataFusion (e.g. the tables provided, etc.). Within the SessionContext there is some combination of SessionState, SessionConfig, and ConfigOptions. Part of the hierarchy is like this:
SessionConfig is effectively the session level configuration I describe above. TaskContext is the statement level (aka per task / per query) context. If you look hard, you can see it has a copy of the SessionConfig (buried in TaskProperties), or it may instead be backed by KVPairs.
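To make the session level / statement level split concrete, here is a minimal std-only Rust sketch in which each statement takes a snapshot of the session settings when it starts and may override values locally without touching the session. `SessionSettings` and `StatementSettings` are hypothetical names, deliberately simpler than DataFusion's SessionConfig, TaskContext and TaskProperties.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, RwLock};

/// Hypothetical session level settings, shared by every statement in a session.
#[derive(Debug, Clone, Default)]
pub struct SessionSettings {
    values: BTreeMap<String, String>,
}

/// Hypothetical statement (per task / per query) level settings:
/// a snapshot of the session taken at statement start, plus local overrides.
#[derive(Debug, Clone)]
pub struct StatementSettings {
    // Arc so the snapshot can be shared cheaply, e.g. by several partitions.
    snapshot: Arc<SessionSettings>,
    overrides: BTreeMap<String, String>,
}

impl StatementSettings {
    pub fn new(session: &RwLock<SessionSettings>) -> Self {
        // Copy the session settings once, so later session changes do not
        // leak into a statement that is already running.
        let snapshot = Arc::new(session.read().unwrap().clone());
        Self { snapshot, overrides: BTreeMap::new() }
    }

    /// Override a value for this statement only.
    pub fn set_local(&mut self, key: &str, value: &str) {
        self.overrides.insert(key.to_string(), value.to_string());
    }

    /// Statement level overrides win; otherwise fall back to the session snapshot.
    pub fn get(&self, key: &str) -> Option<&str> {
        self.overrides
            .get(key)
            .or_else(|| self.snapshot.values.get(key))
            .map(String::as_str)
    }
}

fn main() {
    let session = RwLock::new(SessionSettings::default());
    session.write().unwrap().values.insert("batch_size".into(), "8192".into());

    let mut stmt = StatementSettings::new(&session);
    stmt.set_local("batch_size", "1024");

    // A later session level change does not affect the running statement.
    session.write().unwrap().values.insert("batch_size".into(), "4096".into());

    assert_eq!(stmt.get("batch_size"), Some("1024"));
    assert_eq!(
        session.read().unwrap().values.get("batch_size").map(String::as_str),
        Some("4096")
    );
}
```

Snapshotting at statement start is an assumption about the desired semantics; the alternative is to read through to the live session settings, in which case running queries would observe later SET commands.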
Desire
I would like to have a clear configuration system that cleanly separates the session level config from the statement (per task) level config, allows configuration values to be set in a uniform manner, and makes them easy to view programmatically.
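The goal of uniform setting and programmatic viewing can be sketched with a single struct whose fields are typed but which also exposes a string-based `set` and an `entries` listing. `Options`, its two fields, and the defaults are hypothetical and chosen only for illustration; the sketch is not meant to describe DataFusion's actual ConfigOptions.

```rust
/// Hypothetical consolidated options: typed fields for programmatic use,
/// plus string-based `set` and `entries` for SET / SHOW style access.
#[derive(Debug, Clone, PartialEq)]
pub struct Options {
    pub batch_size: usize,
    pub collect_statistics: bool,
}

impl Default for Options {
    fn default() -> Self {
        // Illustrative defaults only.
        Self { batch_size: 8192, collect_statistics: false }
    }
}

impl Options {
    /// Uniform setter: every key is set the same way, and each value is
    /// parsed (and therefore validated) against its field's type.
    pub fn set(&mut self, key: &str, value: &str) -> Result<(), String> {
        match key {
            "batch_size" => {
                self.batch_size = value
                    .parse::<usize>()
                    .map_err(|e| format!("invalid value for {key}: {e}"))?;
            }
            "collect_statistics" => {
                self.collect_statistics = value
                    .parse::<bool>()
                    .map_err(|e| format!("invalid value for {key}: {e}"))?;
            }
            _ => return Err(format!("unknown configuration key: {key}")),
        }
        Ok(())
    }

    /// Programmatic view of every current setting as (key, value) pairs.
    pub fn entries(&self) -> Vec<(&'static str, String)> {
        vec![
            ("batch_size", self.batch_size.to_string()),
            ("collect_statistics", self.collect_statistics.to_string()),
        ]
    }
}

fn main() {
    let mut options = Options::default();
    options.set("batch_size", "1024").unwrap();
    options.set("collect_statistics", "true").unwrap();
    assert!(options.set("batch_size", "not-a-number").is_err());

    for (key, value) in options.entries() {
        println!("{key} = {value}");
    }
}
```

Typed fields let the compiler check programmatic access, while the string-based `set` parses and validates values arriving from SQL or the environment; the cost is one match arm per key, which a macro could generate.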