Settings in ExecuteQueryParams are ignored by Ballista's scheduler.execute_query(), causing a wrong partition count #1848
Comments
@thinkharderdev Please take a look.
I think we need to introduce session-level state to hold session-specific configuration, instead of the globally shared ExecutionContext/ExecutionContextState. With a shared Ballista scheduler, different users might submit SQL queries with different SQL configurations or shuffle settings.
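To make the session-level idea concrete, here is a minimal sketch of settings kept per session rather than in one shared context. `SessionRegistry`, its methods, and the session ids are hypothetical names chosen for illustration; nothing here is Ballista's actual API.

```rust
use std::collections::HashMap;

/// Hypothetical registry mapping a session id to that session's own settings.
/// This is an illustration only, not a type that exists in Ballista.
#[derive(Default)]
pub struct SessionRegistry {
    sessions: HashMap<String, HashMap<String, String>>,
}

impl SessionRegistry {
    /// Store (or overwrite) one setting for one session.
    pub fn set(&mut self, session_id: &str, key: &str, value: &str) {
        self.sessions
            .entry(session_id.to_string())
            .or_default()
            .insert(key.to_string(), value.to_string());
    }

    /// Look up a setting for a session. Settings stored for other
    /// sessions are never consulted.
    pub fn get(&self, session_id: &str, key: &str) -> Option<&str> {
        self.sessions
            .get(session_id)
            .and_then(|settings| settings.get(key))
            .map(String::as_str)
    }
}
```

With a structure along these lines, two users sharing one scheduler could each set their own shuffle partition count without affecting the other.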
Will do. I think there are a couple of different ways we can approach this:
Or both. 1 may be necessary anyway to support multi-tenancy, but within a single namespace we may still want to allow specifying shuffle settings on a per-query basis.
Also, good catch! Apologies for overlooking this.
For now I would prefer to let users choose the target partition count. It should not change too dynamically; otherwise the distributed physical plan will not be stable at runtime and additional shuffle exchanges could be introduced. In the future we might add some kind of adaptive method that adjusts the target partition size based on input/output data volume.
Besides the target partition count, I think there are a couple of other configuration options that users could specify and that can be changed dynamically, for example batch_size, parquet_pruning, repartition_windows, etc. I searched the open issues and found a couple of configuration-related issues that are still open. I think it is time to resolve those and come up with a more extensible configuration design.
The issue is fixed.
Describe the bug
The issue is caused by the changes in #1677, which always use the ExecutionContext from the SchedulerServer.
Before the change, running TPC-H benchmark Q1 on Ballista logged:
[2022-02-16T08:47:59Z INFO ballista_scheduler] Adding stage 1 with 1 pending tasks
[2022-02-16T08:47:59Z INFO ballista_scheduler] Adding stage 2 with 2 pending tasks
[2022-02-16T08:47:59Z INFO ballista_scheduler] Adding stage 3 with 1 pending tasks
After the change:
[2022-02-16T08:44:57Z INFO ballista_scheduler] Adding stage 1 with 1 pending tasks
[2022-02-16T08:44:57Z INFO ballista_scheduler] Adding stage 2 with 8 pending tasks
[2022-02-16T08:44:57Z INFO ballista_scheduler] Adding stage 3 with 1 pending tasks
To Reproduce
Steps to reproduce the behavior:
Expected behavior
SchedulerServer should honor the configuration settings from the ExecuteQueryParams.
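The expected behavior can be sketched as follows: a setting carried by the request takes precedence, and the scheduler's own value is used only as a fallback. `KeyValue`, `resolve_partitions`, and the fallback value are hypothetical, and the key name `ballista.shuffle.partitions` is an assumption about how the partition count is spelled in the settings; this is not the scheduler's actual code.

```rust
/// Hypothetical stand-in for one key/value setting carried by a query request.
pub struct KeyValue {
    pub key: String,
    pub value: String,
}

/// Assumed name of the setting that controls the shuffle partition count.
pub const SHUFFLE_PARTITIONS_KEY: &str = "ballista.shuffle.partitions";

/// Pick the partition count for a query.
///
/// The last occurrence of the key in the request is used. If the key is
/// absent, or that value is not a positive integer, the scheduler-side
/// default is returned instead.
pub fn resolve_partitions(settings: &[KeyValue], scheduler_default: usize) -> usize {
    settings
        .iter()
        .rev()
        .find(|kv| kv.key == SHUFFLE_PARTITIONS_KEY)
        .and_then(|kv| kv.value.parse::<usize>().ok())
        .filter(|n| *n > 0)
        .unwrap_or(scheduler_default)
}
```

Under these assumptions, a request that asks for 2 partitions resolves to 2 regardless of the scheduler default, which matches the "before the change" log where stage 2 has 2 pending tasks.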
Additional context