-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor ExecutionContext and related conf to support multi-tenancy configurations. #1862
Comments
The change makes sense to me over all 👍 Only nitpick is I think SQLConfig is not a good name for the following two reasons: 1) target partition and batch_size are not SQL specific configs 2) I think it would be better to design the system so that SQL is just a subset of the features that are supported by ballista, in the long run, I consider ballista a distributed compute framework that can execute arbitrary relational queries defined by users, even with custom UDFs. Perhaps we can keep the ExecutionConfig name? Or maybe rename to SessionConfig. |
I think we need a way to allow the registration of custom optimizers, udfs, etc at startup. Part of the rationale for #1677 was to create a global Maybe we can still create the scheduler with an |
I'm working on it now, the PR is quite huge. Need to make execution plan session config aware. In the ExecutionPlan trait, I add two methods, and each ExecutionPlan need to add the session_id as its member.
And RuntimeEnv is changed to a global singleton structure. It was created at the very beginning when Executor or Scheduler is firstly initialized in the main() method. And RuntimeEnv does not need to pass down to plan's execution path anymore.
The new SessionContexts are registered to Scheduler's RuntimeEnv. And Task contexts are registered to Executors' RuntimeEnv. SessionContext contains the internal SessionState with the planners, optimizers, udf/udaf, SessionConfig etc. In the TaskDefinition proto, add session_id and props name-value pairs so that session_id and configurations can
|
I am sorry for the late review. So we have something like this in IOx where we have context we want to use and pass down when we run plans using DataFusion and it supports multi-tenancy quite well. The approach we went with is to create a I am a little worried about introducing a "global singleton" into DataFusion -- it may save plumbing work in the short term but I think it makes it hard to write tests and cause other hard to diagnose problems. I think it is more maintainable longer term to explicitly provide all contexts required If you are describing a change to Ballista (to add a global singleton into Ballista) I think that makes much more sense as I expect Ballista to be used standalone rather than as a library for others to build upon |
Note IOx uses custom optimizers and UDFs too |
|
Just close this issue. We can have further discussion in the PR request. |
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Fixe 1848
Address 138 and 682
(This section helps Arrow developers understand the context and why for this feature, in addition to the what)
Describe the solution you'd like
In order to support more extensible and multi-tenancy configurations , we need to introduce a session related context to isolate the user specific configurations. The configurations should be correctly propagate to the executor side along with tasks/plans.
Here is the proprosal:
optimizers rule and SQL related configurations etc.
I can work on the PRs. There will be two or three PRs to cover different parts.
Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.
Additional context
Add any other context or screenshots about the feature request here.
The text was updated successfully, but these errors were encountered: