[Ballista] Add ballista plugin manager and UDF plugin #2131
Currently, I believe plugin_dir is a local directory. I think it would be better to support distributed file systems (HDFS/object store) so that both the executors and the scheduler can load the plugin files from a single place.
Good idea. I will add this feature in the future.
Alternatively, users could package up dependencies in a Docker container and deploy that way. This could be more efficient when multiple executors are running on the same node, since the image will be downloaded once and cached. It also provides better version control: all executors are guaranteed to run the same code (assuming a specific version of the image is deployed). I would be interested to hear more about the use case for loading dependencies from an object store, though. What would be the motivation for this approach?
Maybe in the future we could support creating custom UDFs and UDAFs the way Hive does: CREATE FUNCTION myfunc AS 'myclass' USING JAR 'hdfs:///path/to/jar';
Who can review and merge this PR? We need to use this feature in our Ballista cluster.
I'm not quite qualified to review the code in Ballista, but I could help merge the PR once consensus is reached. @yahoNanJing @mingmwang, do you want to give a review pass since you are actively working on this?
I think that since @thinkharderdev has reviewed this and we have talked about it for a while, I will merge the code in and we can iterate on it as needed. Thank you for your patience and perseverance @gaojun2048
@andygrove, if the UDF/UDAF libraries can only be loaded from local disk, we need to build a new image and redeploy the whole cluster whenever the libraries change. If the UDF/UDAF libraries can instead be loaded from shared remote storage, the image does not depend on the libraries, and changes are easier to handle.
Thank you all. I will iterate on it later.
A sub-PR of #1881.
#1881 includes the plugin, the plugin loader, and serialization and deserialization. Because the community has been changing the serialization and deserialization implementation for LogicalPlan and PhysicalPlan, I am pushing this PR separately. It includes only the plugin and the plugin loader, not serialization and deserialization for UDFs/UDAFs.
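For readers following the plugin_dir discussion, here is a minimal sketch of what scanning a local plugin directory for candidate libraries could look like. It is not the code from this PR: the function names `is_plugin_library` and `discover_plugins` and the list of file extensions are assumptions made for illustration, and the sketch only finds files; it does not load them.

```rust
use std::ffi::OsStr;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// File extensions treated as dynamic libraries (an assumption for this sketch).
const LIB_EXTENSIONS: [&str; 3] = ["so", "dylib", "dll"];

/// Returns true if `path` has one of the dynamic-library extensions above.
fn is_plugin_library(path: &Path) -> bool {
    path.extension()
        .and_then(OsStr::to_str)
        .map(|ext| LIB_EXTENSIONS.contains(&ext))
        .unwrap_or(false)
}

/// Lists candidate plugin libraries directly inside `plugin_dir`, sorted by path.
/// Hypothetical helper: it does not recurse and does not open the libraries.
fn discover_plugins(plugin_dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    for entry in fs::read_dir(plugin_dir)? {
        let path = entry?.path();
        if path.is_file() && is_plugin_library(&path) {
            found.push(path);
        }
    }
    found.sort();
    Ok(found)
}
```

A scan like this only sees the local file system, which is the limitation raised in the thread: with shared remote storage, the files would first have to be fetched to a local path before a step like this could find them.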