UDF/UDAF plugin #1882

Closed
EricJoy2048 opened this issue Feb 25, 2022 · 4 comments
Labels
enhancement New feature or request

Comments

@EricJoy2048
Member

Currently, UDFs and UDAFs cannot be used in Ballista because Ballista does not know how to serialize and deserialize them.
We use Trino. Following Trino's approach, we could implement UDFs as plugins packaged as Rust dynamic libraries. That way, Ballista and DataFusion only need to know the UDF plugin interface and can work without knowing the specific implementation of any UDF.
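
As a rough illustration of the idea above, here is a minimal sketch of a host that only knows a plugin interface. Everything in it (`UdfPlugin`, `PluginRegistry`, `AddOne`, and the simplified `i64` arguments) is a hypothetical name chosen for this example, not the interface from the actual PR. The sketch also leaves out dynamic library loading and registers the plugin in-process.

```rust
use std::collections::HashMap;

/// Hypothetical plugin interface: the only thing the host engine knows about a UDF.
pub trait UdfPlugin {
    /// Name under which the function is registered and looked up.
    fn name(&self) -> &str;
    /// Evaluate the function over a column of values (simplified to i64 here).
    fn invoke(&self, args: &[i64]) -> Vec<i64>;
}

/// Hypothetical host-side registry that resolves UDFs by name.
#[derive(Default)]
pub struct PluginRegistry {
    udfs: HashMap<String, Box<dyn UdfPlugin>>,
}

impl PluginRegistry {
    /// Store a plugin under the name it reports.
    pub fn register(&mut self, plugin: Box<dyn UdfPlugin>) {
        self.udfs.insert(plugin.name().to_string(), plugin);
    }

    /// Look a UDF up by name and run it; returns None if no such UDF is registered.
    pub fn call(&self, name: &str, args: &[i64]) -> Option<Vec<i64>> {
        self.udfs.get(name).map(|plugin| plugin.invoke(args))
    }
}

/// Example plugin; the host never refers to this type except at registration.
pub struct AddOne;

impl UdfPlugin for AddOne {
    fn name(&self) -> &str {
        "add_one"
    }

    fn invoke(&self, args: &[i64]) -> Vec<i64> {
        args.iter().map(|v| v + 1).collect()
    }
}
```

With a design along these lines, a query plan would only need to carry the UDF name (for example `"add_one"`), and each process could resolve that name against its own registry instead of serializing the function itself.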

@EricJoy2048 EricJoy2048 added the enhancement New feature or request label Feb 25, 2022
@Igosuki
Contributor

Igosuki commented Feb 27, 2022

Hi, really cool stuff. I use dtolnay/inventory in my project, but it has a known issue: one cannot guarantee that symbols from statically compiled code won't get mangled by LLVM. The problem was raised and the core team responded in rust-lang/rust#47384, but it's not solved yet.
Can we guarantee here that statically compiled plugins won't end up forgotten in binaries?

Secondly, I think it'd be cool to implement your interface in datafusion python so that people can use a Python function as a UDAF, as is done in PySpark: https://spark.apache.org/docs/2.4.0/sql-pyspark-pandas-with-arrow.html#pandas-udfs-aka-vectorized-udfs

@EricJoy2048
Member Author

EricJoy2048 commented Feb 27, 2022

In PR #1881, I referred to two articles to design the UDF plugin: https://adventures.michaelfbryan.com/posts/plugins-in-rust/ and https://michael-f-bryan.github.io/rust-ffi-guide/dynamic_loading.html
The design requires the crate type of the plugin to be cdylib. According to preliminary tests, the problem of statically compiled plugins ending up forgotten in binaries does not occur.
Sorry, I'm not very familiar with Python. If you like, you can help implement the code related to datafusion python.
Thank you!
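
For readers unfamiliar with the cdylib approach, the following is a minimal sketch of what a plugin crate might export, assuming `crate-type = ["cdylib"]` is set in its `Cargo.toml`. The names `UdfDeclaration`, `udf_plugin_declaration`, and `add_one`, as well as the single-`i64` signature, are hypothetical and are not taken from the PR. The host-side loading step (opening the library and looking up the symbol) is omitted.

```rust
/// Hypothetical C-compatible descriptor that a plugin hands to the host.
#[repr(C)]
pub struct UdfDeclaration {
    /// Interface version the host can check before calling into the plugin.
    pub abi_version: u32,
    /// The UDF itself, exposed as a plain C function pointer.
    pub invoke: extern "C" fn(i64) -> i64,
}

/// The plugin's actual logic; the host only ever sees it via the pointer above.
extern "C" fn add_one(x: i64) -> i64 {
    x.wrapping_add(1)
}

/// Exported entry point. `#[no_mangle]` keeps the symbol name stable so a host
/// could look it up by name after loading the dynamic library.
/// (On the Rust 2024 edition this attribute is written `#[unsafe(no_mangle)]`.)
#[no_mangle]
pub extern "C" fn udf_plugin_declaration() -> UdfDeclaration {
    UdfDeclaration {
        abi_version: 1,
        invoke: add_one,
    }
}
```

Because the entry point is an exported symbol of a separately built library rather than a static collected at link time, it is looked up explicitly by name at load time, which may be why the "forgotten in binaries" problem did not appear in the preliminary tests.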

@alamb
Contributor

alamb commented Apr 25, 2022

@gaojun2048 is this issue still tracking anything actionable? I think this has been done

@EricJoy2048
Member Author

Yes. Let me close this issue.
