-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use SessionContext to parse Expr protobuf #2024
Conversation
# Conflicts: # ballista/rust/core/src/serde/physical_plan/mod.rs # datafusion-proto/src/from_proto.rs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, Thanks @thinkharderdev!
Sorry for the late response. Looks like this PR is a little problematic and it has conflicts with the multiple tenancy SessionContexts. The major problem is in the Executor side, Executor side also need to run those methods to parse the Expr protobuf to physical plan, but Executor side should not have SessionContexts, the optimizers, planner and physical planner do not make sense in Executor side. And for the pluggable UDFs/UDAFs, I think we should have global level UDFs/UDAFs and session level UDFs/UDAFs. |
Hi, @thinkharderdev I think this PR doesn't resolve the issue. One of the key problem is, users register the UDF/UDFS with SessionContext.register_udf(), but how does the SessionContexts (ExecutionContext) in all the Executors know the registered UDF? The information is not propagate to executor side. For DataFusion, SessionContext.register_udf() is not a problem. But for Baliista, there are three different kind of SessionContext:
I'm going to remove the context created in Executor because I think it does not make sense to have Executor hold a global SessionContext. Please share your thoughts. |
I think the idea is that there are two ways to register the UDF/UDAF in the executors:
Based on the discussion in that PR we were thinking that the plugin mechanism could use the Ultimately I think we do need some sort of context in the executor if we are going to support all the extension points that DataFusion provides in Ballista. |
Hi, @thinkharderdev I'm going to remove the SessionContext in Executor code and make the Executor itself implement FunctionRegistry trait For udf/udaf registration in Executor side, I did not get a chance to scan the code in #1881. The two approaches you just mentioned should both work. And to support session level udf/udaf(temporary functions), we also need a way to propagate the temp functions' meta data and lib url or lib files from Scheduler side to Executor side. |
Can I suggest using |
I don't have anything substantial to add to this conversation -- it sounds like the challenges are well understood and there are good ideas of how to do the mapping from "function name" to "UDF implementation" on the executor side. It is great discussion to see. |
Which issue does this PR close?
Partially addresses #1882
Rationale for this change
Add serialization of UDF/UDAF to Ballista and
datafusion-proto
. We serialize the UDF/UDAF by name and then use the functions registered in theSessionContext
to deserialize.Users can either initialize Ballista instances with an
SessionContext
registered with the relevant functions or use the plugin mechanism being built as part of #1881.What changes are included in this PR?
Parsing logical
Expr
now requires anSessionContext
. Additionally, extension codecs inBallista
now need anExecutionContext
.Are there any user-facing changes?
Yes, the signatures for
PhysicalExtensionCodec.try_decode
andLogicalExtensionCodec.try_decode
now take a reference to anSessionContext
.