select multiple columns in a single Expr
#10102
Comments
It looks good to me, and we can gradually deprecate single column in the long term.
@universalmind303 are you already working on this?
No, feel free to go ahead and work on it if you'd like.
This would also require modifying the protobuf definitions. Are we OK with that?
How about returning
My current WIP has a struct, since it supports different data types and field names. Can a List work as well?
No, you are right, a List can't really support multiple different types.
@universalmind303 I think it would be interesting to test it. Do you have an example, or have you written the UDF?
I wonder if
That works, but isn't this unrelated to this change? I suppose it was working before too...
@alamb by the way, how does the
I think https://datafusion.apache.org/user-guide/sql/scalar_functions.html#struct has some pretty good examples.
Apologies, I wasn't precise in my question. The code of the struct udf doesn't explicitly drop fields but returns all of them. How does the selection happen?
I think you can select with something like
However, now re-reading this ticket, I think it would be possible to create a user defined function like
What I was trying to say is that, looking at that UDF, it isn't clear to me how the other fields are dropped. It seems that DataFusion performs an intersection between the field names returned by the return type and the ones returned by array_struct?
FWIW, #10102 seems related as multi selection could be implemented via
@edmondop
I don't think we drop any fields in struct. One example is
I want to create a udf that can select multiple columns at once, such as a COLUMNS() function. select COLUMNS('number\d+') from my_table. Looking at the struct UDF, it seems that it only receives the columns that are passed to the function invocation and doesn't have access to other columns, i.e. in the
However, in the case of COLUMNS('number\d+'), you need to have all the columns and only return a few of them from the function. In my understanding neither
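To make the point above concrete, here is a minimal sketch (not DataFusion code) of what a COLUMNS-style selector needs as input: the full list of column names, rather than only the arguments of the call. The names `matches_prefix_digits` and `select_columns` are hypothetical. To stay within the standard library, a hand-written matcher stands in for the regex number\d+, and it is anchored to the whole column name, which is a simplifying assumption.

```rust
/// Hand-written stand-in for the regex `number\d+`, anchored to the whole name:
/// the name must be `prefix` followed by one or more ASCII digits.
fn matches_prefix_digits(name: &str, prefix: &str) -> bool {
    match name.strip_prefix(prefix) {
        Some(rest) => !rest.is_empty() && rest.chars().all(|c| c.is_ascii_digit()),
        None => false,
    }
}

/// Hypothetical helper: picks the matching column names out of the full schema.
/// Note that it takes the whole schema, which a scalar UDF does not receive.
fn select_columns<'a>(schema: &[&'a str], prefix: &str) -> Vec<&'a str> {
    schema
        .iter()
        .copied()
        .filter(|name| matches_prefix_digits(name, prefix))
        .collect()
}
```

Calling `select_columns` with a schema of id, number1, number22 and name, and the prefix number, keeps only number1 and number22. The selection depends on columns that never appear in the function call, which is the part a regular UDF cannot see.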
I agree, we can't get all the columns with the current design of functions; it is more challenging than I thought 🤔. We expect to build up a projection plan given the syntax
The difference in behaviour in the parser (datafusion/sql) between these.
To make the mentioned function possible, we don't even need to introduce
datafusion/datafusion/sql/src/select.rs Lines 449 to 457 in 7535d93
@jayzhan211 that function already returns a
I didn't find equivalent behavior in postgres. I'm not sure whether we should support this kind of
Is your feature request related to a problem or challenge?
I want to create a udf that can select multiple columns at once, such as a COLUMNS(<regex>) function, e.g. select COLUMNS('number\d+') from my_table. Currently this is not possible because udfs can only ever output a single Expr.
Describe the solution you'd like
Since it would be quite a massive overhaul to refactor all of the planning and udf logic to return Vec<Expr>, I propose adding a new variant to Expr: Expr::Columns(Vec<Column>). This seems like the least invasive way to support selecting multiple columns in a single expr.
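As a rough illustration of the proposal, the sketch below uses toy `Column` and `Expr` types (assumptions for this example, not DataFusion's real definitions) and a hypothetical `expand_projection` step that flattens each Expr::Columns into individual Expr::Column entries before the rest of planning runs. Where such an expansion would actually live in the planner is not settled in this issue.

```rust
/// Toy stand-in for a column reference (not DataFusion's real `Column`).
#[derive(Debug, Clone, PartialEq)]
struct Column {
    name: String,
}

/// Toy expression enum with the proposed multi-column variant.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(Column),
    Columns(Vec<Column>),
}

/// Hypothetical planner step: flatten every `Expr::Columns` in a projection
/// list into individual `Expr::Column` entries, preserving order.
fn expand_projection(exprs: Vec<Expr>) -> Vec<Expr> {
    let mut out = Vec::new();
    for expr in exprs {
        match expr {
            Expr::Columns(cols) => out.extend(cols.into_iter().map(Expr::Column)),
            other => out.push(other),
        }
    }
    out
}
```

With this shape, a function can still return a single Expr, while the projection ends up with one entry per selected column once the expansion has run.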
Describe alternatives you've considered
I'm open for alternatives, but I am not aware of any.
Additional context
Polars has this variant in their Expr: https://github.com/pola-rs/polars/blob/9fec2ecb6d4295969e1d155b386ee82db08745a1/crates/polars-plan/src/dsl/expr.rs#L72