Is your feature request related to a problem? Please describe.
As a follow-up to https://github.com/rapidsai/cudf/pull/9514/files, we should support functions that accept scalar (non-column) arguments in cudf.Series.apply, similar to pandas. Right now, cudf.Series.apply works by turning the series into a full dataframe and wrapping the incoming function as a row function in a lambda, as seen here. This is fine as long as the UDF always accepts exactly one argument, but it breaks down if we want to pass args. Note that functions written for pandas.Series.apply are not row UDFs; they are written in scalar form:
def f(x):
    return x + 2
vs. the row version, which would work on a single-column dataframe:
def f(row):
    x = row['x']
    return x + 2
Describe the solution you'd like
We want this to work, so we either need to:
- come up with a more general mechanism to transform the scalar UDF into a row UDF, and then play the same game of promoting the series to a dataframe and forwarding to cudf.DataFrame.apply, or
- write a separate kernel that works for series and reuses as much of the row compilation machinery as possible.
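To make the first option concrete, here is a minimal pure-Python sketch of wrapping a scalar UDF as a row UDF while forwarding extra arguments. The helper name `scalar_to_row_udf` and the list-of-dicts stand-in for a single-column dataframe are hypothetical and only for illustration; this is not how cudf implements it, and a closure like this would not necessarily survive cudf's Numba compilation step unchanged.

```python
def scalar_to_row_udf(scalar_udf, column_name):
    """Wrap a scalar UDF so it can be called with a row-like mapping.

    Hypothetical helper: extra positional arguments are forwarded
    unchanged to the scalar UDF.
    """
    def row_udf(row, *args):
        # Pull the single value out of the row, then call the scalar UDF.
        return scalar_udf(row[column_name], *args)
    return row_udf


def f(x, c):
    return x + c


# Stand-in for a single-column dataframe: one dict per row.
rows = [{"x": 1}, {"x": 2}, {"x": 3}]

row_f = scalar_to_row_udf(f, "x")
result = [row_f(row, 42) for row in rows]
print(result)
```

The wrapper keeps the user-facing function in scalar form; only the adapter knows about the column name, which is the part a general mechanism would have to generate automatically.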
Ultimately though we want to be able to use UDFs that look like this on Series objects:
def f(x, c):
    return x + c

sr.apply(f, args=(42,))
Describe alternatives you've considered
One can always just promote the series to a single column dataframe and write a row UDF instead as a workaround, but that is rather suboptimal and clumsy for the user.
Additional context
N/A
This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.
Closes #9598
A lot of code was moved around but also slightly tweaked, making the diff a little harder to parse. Here's a summary of the changes:
- `Series.apply` used to simply turn the incoming scalar lambda function into a row UDF and then turn itself into a dataframe and run the code as normal. Now, it does its own separate unique processing and pipes through `Frame._apply` instead.
- `pipeline.py` was separated out into `row_function.py` and `lambda_function.py` which contain whatever is specific to each type of UDF, whereas everything that was common to both was migrated to `utils.py` and generalized as much as possible.
- A `templates.py` module was created to hold all the templates and initializers needed to concatenate the kernel we need, and a new template specific to series lambdas was added.
- The caching machinery was abstracted out into `compile_or_get` and this function now expects a python function object it can call that will produce the right kernel. `DataFrame` and `Series` decide which one to use at the top level API.
- Moved `_apply` from `Frame` to `IndexedFrame`
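
The caching idea behind `compile_or_get` in the list above can be illustrated with a small pure-Python sketch. Only the name `compile_or_get` and the idea of passing in a callable that produces the kernel come from the description; the signature, the cache key, and the `make_series_kernel` builder are assumptions made for illustration, and the "kernel" here is an ordinary Python function rather than compiled GPU code.

```python
# Hypothetical module-level cache mapping a key to a built kernel.
_kernel_cache = {}


def compile_or_get(cache_key, kernel_getter, *getter_args):
    """Return the cached kernel for cache_key, building it on first use.

    Assumed signature: kernel_getter is any callable that produces
    the right kernel for the caller (DataFrame or Series).
    """
    if cache_key not in _kernel_cache:
        _kernel_cache[cache_key] = kernel_getter(*getter_args)
    return _kernel_cache[cache_key]


build_calls = []  # Records each time a kernel is actually built.


def make_series_kernel(udf):
    """Stand-in kernel builder for scalar (series) UDFs."""
    build_calls.append(udf.__name__)

    def kernel(values, *args):
        return [udf(v, *args) for v in values]

    return kernel


def f(x, c):
    return x + c


# Assumed key: kind of UDF, its bytecode, and the input dtypes.
key = ("series", f.__code__.co_code, ("int64",))
k1 = compile_or_get(key, make_series_kernel, f)
k2 = compile_or_get(key, make_series_kernel, f)
print(k1 is k2, build_calls, k1([1, 2, 3], 42))
```

Passing the builder in as an argument is what lets the top-level API choose between a row kernel and a series kernel without the cache needing to know which one it is storing.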
Authors:
- https://github.com/brandon-b-miller
Approvers:
- Vyas Ramasubramani (https://github.com/vyasr)
- Michael Wang (https://github.com/isVoid)
URL: #9982