[FEA] Support `args=` in `cudf.Series.apply` #9598

brandon-b-miller · 2021-11-04T13:51:50Z

Is your feature request related to a problem? Please describe.
As a follow-up to https://github.com/rapidsai/cudf/pull/9514/files, we should support functions that accept scalar (non-column) arguments in cudf.Series.apply, similar to pandas. Right now, cudf.Series.apply works by turning the series into a full dataframe and wrapping the incoming function as a row function in a lambda, as seen here. This works as long as the UDF accepts exactly one argument, but breaks down once we want args. Note that functions written for pandas.Series.apply are not row UDFs; they are written in scalar form:

def f(x):
    return x + 2

versus the row version, which would work on a single-column dataframe:

def f(row):
    x = row['x']
    return x + 2

Describe the solution you'd like
We want this to work, so we either need to:

- come up with a more general mechanism to transform the scalar UDF into a row UDF, and then play the same game of promoting the series to a dataframe and forwarding to cudf.DataFrame.apply, or
- write a separate kernel that works for series and reuses as much of the row compilation machinery as possible.
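
The first option can be pictured in plain Python. The sketch below is not cuDF's implementation: `scalar_to_row_udf` is a hypothetical helper, and rows are modeled as ordinary dicts rather than the row objects a compiled row UDF receives. It only shows the shape of the adapter, which unpacks the single column from the row and forwards any extra scalar arguments.

```python
def scalar_to_row_udf(scalar_udf, column_name):
    """Wrap a scalar UDF so it can be called with a row (hypothetical helper)."""
    def row_udf(row, *args):
        # Pull the single value out of the row, then forward the extra scalars.
        return scalar_udf(row[column_name], *args)
    return row_udf


def f(x, c):
    return x + c


row_f = scalar_to_row_udf(f, "x")

# Rows are modeled as dicts here; this is an assumption made for illustration.
rows = [{"x": 1}, {"x": 2}, {"x": 3}]
results = [row_f(row, 42) for row in rows]
```

Whether a wrapper like this can be compiled as-is by Numba is a separate question that the sketch does not address.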

Ultimately though we want to be able to use UDFs that look like this on Series objects:

def f(x, c):
    return x + c

sr.apply(f, args=(42,))
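
For reference, the intended semantics match pandas, where each extra positional argument in `args` is passed to the function after the series value. A minimal pure-Python model of that behavior, using a hypothetical `apply_scalar` helper and a list standing in for the series:

```python
def apply_scalar(values, func, args=()):
    """Call func(value, *args) for every value (reference model, not cuDF code)."""
    return [func(value, *args) for value in values]


def f(x, c):
    return x + c


# Equivalent in spirit to sr.apply(f, args=(42,)) on a series holding 1, 2, 3.
out = apply_scalar([1, 2, 3], f, args=(42,))
```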

Describe alternatives you've considered
As a workaround, one can always promote the series to a single-column dataframe and write a row UDF instead, but that is suboptimal and clumsy for the user.

Additional context
N/A


github-actions · 2021-12-04T14:02:44Z

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

Closes #9598

A lot of code was moved around but also slightly tweaked, making the diff a little harder to parse. Here's a summary of the changes:

- `Series.apply` used to simply turn the incoming scalar lambda function into a row UDF and then turn itself into a dataframe and run the code as normal. Now, it does its own separate processing and pipes through `Frame._apply` instead.
- `pipeline.py` was separated out into `row_function.py` and `lambda_function.py`, which contain whatever is specific to each type of UDF, whereas everything that was common to both was migrated to `utils.py` and generalized as much as possible.
- A `templates.py` area was created to hold all the templates and initializers needed to cat together the kernel that we need, and a new template specific to series lambdas was created.
- The caching machinery was abstracted out into `compile_or_get`, and this function now expects a Python function object it can call that will produce the right kernel. `DataFrame` and `Series` decide which one to use at the top-level API.
- Moved `_apply` from `Frame` to `IndexedFrame`.

Authors:
- https://github.com/brandon-b-miller

Approvers:
- Vyas Ramasubramani (https://github.com/vyasr)
- Michael Wang (https://github.com/isVoid)

URL: #9982
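
The caching arrangement described in the summary can be illustrated with a small pure-Python sketch. This is not the cuDF code: `compile_or_get_sketch`, the cache key layout, and `series_kernel_getter` are all hypothetical, and the "kernel" is an ordinary Python closure rather than a compiled GPU kernel. The point is only the pattern, in which the caller supplies a function that knows how to build the right kernel and the cache invokes it on a miss.

```python
_kernel_cache = {}
builds = []  # records each kernel build so cache hits can be observed


def compile_or_get_sketch(func, arg_types, kernel_getter):
    """Return a cached kernel for (func, arg_types), building it on a miss.

    Hypothetical stand-in: the signature and key layout are assumptions.
    """
    # Key on the function's bytecode and constants plus the argument types.
    code = func.__code__
    key = (code.co_code, code.co_consts, tuple(arg_types))
    if key not in _kernel_cache:
        _kernel_cache[key] = kernel_getter(func, arg_types)
    return _kernel_cache[key]


def series_kernel_getter(func, arg_types):
    """Build a 'kernel' for a scalar UDF (plain Python stand-in)."""
    builds.append(arg_types)

    def kernel(values, *args):
        return [func(v, *args) for v in values]

    return kernel


def f(x, c):
    return x + c


k1 = compile_or_get_sketch(f, ("int64", "int64"), series_kernel_getter)
k2 = compile_or_get_sketch(f, ("int64", "int64"), series_kernel_getter)
```

Here the second call reuses the kernel built by the first. A dataframe path would pass a different getter while sharing the same cache logic.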

brandon-b-miller added the feature request (New feature or request), numba (Numba issue), and Python (Affects Python cuDF API.) labels Nov 4, 2021

brandon-b-miller self-assigned this Nov 4, 2021

brandon-b-miller mentioned this issue Nov 4, 2021

Support args= in apply #9514

Merged

github-actions bot added the inactive-30d label Dec 4, 2021

brandon-b-miller mentioned this issue Jan 6, 2022

Support args= in Series.apply #9982

Merged

rapids-bot bot closed this as completed in #9982 Jan 28, 2022
