[FEA] Support passing scalar args to df.apply #9500

Closed
randerzander opened this issue Oct 22, 2021 · 0 comments · Fixed by #9514
Labels: feature request (New feature or request), numba (Numba issue), Python (Affects Python cuDF API)

randerzander commented Oct 22, 2021

I often want to use a "UDF" on a DataFrame where some scalar values can be supplied programmatically.

In Pandas, I can do:

import pandas as pd

pdf = pd.DataFrame({'val': [0, 1, 2]})

def func(x, A, B):
  return x['val'] + A - B

pdf.apply(func, axis=1, args=(1, 2))
0   -1
1    0
2    1
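
For readers unfamiliar with `args=`, the forwarding behavior relied on above can be sketched in plain Python. This is only an illustration of the calling convention, not how pandas or cuDF implement `apply`; `apply_rows` is a hypothetical helper, and rows are modeled as plain dicts so the snippet runs without either library.

```python
# Illustrative stand-in for row-wise apply with extra positional args.
# apply_rows is a hypothetical helper, not a pandas or cuDF function.
def apply_rows(rows, func, args=()):
    # Call func once per row, forwarding the extra scalars after the row.
    return [func(row, *args) for row in rows]


def func(x, A, B):
    return x['val'] + A - B


# Rows modeled as dicts keyed by column name (an assumption for this sketch).
rows = [{'val': 0}, {'val': 1}, {'val': 2}]

result = apply_rows(rows, func, args=(1, 2))
print(result)  # [-1, 0, 1]
```

The point is that the scalars travel with the call rather than living in surrounding state, so the same `func` can be reused with different values.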

With cuDF, I can't see a way to pass scalar args to func:

import cudf

cdf = cudf.from_pandas(pdf)
cdf.apply(func, axis=1, args=(1, 2))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/rgelhausen/conda/envs/dsql/lib/python3.8/site-packages/cudf-21.12.0a0+252.ga694162f09-py3.8-linux-x86_64.egg/cudf/core/dataframe.py", line 4237, in apply
    raise ValueError("args and kwargs are not yet supported.")
ValueError: args and kwargs are not yet supported.

As a workaround, I can do something like:

A = 1
B = 2

def func(x):
  return x['val'] + A - B

cdf.apply(func)

If I want different values for A and B, I can reassign the globals and then apply func again.
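
A possible variation on the workaround above is to bind the scalars with a factory function instead of module-level globals. The sketch below is plain Python with rows modeled as dicts; `make_func` is a hypothetical name, and whether cuDF's UDF compiler accepts a closure like this is an assumption that has not been verified here.

```python
def make_func(A, B):
    # Hypothetical factory: binds A and B in a closure instead of globals.
    def func(x):
        return x['val'] + A - B
    return func


# Rows modeled as dicts keyed by column name (an assumption for this sketch).
rows = [{'val': 0}, {'val': 1}, {'val': 2}]

# Two independent single-argument UDFs with different scalar values.
shift_down = make_func(1, 2)
shift_up = make_func(5, 2)

down = [shift_down(r) for r in rows]
up = [shift_up(r) for r in rows]
print(down, up)
```

Each call to `make_func` yields a separate function, so two parameterizations can coexist without one overwriting the other's values.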

randerzander added the "feature request" and "Needs Triage" labels on Oct 22, 2021
brandon-b-miller added the "Python" and "numba" labels and removed the "Needs Triage" label on Oct 22, 2021
beckernick added this to the UDF Enhancements milestone on Oct 27, 2021
rapids-bot closed this as completed in #9514 on Nov 4, 2021
rapids-bot pushed a commit that referenced this issue on Nov 4, 2021:
Closes #9500

Allows passing `args=` to `DataFrame.apply` as is supported in pandas: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.apply.html

Concretely, this allows for this:

```python
import cudf
df = cudf.DataFrame({'a': [1, 2, 3]})

def f(row, c):
    return row['a'] + c

res = df.apply(f, args=(3,))
```

cc @randerzander

Authors:
  - https://github.com/brandon-b-miller

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #9514