ENH: numba engine in df.apply #54666

Merged: 12 commits, Sep 11, 2023

lithomas1
Member

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

@lithomas1 lithomas1 added the Apply (Apply, Aggregate, Transform, Map) and numba (numba-accelerated operations) labels Aug 21, 2023
@lithomas1 lithomas1 requested a review from mroeschke August 24, 2023 15:33
@lithomas1
Member Author

gentle ping @mroeschke.

I'm planning on following this up with general support for numba in df.apply.

@@ -9919,6 +9919,8 @@ def apply(
result_type: Literal["expand", "reduce", "broadcast"] | None = None,
args=(),
by_row: Literal[False, "compat"] = "compat",
engine: str = "python",
engine_kwargs: dict = {},
Member

I think we usually default this as engine_kwargs: dict[str, bool] | None = None
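
The reason for preferring a `None` default over `{}` is that Python evaluates a default dict once, at function definition time, so every call that omits the argument shares the same object. A minimal sketch of the difference follows; `apply_mutable_default` and `apply_none_default` are hypothetical stand-ins for illustration, not the pandas implementation, and the `nopython`/`parallel` keys are used only as example options.

```python
from __future__ import annotations


def apply_mutable_default(engine_kwargs: dict = {}) -> dict:
    # Hypothetical stand-in: the default dict is created once and is
    # shared by every call that omits the argument.
    engine_kwargs.setdefault("nopython", True)
    return engine_kwargs


def apply_none_default(
    engine_kwargs: dict[str, bool] | None = None,
) -> dict[str, bool]:
    # Suggested pattern: default to None and build a fresh dict per call.
    kwargs = {} if engine_kwargs is None else dict(engine_kwargs)
    kwargs.setdefault("nopython", True)
    return kwargs


first = apply_mutable_default()
first["parallel"] = True              # mutates the shared default object
second = apply_mutable_default()
leaked = "parallel" in second         # state leaked into a later call

fresh_a = apply_none_default()
fresh_a["parallel"] = True
fresh_b = apply_none_default()
isolated = "parallel" not in fresh_b  # each call received its own dict
```

Copying the caller's dict inside `apply_none_default` also keeps the function from modifying an argument the caller still holds.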

@@ -9919,6 +9919,8 @@ def apply(
result_type: Literal["expand", "reduce", "broadcast"] | None = None,
args=(),
by_row: Literal[False, "compat"] = "compat",
engine: str = "python",
Member

Could you type as a Literal here?
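
For context, typing the parameter as a `Literal` lets static type checkers reject unsupported engine names, which a plain `str` annotation cannot do. The sketch below assumes the accepted values are `"python"` and `"numba"`; the `Engine` alias and `check_engine` helper are hypothetical names for illustration, not part of pandas.

```python
from typing import Literal, get_args

# Assumed set of engines for this sketch.
Engine = Literal["python", "numba"]


def check_engine(engine: Engine = "python") -> str:
    # Static checkers flag bad literals at the call site; this runtime
    # check covers callers that are not type checked.
    allowed = get_args(Engine)
    if engine not in allowed:
        raise ValueError(f"Unknown engine {engine!r}; expected one of {allowed}")
    return engine
```

Deriving the runtime check from `get_args(Engine)` keeps the annotation and the validation in one place, so adding an engine later means editing only the alias.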

else:
first_elem = values[0]
dim0 = values.shape[0]
res0 = nb_compat_func(first_elem)
Member

Is inferring the shape from the first element similar to what we do for DataFrame.apply?

It would be good to note what type of UDFs are supported in the engine docstring

Member Author

np.apply_along_axis, which we use, does this.

See https://github.com/numpy/numpy/blob/d676a1fe2d495f9d8a86103644bed141c2e69787/numpy/lib/_shape_base_impl.py#L373-L380.

> It would be good to note what type of UDFs are supported in the engine docstring

I'll add a note linking to numba's supported Python/numpy features.
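
To illustrate the pattern discussed in this thread (call the function on the first element, use that result to size the output, then fill in the rest), here is a rough sketch. It is written in plain Python so that it runs without NumPy or numba; `apply_rows` is a hypothetical helper, not the code in this PR, and the length check on later rows is an assumption of the sketch.

```python
from __future__ import annotations

from typing import Callable, Sequence


def apply_rows(
    func: Callable[[Sequence[float]], Sequence[float]],
    rows: Sequence[Sequence[float]],
) -> list[list[float]]:
    """Apply func to each row, sizing the output from the first result."""
    if not rows:
        raise ValueError("cannot infer the output shape from zero rows")
    first = list(func(rows[0]))          # call once to learn the output width
    width = len(first)
    out = [[0.0] * width for _ in rows]  # preallocate using the inferred shape
    out[0] = first
    for i in range(1, len(rows)):
        res = list(func(rows[i]))
        if len(res) != width:
            # Later results must match the shape inferred from the first one.
            raise ValueError(
                f"row {i} returned length {len(res)}, expected {width}"
            )
        out[i] = res
    return out


doubled = apply_rows(lambda r: [2 * x for x in r], [[1.0, 2.0], [3.0, 4.0]])
```

The trade-off is that the function is evaluated on the first element before the output exists, so a UDF whose result length varies between rows cannot be supported by this approach.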

@lithomas1
Member Author

pre-commit.ci autofix

@lithomas1
Member Author

pre-commit.ci autofix

@mroeschke mroeschke added this to the 2.2 milestone Sep 11, 2023
@mroeschke mroeschke merged commit ce5fdf0 into pandas-dev:main Sep 11, 2023
@mroeschke
Member

Nice! Thanks @lithomas1

mroeschke pushed a commit to mroeschke/pandas that referenced this pull request Sep 11, 2023
* ENH: numba engine in df.apply

* fixes

* more fixes

* try to fix

* address code review

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* go for green

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update type

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
@lithomas1 lithomas1 deleted the numba-raw-apply branch September 11, 2023 19:25
@lithomas1
Member Author

Thanks for the review!
