Add `cudf.DataFrame.applymap` #10542

brandon-b-miller · 2022-03-30T16:33:00Z

Naive implementation of DataFrame.applymap that just calls apply in a loop over columns.

This could theoretically be made much faster within our framework. The current approach requires at worst N compilations and M kernel launches, where N is the number of distinct dtypes in the data and M is the total number of columns. As an improvement, we could launch a single kernel that populates the entire output. That would still suffer from the compilation bottleneck, however: the function must be compiled before an output dtype can be determined, and this has to happen once for each distinct dtype in the data.
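
To make the cost model in the description concrete, here is a minimal pure-Python sketch. It is not the cuDF implementation. The frame is a plain dict of lists, `applymap_naive` is a hypothetical name, a column's dtype is approximated by the Python type of its first element, and "compilation" is simulated by caching the function once per dtype.

```python
from typing import Any, Callable, Dict, List, Tuple


def applymap_naive(
    frame: Dict[str, List[Any]], func: Callable[[Any], Any]
) -> Tuple[Dict[str, List[Any]], int, int]:
    """Apply ``func`` elementwise by looping over columns.

    Returns the new frame, the number of simulated compilations (N),
    and the number of simulated kernel launches (M).
    """
    compiled: Dict[type, Callable[[Any], Any]] = {}  # stand-in for a JIT cache
    launches = 0
    result: Dict[str, List[Any]] = {}
    for name, values in frame.items():
        # Assumption: a column's dtype is the type of its first element.
        dtype = type(values[0]) if values else type(None)
        if dtype not in compiled:
            compiled[dtype] = func  # one "compilation" per distinct dtype
        kernel = compiled[dtype]
        result[name] = [kernel(v) for v in values]  # one "launch" per column
        launches += 1
    return result, len(compiled), launches


frame = {"a": [1, 2], "b": [3, 4], "c": [1.5, 2.5]}
doubled, n_compilations, n_launches = applymap_naive(frame, lambda x: x * 2)
print(doubled, n_compilations, n_launches)
```

With two integer columns and one float column, the sketch reports N = 2 compilations and M = 3 launches. A single-kernel variant would reduce M to 1 but leave N unchanged, which is the bottleneck the description points out.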

Part of #10169

bdice

A few small suggestions.

python/cudf/cudf/core/dataframe.py

bdice · 2022-03-30T18:44:11Z

python/cudf/cudf/core/dataframe.py

+        func : callable
+            Python function, returns a single value from a single value.
+        na_action : {None, 'ignore'}, default None
+            If ``ignore``, propagate NaN values, without passing them to func.


Use quotes here, not code font.

Suggested change

-            If ``ignore``, propagate NaN values, without passing them to func.
+            If 'ignore', propagate NaN values, without passing them to func.
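
As an illustration of the documented `na_action` behavior, the sketch below shows what "propagate NaN values, without passing them to func" means for a single column. It is a plain-Python stand-in, not the cuDF code path: `map_with_na_action` is a hypothetical helper, and treating both `None` and float NaN as missing is an assumption made for the example.

```python
import math
from typing import Any, Callable, List, Optional


def _is_missing(value: Any) -> bool:
    """Treat None and float NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def map_with_na_action(
    values: List[Any],
    func: Callable[[Any], Any],
    na_action: Optional[str] = None,
) -> List[Any]:
    """Map ``func`` over ``values``, optionally skipping missing entries."""
    if na_action == "ignore":
        # Missing values are copied through; func never sees them.
        return [v if _is_missing(v) else func(v) for v in values]
    # Default: every value, missing or not, is passed to func.
    return [func(v) for v in values]


seen: List[Any] = []


def add_one(x: float) -> float:
    seen.append(x)  # record which inputs reach the function
    return x + 1


mapped = map_with_na_action([1.0, float("nan"), 3.0], add_one, na_action="ignore")
print(mapped, seen)
```

Here `seen` ends up holding only the two non-missing inputs, while the NaN stays in its original position in the output.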

bdice · 2022-03-30T18:44:40Z

python/cudf/cudf/core/dataframe.py

+        """
+
+        if kwargs:
+            raise ValueError(


I think we usually raise NotImplementedError for this kind of thing, and ValueError for invalid values (like na_action not in {"ignore", None} below).

Suggested change

-            raise ValueError(
+            raise NotImplementedError(
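
The convention described in this comment could look roughly like the following. This is a sketch under assumptions: `validate_applymap_args` is a hypothetical helper and the error messages are illustrative, not the text used in the PR.

```python
from typing import Any, Optional


def validate_applymap_args(na_action: Optional[str] = None, **kwargs: Any) -> None:
    """Check arguments the way the review comment suggests.

    Unsupported features raise NotImplementedError; arguments that are
    supported but carry an invalid value raise ValueError.
    """
    if kwargs:
        # Extra keyword arguments are a feature that is not supported yet.
        raise NotImplementedError(
            "applymap does not yet support additional keyword arguments"
        )
    if na_action not in {"ignore", None}:
        # The argument itself is supported, but this value is invalid.
        raise ValueError(
            f"na_action must be 'ignore' or None. Got {na_action!r}"
        )


def raised_by(**call_kwargs: Any) -> Optional[type]:
    """Return the exception type raised for the given arguments, if any."""
    try:
        validate_applymap_args(**call_kwargs)
    except Exception as exc:
        return type(exc)
    return None


print(raised_by(axis=0), raised_by(na_action="skip"), raised_by(na_action="ignore"))
```

Keeping the two exception types separate lets callers distinguish "this may work in a future release" from "this call is wrong".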

codecov · 2022-04-06T16:09:07Z

Codecov Report

Merging #10542 (e1d444c) into branch-22.06 (3c13ef1) will increase coverage by 0.04%.
The diff coverage is 98.14%.

@@               Coverage Diff                @@
##           branch-22.06   #10542      +/-   ##
================================================
+ Coverage         86.33%   86.38%   +0.04%     
================================================
  Files               140      142       +2     
  Lines             22289    22338      +49     
================================================
+ Hits              19244    19296      +52     
+ Misses             3045     3042       -3

Impacted Files	Coverage Δ
python/cudf/cudf/core/algorithms.py	`90.47% <ø> (ø)`
python/cudf/cudf/core/groupby/groupby.py	`91.72% <ø> (+0.22%)`	⬆️
python/cudf/cudf/core/multiindex.py	`92.14% <ø> (ø)`
python/cudf/cudf/core/series.py	`95.28% <ø> (ø)`
python/cudf/cudf/core/single_column_frame.py	`96.52% <ø> (+0.07%)`	⬆️
python/cudf/cudf/utils/cudautils.py	`59.83% <ø> (ø)`
python/cudf/cudf/utils/utils.py	`90.28% <ø> (ø)`
python/dask_cudf/dask_cudf/tests/utils.py	`90.90% <90.90%> (ø)`
python/cudf/cudf/core/column/lists.py	`92.79% <100.00%> (+1.38%)`	⬆️
python/cudf/cudf/core/dataframe.py	`93.69% <100.00%> (+0.10%)`	⬆️
... and 15 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 9e8e92c...e1d444c.

bdice

I have a few minor (non-blocking) suggestions. This looks good overall!

python/cudf/cudf/core/dataframe.py

python/cudf/cudf/tests/test_applymap.py

python/dask_cudf/dask_cudf/tests/test_applymap.py

bdice · 2022-04-07T20:47:04Z

python/dask_cudf/dask_cudf/tests/utils.py

+    df = pd.DataFrame({"x": [], "y": []})
+    gdf = cudf.DataFrame.from_pandas(df)
+    dgf = dd.from_pandas(gdf, npartitions=npartitions)
+    return dgf


Seems strange that this only returns dgf while _make_random_frame and _make_random_frame_float return df, dgf. Should we symmetrize this?

brandon-b-miller · 2022-04-08

Sorry, I probably shouldn't have moved this function in the first place, since it's being consumed elsewhere and isn't actually used in my tests. I've moved it back for now.

brandon-b-miller · 2022-04-13T02:57:42Z

@gpucibot merge

brandon-b-miller · 2022-04-13T02:59:35Z

oops, needs dask review.

galipremsagar · 2022-04-13T17:55:53Z

python/cudf/cudf/core/dataframe.py

@@ -3718,6 +3720,68 @@ def apply(

        return self._apply(func, _get_row_kernel, *args, **kwargs)

+    def applymap(


Could you also add this entry to this section of docs: https://github.com/rapidsai/cudf/blob/branch-22.06/docs/cudf/source/api_docs/dataframe.rst#function-application-groupby--window

galipremsagar · 2022-04-13T18:48:22Z

python/dask_cudf/dask_cudf/tests/test_applymap.py

+
+from dask import dataframe as dd
+
+from .utils import _make_random_frame


Can we do an absolute import here instead of a relative import so that it is consistent with other imports here and elsewhere in the code-base?

galipremsagar · 2022-04-13T18:48:33Z

python/dask_cudf/dask_cudf/tests/test_binops.py

@@ -8,6 +10,8 @@

 import cudf

+from .utils import _make_random_frame


Here as well.

brandon-b-miller added 8 commits March 24, 2022 13:06

initial

67a6187

Merge branch 'branch-22.06' into fea-dataframe-applymap

e087124

Merge branch 'branch-22.06' into fea-dataframe-applymap

454d9d8

updates

2871aa1

add tests for na_action

b6827b5

match pandas error

06348f7

add dask_cudf tests

6fb742d

copyright

6ce8383

brandon-b-miller added the labels feature request, 2 - In Progress, numba, Python, dask, and non-breaking on Mar 30, 2022

brandon-b-miller requested review from a team as code owners March 30, 2022 16:33

brandon-b-miller requested review from galipremsagar and skirui-source March 30, 2022 16:33

little cleanup

262c958

bdice reviewed Mar 30, 2022

brandon-b-miller added 3 commits April 5, 2022 13:28

Merge branch 'branch-22.06' into fea-dataframe-applymap

1c5d7ad

address reviews

bd311ab

fix up type hints

db2fee9

brandon-b-miller added 2 commits April 6, 2022 14:08

respond to ci review..

7081276

use black from the correct conda environment

7d7b304

brandon-b-miller added the 3 - Ready for Review label and removed the 2 - In Progress label on Apr 7, 2022

Add blank link.

85963a8

bdice approved these changes Apr 7, 2022

brandon-b-miller and others added 4 commits April 8, 2022 10:57

Apply suggestions from code review

6b91f33

Co-authored-by: Bradley Dice <[email protected]>

dont move all functions to utils yet

b344532

fix imports

d342f8e

Merge branch 'branch-22.06' into fea-dataframe-applymap

137604e

galipremsagar requested changes Apr 13, 2022

add applymap to docs

477a824

galipremsagar reviewed Apr 13, 2022

use absolute imports

e1d444c

galipremsagar approved these changes Apr 13, 2022

rapids-bot bot merged commit ce56bc3 into rapidsai:branch-22.06 Apr 13, 2022

shwina mentioned this pull request Jun 2, 2022

[DOC] RAPIDS 22.06 Release Blog Outline #10878

Closed
