[REVIEW] Fix dataframe setitem with `ndarray` types #10056

galipremsagar · 2022-01-14T17:29:18Z

This PR fixes assignment of 2-D arrays (NumPy or CuPy `ndarray`) in `DataFrame.__setitem__`.

vyasr

I'd love for our setitem implementations to not need so many special cases, but addressing that is probably out of scope for now. I don't see a much more elegant way to resolve this that doesn't also do extra work.

isVoid

1 small question + random rants. Overall it's good and shouldn't block it from merging.

isVoid · 2022-01-14T18:06:05Z

python/cudf/cudf/core/dataframe.py

@@ -1123,7 +1123,15 @@ def __setitem__(self, arg, value):
                    for col_name in self._data:
                        self._data[col_name][mask] = value
            else:
-                if isinstance(value, DataFrame):
+                if isinstance(value, (cupy.ndarray, np.ndarray)):
+                    _setitem_with_dataframe(


I see the array is being converted to a DataFrame in order to make use of `_setitem_with_dataframe`. Ideally, we could separate the column-selection logic from this helper and directly replace the targeted columns. But that's definitely out of scope.

Yeah, I agree. It would be nice to just convert the array to a column without having to deal with all the DataFrame junk, but I don't see an easy way to do that at present :(

isVoid · 2022-01-14T18:06:36Z

python/cudf/cudf/core/dataframe.py

+                if isinstance(value, (cupy.ndarray, np.ndarray)):
+                    _setitem_with_dataframe(
+                        input_df=self,
+                        replace_df=cudf.DataFrame(value),


Any chance we can use the factory method `DataFrame._from_data` here instead of the public constructor?

I did check this, but realized that the logic handling `ndarray`s lives in the `DataFrame` constructor, and this type of input needs to go through that code path.

`_from_data` assumes that the inputs are already columns and won't call `as_column`. We could manually call that function and construct a suitable data dict, but again, this feels like grounds for a future refactor rather than something to do now. At some point we should review all of cudf to see where we construct Frames via the constructor rather than one of the faster factories; I think many, if not most, of those call sites can be rewritten.
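The data dict mentioned above can be sketched roughly as follows. `ndarray_to_data_dict` is a hypothetical helper, not a cuDF function, and it uses NumPy only so it runs without a GPU: it splits a 2-D array into one 1-D array per target column name. In cuDF each value would still need to go through `as_column` before the dict could be handed to a factory such as `DataFrame._from_data`; this sketch assumes that step and does not perform it.

```python
import numpy as np


def ndarray_to_data_dict(arr, names):
    """Hypothetical helper: split a 2-D array into {name: 1-D column}.

    Assumption: in cuDF each value would additionally be converted with
    ``as_column`` before being passed to ``DataFrame._from_data``; this
    sketch stops at plain NumPy arrays and calls no cuDF API.
    """
    arr = np.asarray(arr)
    if arr.ndim == 1:
        # Treat a 1-D array as a single column.
        arr = arr.reshape(-1, 1)
    if arr.ndim != 2:
        raise ValueError(f"expected a 1-D or 2-D array, got {arr.ndim}-D")
    if arr.shape[1] != len(names):
        raise ValueError(
            f"array has {arr.shape[1]} columns but {len(names)} names were given"
        )
    # Column i of the array becomes the data for names[i]; copy so each
    # column is contiguous and independent of the input buffer.
    return {name: arr[:, i].copy() for i, name in enumerate(names)}


data = ndarray_to_data_dict(np.array([[1, 2], [3, 4], [5, 6]]), ["a", "b"])
print(data["a"], data["b"])  # → [1 3 5] [2 4 6]
```

Validating the shape up front surfaces a clear error when the number of array columns does not match the number of target names, rather than failing later during assignment.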

codecov · 2022-01-14T22:43:52Z

Codecov Report

Merging #10056 (e6f09c2) into branch-22.02 (967a333) will decrease coverage by 0.07%.
The diff coverage is n/a.

@@               Coverage Diff                @@
##           branch-22.02   #10056      +/-   ##
================================================
- Coverage         10.49%   10.41%   -0.08%     
================================================
  Files               119      119              
  Lines             20305    20540     +235     
================================================
+ Hits               2130     2139       +9     
- Misses            18175    18401     +226

Impacted Files	Coverage Δ
python/custreamz/custreamz/kafka.py	`29.16% <0.00%> (-0.63%)`	⬇️
python/dask_cudf/dask_cudf/sorting.py	`92.66% <0.00%> (-0.25%)`	⬇️
python/dask_cudf/dask_cudf/core.py	`70.85% <0.00%> (-0.17%)`	⬇️
python/cudf/cudf/__init__.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/core/frame.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/core/index.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/io/parquet.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/core/series.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/utils/utils.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/utils/dtypes.py	`0.00% <0.00%> (ø)`
... and 21 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 12adb8a...e6f09c2.

galipremsagar · 2022-01-14T23:21:33Z

@gpucibot merge

galipremsagar · 2022-01-14T23:22:05Z

@gpucibot merge

fix dataframe setitem

09abc44

galipremsagar added the `bug`, `Python`, `4 - Needs cuDF (Python) Reviewer`, and `non-breaking` labels Jan 14, 2022

galipremsagar requested a review from a team as a code owner January 14, 2022 17:29

galipremsagar self-assigned this Jan 14, 2022

galipremsagar requested review from isVoid and charlesbluca January 14, 2022 17:29

galipremsagar added the `3 - Ready for Review` label Jan 14, 2022

vyasr approved these changes Jan 14, 2022


galipremsagar added the `5 - Ready to Merge` label and removed the `3 - Ready for Review` and `4 - Needs cuDF (Python) Reviewer` labels Jan 14, 2022

isVoid approved these changes Jan 14, 2022


Merge remote-tracking branch 'upstream/branch-22.02' into 9928

e6f09c2

galipremsagar removed the request for review from charlesbluca January 14, 2022 23:21

rapids-bot bot merged commit 8c8d6ef into rapidsai:branch-22.02 Jan 14, 2022
