[REVIEW] Fix dataframe setitem with `ndarray` types #10056
I'd love for our setitem implementations to not need so many special cases, but addressing that is probably out of scope for now. I don't see a much more elegant way to resolve this that doesn't also do extra work.
1 small question + random rants. Overall it's good and shouldn't block it from merging.
@@ -1123,7 +1123,15 @@ def __setitem__(self, arg, value):
for col_name in self._data:
self._data[col_name][mask] = value
else:
if isinstance(value, DataFrame):
if isinstance(value, (cupy.ndarray, np.ndarray)):
_setitem_with_dataframe(
I see the array is being converted to a DataFrame in order to make use of `_setitem_with_dataframe`. Ideally, I wish we could separate the column-selection logic from this helper and directly replace the targeted columns. But that's out of scope for sure.
Yeah I agree, it would be nice to just convert the array to a column without having to deal with all the DataFrame junk but I don't see an easy way to do that at present :(
if isinstance(value, (cupy.ndarray, np.ndarray)):
_setitem_with_dataframe(
input_df=self,
replace_df=cudf.DataFrame(value),
Any chance we can make use of the factory method `DataFrame._from_data` here instead of the public ctor?
I did check this, but realized that the logic that handles `ndarray`s lives in the `DataFrame` constructor, so inputs of this type need to go through that code path.
`_from_data` assumes that the inputs are already columns, and it won't call `as_column`. We could manually call that function and construct a suitable data dict, but again this feels like grounds for a future refactor rather than something to do now. At some point we should review all of cudf to see where we construct Frames via the constructor rather than using one of the faster factories; I think many, if not most, of them can be rewritten.
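To make the trade-off above concrete, here is a minimal sketch in plain Python rather than cudf. `ToyFrame` and `to_column` are hypothetical stand-ins for `DataFrame` and `as_column`, and a list of rows stands in for a 2D `ndarray`; none of this is the actual cudf implementation. The constructor normalizes its input, while the `_from_data`-style factory trusts the caller to have built the columns already.

```python
from typing import Dict, Iterable, List, Sequence


def to_column(values: Iterable) -> List:
    """Hypothetical stand-in for as_column: normalize input into a column."""
    return list(values)


class ToyFrame:
    """Toy frame used only to illustrate constructor vs. factory paths."""

    def __init__(self, rows: Sequence[Sequence]):
        # Public constructor: accepts row-major 2D input and performs the
        # transposition and per-column normalization itself.
        ncols = len(rows[0]) if rows else 0
        self._data: Dict[int, List] = {
            i: to_column(row[i] for row in rows) for i in range(ncols)
        }

    @classmethod
    def _from_data(cls, data: Dict[int, List]) -> "ToyFrame":
        # Fast factory: assumes every value is already a column and
        # skips all normalization.
        obj = cls.__new__(cls)
        obj._data = data
        return obj


rows = [[1, 2], [3, 4], [5, 6]]

# Route 1: let the constructor handle the 2D input.
via_ctor = ToyFrame(rows)

# Route 2: build the column dict by hand, then use the factory.
data = {i: to_column(col) for i, col in enumerate(zip(*rows))}
via_factory = ToyFrame._from_data(data)
```

Both routes end up with the same column dict; the second simply moves the normalization work to the call site, which is the manual step the comment above defers to a future refactor.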
Codecov Report
@@ Coverage Diff @@
## branch-22.02 #10056 +/- ##
================================================
- Coverage 10.49% 10.41% -0.08%
================================================
Files 119 119
Lines 20305 20540 +235
================================================
+ Hits 2130 2139 +9
- Misses 18175 18401 +226
@gpucibot merge
Fixes: #9928
This PR fixes 2D array assignment in `__setitem__`.
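For illustration, the kind of assignment involved can be mimicked in plain Python. This is only a sketch under assumptions: `MiniFrame` is a hypothetical toy class, a list of rows stands in for a 2D `ndarray`, and it is assumed that the fix concerns assigning a 2D array to a list of column names, with one array column per target column.

```python
from typing import Dict, List, Sequence, Union


class MiniFrame:
    """Hypothetical toy frame: a dict of equal-length Python lists."""

    def __init__(self, data: Dict[str, List]):
        self._data = {name: list(col) for name, col in data.items()}

    def __setitem__(self, key: Union[str, List[str]], value: Sequence) -> None:
        if isinstance(key, str):
            # Single column: the value is treated as one column of data.
            self._data[key] = list(value)
            return
        # List of column names: the value is treated as 2D, row-major,
        # with one entry per target column in every row.
        rows = [list(row) for row in value]
        if any(len(row) != len(key) for row in rows):
            raise ValueError("each row must have one value per target column")
        for j, name in enumerate(key):
            self._data[name] = [row[j] for row in rows]


frame = MiniFrame({"a": [0, 0, 0], "b": [0, 0, 0], "c": [7, 8, 9]})

# Column j of the 2D value replaces the j-th named column; "c" is untouched.
frame[["a", "b"]] = [[1, 2], [3, 4], [5, 6]]
```

The analogous cudf call would presumably look like `df[["a", "b"]] = np.array([[1, 2], [3, 4], [5, 6]])`, although the exact cases covered are defined by the linked issue rather than by this sketch.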