Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix type casting in Series.__setitem__ #11904

Merged
merged 10 commits into from
Nov 4, 2022

Conversation

wence-
Copy link
Contributor

@wence- wence- commented Oct 11, 2022

Description

To mimic pandas, we must upcast a column to the numpy result_type of the column itself and the input value dtype. This previously occurred in all relevant cases except when the index provided to setitem was a single integer (originally introduced in #2442). Closes #11901.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

To mimic pandas, we must upcast a column to the numpy result_type of
the column itself and the input value dtype. This previously occurred
in all relevant cases except when the index provided to __setitem__
was a single integer (originally introduced in rapidsai#2442). Closes rapidsai#11901.
@wence- wence- requested a review from a team as a code owner October 11, 2022 17:34
@github-actions github-actions bot added the Python Affects Python cuDF API. label Oct 11, 2022
@wence- wence- added non-breaking Non-breaking change pandas 3 - Ready for Review Ready for review by team labels Oct 11, 2022
@wence-
Copy link
Contributor Author

wence- commented Oct 11, 2022

I have marked as non-breaking; technically this changes user-facing behaviour so one could argue that it is a breaking change (although I hope no-one was relying on the previous behaviour).

@galipremsagar galipremsagar added the bug Something isn't working label Oct 11, 2022
@galipremsagar
Copy link
Contributor

rerun tests

Only string and non-decimal numeric columns should try and up-cast.
@wence-
Copy link
Contributor Author

wence- commented Oct 12, 2022

Fixed the test fail 🤞

@codecov
Copy link

codecov bot commented Oct 12, 2022

Codecov Report

Base: 87.40% // Head: 88.06% // Increases project coverage by +0.65% 🎉

Coverage data is based on head (22e3880) compared to base (f72c4ce).
Patch coverage: 86.55% of modified lines in pull request are covered.

Additional details and impacted files
@@               Coverage Diff                @@
##           branch-22.12   #11904      +/-   ##
================================================
+ Coverage         87.40%   88.06%   +0.65%     
================================================
  Files               133      135       +2     
  Lines             21833    22003     +170     
================================================
+ Hits              19084    19376     +292     
+ Misses             2749     2627     -122     
Impacted Files Coverage Δ
python/cudf/cudf/core/column/column.py 87.96% <ø> (-0.46%) ⬇️
python/cudf/cudf/core/udf/__init__.py 100.00% <ø> (+50.00%) ⬆️
python/cudf/cudf/io/orc.py 92.94% <ø> (-0.09%) ⬇️
...thon/dask_cudf/dask_cudf/tests/test_distributed.py 18.86% <ø> (+4.94%) ⬆️
python/cudf/cudf/core/_base_index.py 81.28% <43.75%> (-4.27%) ⬇️
python/cudf/cudf/io/text.py 91.66% <50.00%> (-8.34%) ⬇️
python/strings_udf/strings_udf/__init__.py 84.31% <78.26%> (-12.57%) ⬇️
python/cudf/cudf/core/index.py 92.88% <98.24%> (+0.24%) ⬆️
python/cudf/cudf/__init__.py 90.69% <100.00%> (ø)
python/cudf/cudf/core/column/categorical.py 89.34% <100.00%> (ø)
... and 52 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

python/cudf/cudf/core/series.py Outdated Show resolved Hide resolved
python/cudf/cudf/core/series.py Outdated Show resolved Hide resolved
python/cudf/cudf/core/series.py Show resolved Hide resolved
python/cudf/cudf/tests/test_setitem.py Outdated Show resolved Hide resolved
If the desired dtype is the same as ours, we can just return
ourselves. Surely no-one is relying on foo.astype(foo.dtype) being a
copy.
@wence-
Copy link
Contributor Author

wence- commented Oct 13, 2022

I think to step back a bit here, I think we need to figure out how much of pandas behaviour we are going to mimic here. This PR contains the approximately minimal set of changes to allow setting a single entry in a numeric column with a value, but there are other things that will still not work correctly.

I can try and collate all of those and then fix in this PR, but it is probably better to collate the missing cases, decide which we want to support and then fix in a followup.

@wence-
Copy link
Contributor Author

wence- commented Oct 18, 2022

@bdice any strong opinion one way or the other about how much of this thread I unravel in this PR as opposed to doing the smaller thing here and then going round again? My +ε preference is to do the minimal thing here (i.e. not changing behaviour over and above enabling __setitem__ for integer keys) and then make a breaking change that fixes up-casting for (at least) struct columns to be inline with the up-casting of their components.

@bdice
Copy link
Contributor

bdice commented Oct 18, 2022

@wence- I'm happy to make incremental changes. No need to fix everything at once.

python/cudf/cudf/tests/test_setitem.py Outdated Show resolved Hide resolved
python/cudf/cudf/core/series.py Outdated Show resolved Hide resolved
@wence-
Copy link
Contributor Author

wence- commented Oct 21, 2022

@bdice I think this is now good to go if you have time for a final look.

Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the iterations on this and for follow-up issue documentation.

python/cudf/cudf/tests/test_setitem.py Outdated Show resolved Hide resolved
Co-authored-by: Bradley Dice <[email protected]>
@wence-
Copy link
Contributor Author

wence- commented Nov 3, 2022

@gpucibot merge

@wence-
Copy link
Contributor Author

wence- commented Nov 4, 2022

Waiting on #12067

@vyasr
Copy link
Contributor

vyasr commented Nov 4, 2022

rerun tests

@rapids-bot rapids-bot bot merged commit 11b875b into rapidsai:branch-22.12 Nov 4, 2022
@wence- wence- deleted the wence/fix/setitem-singleton branch November 5, 2022 09:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team bug Something isn't working non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] mismatch in up-casting Series.__setitem__ between pandas and cudf
4 participants