[BUG] DataFrame.reindex behavior differs from pandas #9827

Closed
Tracked by #9815
bdice opened this issue Dec 2, 2021 · 2 comments
Labels
bug Something isn't working Python Affects Python cuDF API.

Comments

@bdice
Contributor

bdice commented Dec 2, 2021

Describe the bug
The DataFrame.reindex doctest fails, and its output differs from pandas. From a preliminary discussion in our dev meeting, the current implementation may already have the behavior we intend. If we choose to keep it, this issue can be closed by updating the doctests in #9815 to match.

Steps/Code to reproduce bug
This doctest (#9815) fails:

>>> import cudf
>>> df = cudf.DataFrame()
>>> df['key'] = [0, 1, 2, 3, 4]
>>> df['val'] = [float(i + 10) for i in range(5)]
>>> df_new = df.reindex(index=[0, 3, 4, 5],
...                     columns=['key', 'val', 'sum'])
>>> df
   key   val
0    0  10.0
1    1  11.0
2    2  12.0
3    3  13.0
4    4  14.0
>>> df_new
   key   val  sum
0    0  10.0  NaN
3    3  13.0  NaN
4    4  14.0  NaN
5   -1   NaN  NaN

Below is the doctest output:

File "/home/bdice/code/cudf/python/cudf/cudf/core/dataframe.py", line 2289, in DataFrame.reindex
Failed example:
    df_new
Expected:
       key   val  sum
    0    0  10.0  NaN
    3    3  13.0  NaN
    4    4  14.0  NaN
    5   -1   NaN  NaN
Got:
        key   val   sum
    0     0  10.0  <NA>
    3     3  13.0  <NA>
    4     4  14.0  <NA>
    5  <NA>  <NA>  <NA>

Expected behavior

Pandas gives the following output (uses NaN instead of <NA>, and casts the column key to type float64):

>>> pd.DataFrame({'key': [0, 1, 2, 3, 4], 'value': [10.0, 11.0, 12.0, 13.0, 14.0]}).reindex(index=[0, 3, 4, 5], columns=['key', 'value', 'sum'])
   key  value  sum
0  0.0   10.0  NaN
3  3.0   13.0  NaN
4  4.0   14.0  NaN
5  NaN    NaN  NaN
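
The upcast described above can be checked directly in pandas. This is a minimal sketch that reuses the frame from the example; the variable names `df` and `reindexed` are illustrative only, and it assumes pandas with its default (non-nullable) int64 dtype.

```python
import pandas as pd

# Same data as the pandas example above; "key" starts out as a plain int64 column.
df = pd.DataFrame({"key": [0, 1, 2, 3, 4],
                   "value": [10.0, 11.0, 12.0, 13.0, 14.0]})

# Label 5 does not exist, so pandas has to fill the new row with a missing value.
reindexed = df.reindex(index=[0, 3, 4, 5], columns=["key", "value", "sum"])

# int64 cannot hold NaN, so the column is cast to float64.
print(df["key"].dtype)
print(reindexed["key"].dtype)
```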
@bdice bdice added bug Something isn't working Python Affects Python cuDF API. labels Dec 2, 2021
@bdice bdice mentioned this issue Dec 2, 2021
@shwina
Contributor

shwina commented Dec 2, 2021

I think here we have the right behaviour. All our types are nullable, so we can afford to use <NA> as a null value and avoid casting things to float. We may need a way of telling doctests to not check the result dtypes when necessary (similar to the check_dtype=False argument to assert_frame_equal in pytests).
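
As a rough CPU-side illustration of both points, the sketch below uses pandas' nullable `Int64` extension dtype as a stand-in for cuDF's nullable columns (an assumption made so the example runs without a GPU; it is not cuDF code), followed by the `check_dtype=False` option of `pandas.testing.assert_frame_equal`. The variable names are illustrative.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# A nullable integer column can represent a missing value without becoming float.
nullable = pd.DataFrame({"key": pd.array([0, 1, 2, 3, 4], dtype="Int64")})
reindexed = nullable.reindex(index=[0, 3, 4, 5])
print(reindexed["key"].dtype)  # the dtype is still Int64; row 5 holds <NA>

# check_dtype=False compares values only, so int64 and float64 columns
# with equal values are treated as equal.
ints = pd.DataFrame({"key": [0, 3, 4]})
floats = pd.DataFrame({"key": [0.0, 3.0, 4.0]})
assert_frame_equal(ints, floats, check_dtype=False)
```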

bdice added a commit to bdice/cudf that referenced this issue Dec 3, 2021
@bdice
Contributor Author

bdice commented Dec 3, 2021

@shwina Awesome, thanks for confirming. I have updated the doctest and will close this issue.

> We may need a way of telling doctests to not check the result dtypes when necessary (similar to the check_dtype=False argument to assert_frame_equal in pytests).

No additional changes are necessary beyond updating the output of the example in 699c21a. Doctests execute the commands and compare the printed output. In some sense, doctests are a test of the REPL environment and validate the workings of repr / str methods.
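
The point that doctests compare printed output rather than values can be sketched with the standard library's `doctest` module alone. The helper `run_example` below is hypothetical (it is not part of cuDF or of #9815); it runs a snippet given as a string and returns the failure count.

```python
import doctest


def run_example(source: str) -> int:
    """Run a doctest snippet given as a string and return the number of failures."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(source, globs={}, name="example",
                              filename="<example>", lineno=0)
    runner = doctest.DocTestRunner(verbose=False)
    # Discard the failure report; only the counts are needed here.
    results = runner.run(test, out=lambda text: None)
    return results.failed


# The printed output matches the expected text, so the example passes.
matching = ">>> 1 + 1\n2\n"

# 1 + 1.0 == 2 is True, but the REPL prints "2.0", so the doctest fails.
equal_value_different_repr = ">>> 1 + 1.0\n2\n"

print(run_example(matching))
print(run_example(equal_value_different_repr))
```

Because the comparison is textual, a change in dtype shows up in a doctest only when it changes the printed form, as with `<NA>` versus `NaN` in this issue.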

@bdice bdice closed this as completed Dec 3, 2021
rapids-bot bot pushed a commit that referenced this issue Jan 15, 2022
This PR adds doctests and resolves #9513. Several issues were found by running doctests that have now been resolved:

- [x] #9821
- [x] #9822
- [x] #9823
- [x] #9824
- [x] #9825
- [x] #9826
- [x] #9827
- [x] #9828 (workaround by deleting doctests)
- [x] #9829

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Ashwin Srinath (https://github.com/shwina)
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #9815