[BUG] Reindexing on duplicate values should error via constructors #14398

Closed
galipremsagar opened this issue Nov 13, 2023 · 0 comments · Fixed by #14400

Describe the bug
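When a Series whose index contains duplicate values is passed to the Series constructor together with a new index, the constructor should not reindex it. cuDF currently reindexes silently and returns a Series in which every value is `<NA>`.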

Steps/Code to reproduce bug

In [1]: import cudf

In [2]: s = cudf.Series(['a', 'b', 'c', 'd'], index=[0, 0, 0, 0])

In [3]: s
Out[3]: 
0    a
0    b
0    c
0    d
dtype: object

In [4]: s = cudf.Series(s, index=[10, 11, 12, 13])

In [5]: s
Out[5]: 
10    <NA>
11    <NA>
12    <NA>
13    <NA>
dtype: object
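
A rough illustration of why every value comes back as `<NA>`: reindexing aligns the existing values to the new labels, and none of the labels 10 through 13 appear in the original index. The sketch below is plain Python, not cuDF code; `toy_reindex` is a hypothetical helper used only for illustration, and the actual implementation may differ.

```python
def toy_reindex(labels, values, new_labels):
    """Align values to new_labels by label; labels not found become None."""
    lookup = {}
    for label, value in zip(labels, values):
        # With duplicate labels, later entries overwrite earlier ones,
        # so the label -> value mapping is ambiguous.
        lookup[label] = value
    return [lookup.get(label) for label in new_labels]


# Same shape as the reproducer: all old labels are 0, all new labels are unseen.
print(toy_reindex([0, 0, 0, 0], ["a", "b", "c", "d"], [10, 11, 12, 13]))
```

Because the duplicate labels make the lookup ambiguous, returning missing values quietly hides a problem that is better surfaced as an error.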

Expected behavior
cuDF should raise an error, as pandas does:

In [6]: import pandas as pd

In [7]: ps = pd.Series(['a', 'b', 'c', 'd'], index=[0, 0, 0, 0])

In [8]: ps = pd.Series(ps, index=[10, 11, 12, 13])
<ipython-input-8-491d9281f1fc>:1: FutureWarning: reindexing with a non-unique Index is deprecated and will raise in a future version.
  ps = pd.Series(ps, index=[10, 11, 12, 13])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[8], line 1
----> 1 ps = pd.Series(ps, index=[10, 11, 12, 13])

File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/pandas/core/series.py:432, in Series.__init__(self, data, index, dtype, name, copy, fastpath)
    430     index = data.index
    431 else:
--> 432     data = data.reindex(index, copy=copy)
    433     copy = False
    434 data = data._mgr

File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/pandas/core/series.py:5094, in Series.reindex(self, *args, **kwargs)
   5090         raise TypeError(
   5091             "'index' passed as both positional and keyword argument"
   5092         )
   5093     kwargs.update({"index": index})
-> 5094 return super().reindex(**kwargs)

File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/pandas/core/generic.py:5289, in NDFrame.reindex(self, *args, **kwargs)
   5286     return self._reindex_multi(axes, copy, fill_value)
   5288 # perform the reindex on the axes
-> 5289 return self._reindex_axes(
   5290     axes, level, limit, tolerance, method, fill_value, copy
   5291 ).__finalize__(self, method="reindex")

File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/pandas/core/generic.py:5309, in NDFrame._reindex_axes(self, axes, level, limit, tolerance, method, fill_value, copy)
   5304 new_index, indexer = ax.reindex(
   5305     labels, level=level, limit=limit, tolerance=tolerance, method=method
   5306 )
   5308 axis = self._get_axis_number(a)
-> 5309 obj = obj._reindex_with_indexers(
   5310     {axis: [new_index, indexer]},
   5311     fill_value=fill_value,
   5312     copy=copy,
   5313     allow_dups=False,
   5314 )
   5315 # If we've made a copy once, no need to make another one
   5316 copy = False

File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/pandas/core/generic.py:5355, in NDFrame._reindex_with_indexers(self, reindexers, fill_value, copy, allow_dups)
   5352     indexer = ensure_platform_int(indexer)
   5354 # TODO: speed up on homogeneous DataFrame objects (see _reindex_multi)
-> 5355 new_data = new_data.reindex_indexer(
   5356     index,
   5357     indexer,
   5358     axis=baxis,
   5359     fill_value=fill_value,
   5360     allow_dups=allow_dups,
   5361     copy=copy,
   5362 )
   5363 # If we've made a copy once, no need to make another one
   5364 copy = False

File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/pandas/core/internals/managers.py:737, in BaseBlockManager.reindex_indexer(self, new_axis, indexer, axis, fill_value, allow_dups, copy, only_slice, use_na_proxy)
    735 # some axes don't allow reindexing with dups
    736 if not allow_dups:
--> 737     self.axes[axis]._validate_can_reindex(indexer)
    739 if axis >= self.ndim:
    740     raise IndexError("Requested axis not found in manager")

File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/pandas/core/indexes/base.py:4316, in Index._validate_can_reindex(self, indexer)
   4314 # trying to reindex on an axis with duplicates
   4315 if not self._index_as_unique and len(indexer):
-> 4316     raise ValueError("cannot reindex on an axis with duplicate labels")

ValueError: cannot reindex on an axis with duplicate labels
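
The check at the bottom of the traceback can be sketched in plain Python. This is an illustration only, assuming a list-based index; `validate_can_reindex` and `checked_reindex` are hypothetical names that loosely mirror pandas' `Index._validate_can_reindex`, and they are not the cuDF fix itself.

```python
def validate_can_reindex(labels, indexer):
    """Refuse to reindex when the existing labels contain duplicates."""
    has_duplicates = len(set(labels)) != len(labels)
    # As in the pandas check shown above, an empty indexer is allowed through.
    if has_duplicates and len(indexer):
        raise ValueError("cannot reindex on an axis with duplicate labels")


def checked_reindex(labels, values, new_labels):
    """Reindex by label, raising instead of silently returning missing values."""
    position = {label: i for i, label in enumerate(labels)}
    # -1 marks a new label that has no match in the existing index.
    indexer = [position.get(label, -1) for label in new_labels]
    validate_can_reindex(labels, indexer)
    return [values[i] if i >= 0 else None for i in indexer]


try:
    checked_reindex([0, 0, 0, 0], ["a", "b", "c", "d"], [10, 11, 12, 13])
except ValueError as err:
    print(f"ValueError: {err}")
```

With unique labels the same call succeeds and fills only the unmatched positions with a missing value.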

Environment overview

  • Environment location: Bare-metal
  • Method of cuDF install: from source
@galipremsagar galipremsagar added bug Something isn't working Python Affects Python cuDF API. labels Nov 13, 2023
@galipremsagar galipremsagar self-assigned this Nov 13, 2023
rapids-bot bot pushed a commit that referenced this issue Nov 15, 2023
Fixes: #14398 
This PR raises an error in `reindex` API when reindexing is performed on a non-unique index column.

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Matthew Roeschke (https://github.com/mroeschke)
  - Lawrence Mitchell (https://github.com/wence-)

URL: #14400
galipremsagar added a commit to galipremsagar/cudf that referenced this issue Nov 16, 2023
raydouglass pushed a commit that referenced this issue Nov 16, 2023
Backport of #14400
Fixes: #14398 
This PR raises an error in `reindex` API when reindexing is performed on a non-unique index column.

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Matthew Roeschke (https://github.com/mroeschke)
  - Lawrence Mitchell (https://github.com/wence-)

URL: #14400

Authors:
   - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
   - Richard (Rick) Zamora (https://github.com/rjzamora)
   - Ashwin Srinath (https://github.com/shwina)
   - Ray Douglass (https://github.com/raydouglass)