[BUG] Index.intersection throwing error on empty inputs #14020

Closed
galipremsagar opened this issue Aug 31, 2023 · 0 comments · Fixed by #14054
Labels: bug (Something isn't working), Python (Affects Python cuDF API)

Comments

@galipremsagar
Contributor

Describe the bug
Calling Index.intersection with an empty list raises a ValueError instead of returning an empty Index.

Steps/Code to reproduce bug

In [1]: import cudf

In [2]: s = cudf.Index(['a', 'b', 'c'], name="abc")

In [5]: s.intersection([])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[5], line 1
----> 1 s.intersection([])

File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/cudf/core/_base_index.py:643, in BaseIndex.intersection(self, other, sort)
    641 else:
    642     rhs = other
--> 643 result = lhs._intersection(rhs, sort=sort)
    644 result.name = res_name
    645 return result

File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/cudf/core/_base_index.py:1330, in BaseIndex._intersection(self, other, sort)
   1328 def _intersection(self, other, sort=None):
   1329     intersection_result = cudf.core.index._index_from_data(
-> 1330         cudf.DataFrame._from_data({"None": self.unique()._column})
   1331         .merge(
   1332             cudf.DataFrame._from_data({"None": other.unique()._column}),
   1333             how="inner",
   1334             on="None",
   1335         )
   1336         ._data
   1337     )
   1339     if sort is None and len(other):
   1340         return intersection_result.sort_values()

File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/nvtx/nvtx.py:101, in annotate.__call__.<locals>.inner(*args, **kwargs)
     98 @wraps(func)
     99 def inner(*args, **kwargs):
    100     libnvtx_push_range(self.attributes, self.domain.handle)
--> 101     result = func(*args, **kwargs)
    102     libnvtx_pop_range(self.domain.handle)
    103     return result

File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/cudf/core/dataframe.py:3988, in DataFrame.merge(self, right, on, left_on, right_on, left_index, right_index, how, sort, lsuffix, rsuffix, indicator, suffixes)
   3973 elif how in {"leftsemi", "leftanti"}:
   3974     merge_cls = MergeSemi
   3976 return merge_cls(
   3977     lhs,
   3978     rhs,
   3979     on=on,
   3980     left_on=left_on,
   3981     right_on=right_on,
   3982     left_index=left_index,
   3983     right_index=right_index,
   3984     how=how,
   3985     sort=sort,
   3986     indicator=indicator,
   3987     suffixes=suffixes,
-> 3988 ).perform_merge()

File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/cudf/core/join/join.py:170, in Merge.perform_merge(self)
    168 lcol = left_key.get(self.lhs)
    169 rcol = right_key.get(self.rhs)
--> 170 lcol_casted, rcol_casted = _match_join_keys(lcol, rcol, self.how)
    171 left_join_cols.append(lcol_casted)
    172 right_join_cols.append(rcol_casted)

File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/cudf/core/join/_join_helpers.py:129, in _match_join_keys(lcol, rcol, how)
    126 if how == "left" and rcol.fillna(0).can_cast_safely(ltype):
    127     return lcol, rcol.astype(ltype)
--> 129 return lcol.astype(common_type), rcol.astype(common_type)

File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/cudf/core/column/column.py:1021, in ColumnBase.astype(self, dtype, **kwargs)
   1019     return self.as_timedelta_column(dtype, **kwargs)
   1020 else:
-> 1021     return self.as_numerical_column(dtype, **kwargs)

File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/cudf/core/column/string.py:5652, in StringColumn.as_numerical_column(self, dtype, **kwargs)
   5650 elif out_dtype.kind == "f":
   5651     if not libstrings.is_float(string_col).all():
-> 5652         raise ValueError(
   5653             "Could not convert strings to float "
   5654             "type due to presence of non-floating values."
   5655         )
   5657 result_col = _str_to_numeric_typecast_functions[out_dtype](string_col)
   5658 return result_col

ValueError: Could not convert strings to float type due to presence of non-floating values.

In [6]: s.to_pandas().intersection([])
Out[6]: Index([], dtype='object', name='abc')
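
The traceback suggests why an empty input fails: the merge-based `_intersection` casts both join keys to a common type, and here that common type appears to be a float type (presumably because the empty list carries no string dtype of its own), so the string column is sent through a string-to-float cast. The following is a toy, pure-Python model of that step, not cuDF code; the names `toy_cast`, `toy_match_join_keys`, and `cast_fails` are hypothetical and exist only for illustration.

```python
# Toy model of the cast seen in the traceback. This is NOT cuDF code:
# every name below is hypothetical and the "dtypes" are plain strings.

def toy_cast(values, dtype):
    """Cast a list of values to 'float' or 'str'."""
    if dtype == "float":
        try:
            return [float(v) for v in values]
        except ValueError as exc:
            # Mirror the message raised at the bottom of the traceback.
            raise ValueError(
                "Could not convert strings to float type "
                "due to presence of non-floating values."
            ) from exc
    return [str(v) for v in values]


def toy_match_join_keys(lcol, rcol, common_type):
    # Both sides are cast to the common type before the join,
    # even when one side has no rows at all.
    return toy_cast(lcol, common_type), toy_cast(rcol, common_type)


def cast_fails(lcol, rcol, common_type):
    """Return True when matching the join keys raises ValueError."""
    try:
        toy_match_join_keys(lcol, rcol, common_type)
    except ValueError:
        return True
    return False


# String keys on the left, an empty right side, float as the common type.
print(cast_fails(["a", "b", "c"], [], "float"))
```

In this model the empty side contributes nothing to the result, yet it still drives the choice of cast, which is the situation the report describes.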

Expected behavior

In [6]: s.intersection([])
Out[6]: Index([], dtype='object', name='abc')
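
One way to obtain the expected behavior is to short-circuit when either side is empty, before any type matching happens. The sketch below shows that idea on plain Python lists. It is an assumption about the general approach, not the change made in #14054, and `intersection_sketch` is a hypothetical helper. The `sort is None` branch follows the condition visible in the traceback (`if sort is None and len(other)`).

```python
def intersection_sketch(values, other, sort=None):
    """Hypothetical list-based sketch of an intersection with an empty-input guard."""
    # Guard: an empty operand means an empty result, so no casting
    # or merging is attempted at all.
    if len(values) == 0 or len(other) == 0:
        return []

    other_set = set(other)
    seen = set()
    result = []
    # Keep unique values from `values` that also occur in `other`,
    # preserving the order of first appearance.
    for v in values:
        if v in other_set and v not in seen:
            seen.add(v)
            result.append(v)

    # As in the traceback, sort only when `sort` is None.
    if sort is None:
        result = sorted(result)
    return result


print(intersection_sketch(["a", "b", "c"], []))
print(intersection_sketch(["c", "a", "b", "c"], ["b", "c", "d"]))
```

A real implementation would also need to preserve the caller's dtype and name on the empty result, as the expected output above shows; plain lists cannot express that, so it is left out of the sketch.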

Environment overview (please complete the following information)

  • Environment location: [Bare-metal]
  • Method of cuDF install: [from source]
galipremsagar added the bug and Python labels Aug 31, 2023
galipremsagar self-assigned this Aug 31, 2023
rapids-bot bot pushed a commit that referenced this issue Sep 12, 2023
This PR fixes multiple issues with `Index.intersection`:

- [x] Fixes issues with handling empty inputs, closes #14020
- [x] Adds validation for inputs.
- [x] Properly handles various types in the `intersection` implementation and fixes `RangeIndex.intersection` by giving it a separate implementation.

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: #14054