[FEA] Support passing a Series to list.index() #10692

Closed
randerzander opened this issue Apr 20, 2022 · 0 comments · Fixed by #10696
Labels: feature request (New feature or request), Python (Affects Python cuDF API)

Related to #10687

I'm trying to find the index of an element within each list of a list column, where the value I'm searching for is different for each row of the DataFrame.

I'd like to be able to pass a cudf Series into list.index().

Example:

import cudf

df = cudf.DataFrame({'overloaded_field':
    ['0,1,2,0', '2,3,0,2,3']
})

# "second max" value within each list
smax = df['overloaded_field'].str.split(',').list.sort_values().list.unique().list.get(-2)
smax
0    1
1    2
Name: overloaded_field, dtype: object

# now trying to find the list index of each smax value
df['overloaded_field'].str.split(',').list.index(smax)

Trace:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [84], in <cell line: 1>()
----> 1 df['overloaded_field'].str.split(',').list.index(smax)

File ~/conda/envs/dsql-4-18/lib/python3.9/site-packages/cudf/core/column/lists.py:459, in ListMethods.index(self, search_key)
    458 def index(self, search_key: ScalarLike) -> ParentType:
--> 459     search_key = cudf.Scalar(search_key)
    460     try:
    461         res = self._return_or_inplace(index_of(self._column, search_key))

File ~/conda/envs/dsql-4-18/lib/python3.9/site-packages/cudf/core/scalar.py:78, in Scalar.__init__(self, value, dtype)
     76     self._device_value = value
     77 else:
---> 78     self._host_value, self._host_dtype = self._preprocess_host_value(
     79         value, dtype
     80     )

File ~/conda/envs/dsql-4-18/lib/python3.9/site-packages/cudf/core/scalar.py:159, in Scalar._preprocess_host_value(self, value, dtype)
    156 if isinstance(value, decimal.Decimal) and dtype is None:
    157     dtype = cudf.Decimal128Dtype._from_decimal(value)
--> 159 value = to_cudf_compatible_scalar(value, dtype=dtype)
    161 if dtype is None:
    162     if not valid:

File ~/conda/envs/dsql-4-18/lib/python3.9/site-packages/cudf/utils/dtypes.py:246, in to_cudf_compatible_scalar(val, dtype)
    243     return val
    245 if not cudf.api.types._is_scalar_or_zero_d_array(val):
--> 246     raise ValueError(
    247         f"Cannot convert value of type {type(val).__name__} "
    248         "to cudf scalar"
    249     )
    251 if isinstance(val, Decimal):
    252     return val

ValueError: Cannot convert value of type Series to cudf scalar
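
For reference, the row-wise behaviour requested above can be sketched in plain Python on the host. This is only an illustration of the intended semantics, not cuDF's implementation: `rowwise_index` is a hypothetical helper name, and returning -1 for a key that is missing from its row is an assumption modelled on how `list.index` treats a scalar key that is not found.

```python
def rowwise_index(lists, keys):
    """Hypothetical helper: position of keys[i] within lists[i].

    Assumption: a key that does not occur in its row maps to -1.
    """
    if len(lists) != len(keys):
        raise ValueError("lists and keys must have the same number of rows")
    out = []
    for row, key in zip(lists, keys):
        # list.index returns the first matching position in the row
        out.append(row.index(key) if key in row else -1)
    return out


# Same data as the example above, split into Python lists of strings
rows = [s.split(",") for s in ["0,1,2,0", "2,3,0,2,3"]]

# "second max" per row: second-largest distinct value
smax = [sorted(set(r))[-2] for r in rows]

result = rowwise_index(rows, smax)
print(result)
```

Each row is searched with its own key, which is the behaviour that passing a Series of search keys to `list.index()` would provide on the GPU.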
@randerzander added the "feature request" and "Needs Triage" labels on Apr 20, 2022
@bdice added the "Python" label and removed the "Needs Triage" label on Apr 21, 2022
rapids-bot pushed a commit that referenced this issue on Apr 27, 2022:
This adds bindings for `index_of` to enable using `list.index` with a Series of search keys.

Closes #10692 

cc: @randerzander

Authors:
  - https://github.com/ChrisJar

Approvers:
  - Ashwin Srinath (https://github.com/shwina)
  - Bradley Dice (https://github.com/bdice)

URL: #10696