[BUG] Concat of List column with scalar column results in issubclass TypeError #12083

Closed
oliverholworthy opened this issue Nov 7, 2022 · 1 comment · Fixed by #12537
Labels: bug (Something isn't working), Python (Affects Python cuDF API)

oliverholworthy (Member) wrote:

Describe the bug

Calling cudf.concat on one Series with a list dtype, cudf.ListDtype("int64"), and another with a scalar dtype, np.dtype("int64"), raises a TypeError.

Steps/Code to reproduce bug

import cudf
cudf.concat([cudf.Series([1, 2]), cudf.Series([[1, 2], [3, 4]])], axis=0, ignore_index=True)
# => Raises TypeError: issubclass() arg 1 must be a class

This happens because cudf.concat calls cudf.utils.dtypes.find_common_type. Inside find_common_type, the pandas function pd.api.types.is_timedelta64_dtype is called with cudf.ListDtype("int64"). That pandas function only works with numpy dtypes, not with cuDF list dtypes.

import cudf
from cudf.utils.dtypes import find_common_type
import numpy as np

find_common_type({cudf.ListDtype("int64"), np.dtype("int64")})
# Raises TypeError: issubclass() arg 1 must be a class
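
The error message itself can be reproduced without cuDF or pandas. The following is a minimal sketch, not the actual pandas or cuDF code path: `ScalarDtype`, `NestedDtype`, `is_integer_like`, and `describe` are hypothetical names, and the assumption is that a dtype predicate ends up passing a non-class value to `issubclass`.

```python
# Hypothetical stand-ins; none of these names come from pandas or cuDF.
class ScalarDtype:
    """Mimics a dtype whose `type` attribute is a class."""
    type = int


class NestedDtype:
    """Mimics a dtype whose `type` attribute is an instance, not a class."""
    type = ["int64"]  # a list instance standing in for a non-class value


def is_integer_like(dtype) -> bool:
    # Assumed shape of a dtype predicate: inspect `dtype.type` with issubclass.
    return issubclass(dtype.type, int)


def describe(dtype) -> str:
    # Report either the predicate result or the TypeError it raised.
    try:
        return f"ok: {is_integer_like(dtype)}"
    except TypeError as exc:
        return f"TypeError: {exc}"


print(describe(ScalarDtype()))
print(describe(NestedDtype()))
```

The second call hits the same `issubclass() arg 1 must be a class` message as the reports above, which suggests the predicate is handed something other than a class when given a list dtype.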

Expected behavior

Either:

  • catch the case where concat is called with unsupported types and raise a clear error message
  • behave like pandas, which supports concatenating list and scalar values:
import pandas as pd
pd.concat([pd.Series([1, 2]), pd.Series([[1, 2], [3, 4]])], axis=0, ignore_index=True)
# =>
0         1
1         2
2    [1, 2]
3    [3, 4]
dtype: object
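
The first option (raising a clear error) might look roughly like the sketch below. It is written in plain Python under stated assumptions: `ListLike` and `checked_common_type` are hypothetical names, scalar dtypes are represented by strings, and the real promotion logic is replaced by a placeholder. It is not how cuDF implements the check.

```python
from typing import Iterable


class ListLike:
    """Hypothetical stand-in for a nested (list) dtype."""

    def __init__(self, element: str):
        self.element = element

    def __repr__(self) -> str:
        return f"ListLike({self.element!r})"


def checked_common_type(dtypes: Iterable[object]) -> object:
    """Return a common dtype, or raise a clear error for unsupported mixes."""
    dtypes = list(dtypes)
    if not dtypes:
        raise ValueError("no dtypes given")

    # Split the inputs into nested and scalar dtypes before any promotion.
    nested = [d for d in dtypes if isinstance(d, ListLike)]
    scalar = [d for d in dtypes if not isinstance(d, ListLike)]
    if nested and scalar:
        raise TypeError(
            f"cannot find a common type for nested dtype {nested[0]!r} "
            f"and scalar dtype {scalar[0]!r}"
        )

    # Placeholder for real promotion logic: return the first dtype unchanged.
    return dtypes[0]
```

Checking the mix up front means the caller sees which two dtypes are incompatible, rather than an `issubclass` error raised from deep inside a dtype predicate.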

Environment overview (please complete the following information)

  • Environment location: Docker
  • Method of cuDF install: Docker, using the following command:
docker run -it --gpus=all --rm nvcr.io/nvidia/rapidsai/rapidsai:22.10-cuda11.5-runtime-ubuntu20.04-py3.8 bash

@oliverholworthy added the Needs Triage and bug labels on Nov 7, 2022
@galipremsagar added the Python label and removed the Needs Triage label on Nov 7, 2022
oliverholworthy (Member, Author) commented Nov 10, 2022

Found a more minimal example that also results in this error:

import cudf

cudf.DataFrame({"a": [[1]]}).values
# Raises TypeError: issubclass() arg 1 must be a class

@galipremsagar galipremsagar self-assigned this Jan 4, 2023
rapids-bot bot pushed a commit that referenced this issue Mar 23, 2023
Fixes: #12083, fixes #12115 
This PR fixes `find_common_dtype` and `values` APIs to handle complex dtypes by raising an error instead of casting them to strings.

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Lawrence Mitchell (https://github.com/wence-)

URL: #12537