[BUG] User defined functions are currently not supported on Series with dtypes str and category #10722

Closed
WonderingWJ opened this issue Apr 24, 2022 · 8 comments
Labels
feature request: New feature or request; libcudf: Affects libcudf (C++/CUDA) code; numba: Numba issue; Python: Affects Python cuDF API.

Comments

@WonderingWJ

Describe the bug
In cudf 21.12.00a+293.g0930f712e6, the code below fails with TypeError: User defined functions are currently not supported on Series with dtypes str and category. In cudf 21.08.03, the same code runs successfully.

Steps/Code to reproduce bug
Follow this guide http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports to craft a minimal bug report. This helps us reproduce the issue you're having and resolve the issue more quickly.

import numpy as np

# df_total is an existing cudf.DataFrame (not shown).
df_total['shift_timestamp'] = df_total.groupby('passenger_id')['bubble_timestamp'].shift(1).to_arrow()

# Row-wise kernel: store the difference between each timestamp and the shifted one.
def kernel(bubble_timestamp, shift_timestamp, second_diff):
    for i, (x, y) in enumerate(zip(bubble_timestamp, shift_timestamp)):
        second_diff[i] = x - y

df_total = df_total.apply_rows(
    kernel,
    incols=["bubble_timestamp", "shift_timestamp"],
    outcols=dict(second_diff=np.int64),
    kwargs={},
)
df_total['second_diff'].fillna(0, inplace=True)

Expected behavior
The code runs successfully, as it does in cudf 21.08.03.

Environment overview (please complete the following information)

  • Environment location: [Bare-metal, Docker, Cloud(specify cloud provider)]
  • Method of cuDF install: [conda, Docker, or from source]
    • from source

Environment details
Please run and paste the output of the cudf/print_env.sh script here, to gather any other relevant environment details

     DISTRIB_ID=Ubuntu
     DISTRIB_RELEASE=20.04
     DISTRIB_CODENAME=focal
     DISTRIB_DESCRIPTION="Ubuntu 20.04.3 LTS"
     NAME="Ubuntu"
     VERSION="20.04.3 LTS (Focal Fossa)"
     ID=ubuntu
     ID_LIKE=debian
     PRETTY_NAME="Ubuntu 20.04.3 LTS"
     VERSION_ID="20.04"
     HOME_URL="https://www.ubuntu.com/"
     SUPPORT_URL="https://help.ubuntu.com/"
     BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
     PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
     VERSION_CODENAME=focal
     UBUNTU_CODENAME=focal
     Linux 1d14d3e7c968 4.15.0-96-generic #97-Ubuntu SMP Wed Apr 1 03:25:46 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

     ***GPU Information***
     Sun Apr 24 07:24:15 2022
     +-----------------------------------------------------------------------------+
     | NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.6     |
     |-------------------------------+----------------------+----------------------+
     | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
     | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
     |                               |                      |               MIG M. |
     |===============================+======================+======================|
     |   0  Tesla V100-SXM2...  On   | 00000000:06:00.0 Off |                    0 |
     | N/A   31C    P0    42W / 300W |      3MiB / 32510MiB |      0%      Default |
     |                               |                      |                  N/A |
     +-------------------------------+----------------------+----------------------+
     |   1  Tesla V100-SXM2...  On   | 00000000:07:00.0 Off |                    0 |
     | N/A   33C    P0    43W / 300W |      3MiB / 32510MiB |      0%      Default |
     |                               |                      |                  N/A |
     +-------------------------------+----------------------+----------------------+
     |   2  Tesla V100-SXM2...  On   | 00000000:0A:00.0 Off |                    0 |
     | N/A   32C    P0    42W / 300W |      3MiB / 32510MiB |      0%      Default |
     |                               |                      |                  N/A |
     +-------------------------------+----------------------+----------------------+
     |   3  Tesla V100-SXM2...  On   | 00000000:0B:00.0 Off |                    0 |
     | N/A   30C    P0    42W / 300W |      3MiB / 32510MiB |      0%      Default |
     |                               |                      |                  N/A |
     +-------------------------------+----------------------+----------------------+
     |   4  Tesla V100-SXM2...  On   | 00000000:85:00.0 Off |                    0 |
     | N/A   31C    P0    43W / 300W |      3MiB / 32510MiB |      0%      Default |
     |                               |                      |                  N/A |
     +-------------------------------+----------------------+----------------------+
     |   5  Tesla V100-SXM2...  On   | 00000000:86:00.0 Off |                    0 |
     | N/A   33C    P0    43W / 300W |      3MiB / 32510MiB |      0%      Default |
     |                               |                      |                  N/A |
     +-------------------------------+----------------------+----------------------+
     |   6  Tesla V100-SXM2...  On   | 00000000:89:00.0 Off |                    0 |
     | N/A   34C    P0    44W / 300W |      3MiB / 32510MiB |      0%      Default |
     |                               |                      |                  N/A |
     +-------------------------------+----------------------+----------------------+
     |   7  Tesla V100-SXM2...  On   | 00000000:8A:00.0 Off |                    0 |
     | N/A   32C    P0    42W / 300W |      3MiB / 32510MiB |      0%      Default |
     |                               |                      |                  N/A |
     +-------------------------------+----------------------+----------------------+

     +-----------------------------------------------------------------------------+
     | Processes:                                                                  |
     |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
     |        ID   ID                                                   Usage      |
     |=============================================================================|
     |  No running processes found                                                 |
     +-----------------------------------------------------------------------------+

     ***CPU***
     Architecture:                    x86_64
     CPU op-mode(s):                  32-bit, 64-bit
     Byte Order:                      Little Endian
     Address sizes:                   46 bits physical, 48 bits virtual
     CPU(s):                          80
     On-line CPU(s) list:             0-79
     Thread(s) per core:              2
     Core(s) per socket:              20
     Socket(s):                       2
     NUMA node(s):                    2
     Vendor ID:                       GenuineIntel
     CPU family:                      6
     Model:                           79
     Model name:                      Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz
     Stepping:                        1
     CPU MHz:                         3443.694
     CPU max MHz:                     3600.0000
     CPU min MHz:                     1200.0000
     BogoMIPS:                        4390.07
     Virtualization:                  VT-x
     L1d cache:                       1.3 MiB
     L1i cache:                       1.3 MiB
     L2 cache:                        10 MiB
     L3 cache:                        100 MiB
     NUMA node0 CPU(s):               0-19,40-59
     NUMA node1 CPU(s):               20-39,60-79
     Vulnerability Itlb multihit:     KVM: Vulnerable
     Vulnerability L1tf:              Mitigation; PTE Inversion; VMX vulnerable
     Vulnerability Mds:               Vulnerable; SMT vulnerable
     Vulnerability Meltdown:          Vulnerable
     Vulnerability Spec store bypass: Vulnerable
     Vulnerability Spectre v1:        Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers
     Vulnerability Spectre v2:        Vulnerable, IBPB: disabled, STIBP: disabled
     Vulnerability Tsx async abort:   Vulnerable

Additional context
Add any other context about the problem here.

WonderingWJ added the labels Needs Triage (Need team to review and classify) and bug (Something isn't working) on Apr 24, 2022
davidwendt assigned and then unassigned davidwendt on Apr 25, 2022
@brandon-b-miller
Contributor

brandon-b-miller commented Apr 25, 2022

Hi @WonderingWJ, can you provide some information about df_total? In particular, the dtypes of the columns, which can be found using print(df_total.dtypes).

@WonderingWJ
Author

df_total has a lot of columns. Here is the output of print(df_total.dtypes):

passenger_id                  object
bubbling_id                   object
bubble_time           datetime64[us]
is_send                        int64
is_finish                     object
                           ...
bubble_minute                  int16
bubble_second                  int16
bubble_time_period             int64
minute_to_period               int64
is_workday                     int64
Length: 92, dtype: object

Which column's dtype do you want to know?

@brandon-b-miller
Contributor

Thanks @WonderingWJ, could you provide the dtype of the column named 'bubble_timestamp'?

@WonderingWJ
Author

int64
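
For reference, the same check can be applied to every column passed as incols at once. The following is only a minimal sketch: unsupported_udf_columns and UNSUPPORTED_DTYPE_NAMES are hypothetical names, not part of cuDF, the set of dtype names is an assumption based on the wording of the error message, and the dtype of 'shift_timestamp' was never reported in this thread, so int64 is assumed.

```python
# Hypothetical helper, not part of cuDF: flag UDF input columns whose dtype
# name matches the types named in the error message (str and category).
UNSUPPORTED_DTYPE_NAMES = {"object", "str", "string", "category"}


def unsupported_udf_columns(dtypes, incols):
    """Return the names in `incols` whose dtype looks like str or category.

    `dtypes` maps column name -> dtype name (a plain string).
    """
    flagged = []
    for name in incols:
        if name not in dtypes:
            raise KeyError(f"column {name!r} not found")
        if str(dtypes[name]) in UNSUPPORTED_DTYPE_NAMES:
            flagged.append(name)
    return flagged


# dtype names taken from the output earlier in this thread; the dtype of
# 'shift_timestamp' was never reported, so int64 is an assumption here.
dtypes = {
    "passenger_id": "object",
    "bubble_time": "datetime64[us]",
    "bubble_timestamp": "int64",
    "shift_timestamp": "int64",
}

print(unsupported_udf_columns(dtypes, ["bubble_timestamp", "shift_timestamp"]))  # []
print(unsupported_udf_columns(dtypes, ["passenger_id", "bubble_timestamp"]))  # ['passenger_id']
```

With a real frame, the mapping could probably be built from df_total.dtypes, for example {name: str(dtype) for name, dtype in df_total.dtypes.items()}.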

@brandon-b-miller
Contributor

Thanks - looking into this now.

@github-actions

github-actions bot commented Jun 1, 2022

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

GregoryKimball added this to the UDF Enhancements milestone on Jun 28, 2022
GregoryKimball added the labels feature request (New feature or request), numba (Numba issue), libcudf (Affects libcudf (C++/CUDA) code), and Python (Affects Python cuDF API), and removed the labels bug (Something isn't working) and Needs Triage (Need team to review and classify) on Jun 28, 2022
@github-actions

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

@vyasr
Contributor

vyasr commented May 17, 2024

Without the original data it is hard to say for sure, but as best as I can tell the original snippet no longer fails using some toy data that matches the data types discussed above:

import cudf
import numpy as np

df_total = cudf.DataFrame({
    'passenger_id': [1, 1, 1, 2, 2, 2],
    'bubble_timestamp': [1, 2, 3, 1, 2, 3],
    'shift_timestamp': [0, 1, 2, 0, 1, 2],
})

df_total['shift_timestamp'] = df_total.groupby('passenger_id')['bubble_timestamp'].shift(1).to_arrow()

def kernel(bubble_timestamp, shift_timestamp, second_diff):
    for i, (x, y) in enumerate(zip(bubble_timestamp, shift_timestamp)):
        second_diff[i] = x - y

df_total = df_total.apply_rows(
    kernel,
    incols=["bubble_timestamp", "shift_timestamp"],
    outcols=dict(second_diff=np.int64),
    kwargs={},
)
df_total['second_diff'].fillna(0, inplace=True)
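
For comparison, the result the snippet appears to be aiming for (the gap to the previous timestamp within each passenger_id, with 0 for the first row of a group) can be written as a plain-Python reference. This is only a sketch of the intended arithmetic on the toy data above: grouped_diff is a hypothetical helper, it does not use cuDF, and how apply_rows treats the null entries produced by shift(1) is not established in this thread.

```python
def grouped_diff(group_ids, timestamps):
    """Difference between each timestamp and the previous one in the same group.

    The first row of every group has no predecessor and gets 0.
    """
    previous = {}  # last timestamp seen for each group id
    diffs = []
    for gid, ts in zip(group_ids, timestamps):
        diffs.append(ts - previous[gid] if gid in previous else 0)
        previous[gid] = ts
    return diffs


# Toy data copied from the DataFrame above.
passenger_id = [1, 1, 1, 2, 2, 2]
bubble_timestamp = [1, 2, 3, 1, 2, 3]

print(grouped_diff(passenger_id, bubble_timestamp))  # [0, 1, 1, 0, 1, 1]
```

Since the kernel only subtracts two columns, a direct column subtraction such as df_total['bubble_timestamp'] - df_total['shift_timestamp'] followed by fillna(0) may be enough to get this result without going through a UDF.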

The broader issue raised in the title (UDFs not supporting some types) is being tracked more holistically in other issues (such as #9639).

@vyasr vyasr closed this as completed May 17, 2024