[BUG] Passing a list to StringMethods.strip puts cuDF into a broken state #10591

Closed
charlesbluca opened this issue Apr 5, 2022 · 1 comment · Fixed by #10597
Assignees
Labels
bug Something isn't working Python Affects Python cuDF API.

Comments

@charlesbluca
Member

Describe the bug
When a list is passed to StringMethods.strip, the operation fails and leaves cuDF in a broken state: it is generally no longer possible to create new frames, and accessing existing frames causes a segfault.

Steps/Code to reproduce bug

import cudf

s = cudf.Series(["hi."]).str.strip(["."])
RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_1765/347004186.py in <module>
      1 import cudf
      2 
----> 3 s = cudf.Series(["hi."]).str.strip(["."])

/opt/conda/envs/rapids/lib/python3.9/site-packages/cudf/core/column/string.py in strip(self, to_strip)
   3185 
   3186         return self._return_or_inplace(
-> 3187             libstrings.strip(self._column, cudf.Scalar(to_strip))
   3188         )
   3189 

cudf/_lib/strings/strip.pyx in cudf._lib.strings.strip.strip()

RuntimeError: for_each: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered
print(s)
terminate called after throwing an instance of 'rmm::bad_alloc'
  what():  std::bad_alloc: CUDA error at: /workspace/.conda-bld/work/include/rmm/mr/device/cuda_memory_resource.hpp:70: cudaErrorIllegalAddress an illegal memory access was encountered
Aborted (core dumped)

Expected behavior
The equivalent behavior in Pandas:

import pandas as pd

s = pd.Series(["hi."]).str.strip(["."])
s
0   NaN
dtype: float64

Environment overview (please complete the following information)

  • Environment location: bare metal
  • Method of cuDF install: conda

Environment details

 ***git***

print_env.sh: 10: [: true: unexpected operator
Not inside a git repository

 ***OS Information***
 DGX_NAME="DGX Server"
 DGX_PRETTY_NAME="NVIDIA DGX Server"
 DGX_SWBUILD_DATE="2020-03-04"
 DGX_SWBUILD_VERSION="4.4.0"
 DGX_COMMIT_ID="ee09ebc"
 DGX_PLATFORM="DGX Server for DGX-1"
 DGX_SERIAL_NUMBER="QTFCOU8220028"
 DISTRIB_ID=Ubuntu
 DISTRIB_RELEASE=18.04
 DISTRIB_CODENAME=bionic
 DISTRIB_DESCRIPTION="Ubuntu 18.04.4 LTS"
 NAME="Ubuntu"
 VERSION="18.04.4 LTS (Bionic Beaver)"
 ID=ubuntu
 ID_LIKE=debian
 PRETTY_NAME="Ubuntu 18.04.4 LTS"
 VERSION_ID="18.04"
 HOME_URL="https://www.ubuntu.com/"
 SUPPORT_URL="https://help.ubuntu.com/"
 BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
 PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
 VERSION_CODENAME=bionic
 UBUNTU_CODENAME=bionic
 Linux dgx12 4.15.0-1083-oracle #91-Ubuntu SMP Mon Oct 25 06:45:22 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

 ***GPU Information***
 Tue Apr  5 06:14:36 2022
 +-----------------------------------------------------------------------------+
 | NVIDIA-SMI 495.44       Driver Version: 495.44       CUDA Version: 11.5     |
 |-------------------------------+----------------------+----------------------+
 | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
 | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
 |                               |                      |               MIG M. |
 |===============================+======================+======================|
 |   0  Tesla V100-SXM2...  On   | 00000000:06:00.0 Off |                    0 |
 | N/A   33C    P0    55W / 300W |  10135MiB / 32510MiB |      0%      Default |
 |                               |                      |                  N/A |
 +-------------------------------+----------------------+----------------------+
 |   1  Tesla V100-SXM2...  On   | 00000000:07:00.0 Off |                    0 |
 | N/A   32C    P0    55W / 300W |    840MiB / 32510MiB |      0%      Default |
 |                               |                      |                  N/A |
 +-------------------------------+----------------------+----------------------+
 |   2  Tesla V100-SXM2...  On   | 00000000:0A:00.0 Off |                    0 |
 | N/A   31C    P0    54W / 300W |    840MiB / 32510MiB |      0%      Default |
 |                               |                      |                  N/A |
 +-------------------------------+----------------------+----------------------+
 |   3  Tesla V100-SXM2...  On   | 00000000:0B:00.0 Off |                    0 |
 | N/A   30C    P0    54W / 300W |    840MiB / 32510MiB |      0%      Default |
 |                               |                      |                  N/A |
 +-------------------------------+----------------------+----------------------+
 |   4  Tesla V100-SXM2...  On   | 00000000:85:00.0 Off |                    0 |
 | N/A   32C    P0    55W / 300W |    840MiB / 32510MiB |      0%      Default |
 |                               |                      |                  N/A |
 +-------------------------------+----------------------+----------------------+
 |   5  Tesla V100-SXM2...  On   | 00000000:86:00.0 Off |                    0 |
 | N/A   33C    P0    54W / 300W |    840MiB / 32510MiB |      0%      Default |
 |                               |                      |                  N/A |
 +-------------------------------+----------------------+----------------------+
 |   6  Tesla V100-SXM2...  On   | 00000000:89:00.0 Off |                    0 |
 | N/A   35C    P0    56W / 300W |    840MiB / 32510MiB |      0%      Default |
 |                               |                      |                  N/A |
 +-------------------------------+----------------------+----------------------+
 |   7  Tesla V100-SXM2...  On   | 00000000:8A:00.0 Off |                    0 |
 | N/A   31C    P0    55W / 300W |    840MiB / 32510MiB |      0%      Default |
 |                               |                      |                  N/A |
 +-------------------------------+----------------------+----------------------+

 +-----------------------------------------------------------------------------+
 | Processes:                                                                  |
 |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
 |        ID   ID                                                   Usage      |
 |=============================================================================|
 |    0   N/A  N/A     45143      C   ...da/envs/rapids/bin/python      647MiB |
 |    0   N/A  N/A     52956      C   ...da/envs/rapids/bin/python     2485MiB |
 |    0   N/A  N/A     53857      C   ...da/envs/rapids/bin/python     3031MiB |
 |    0   N/A  N/A     53970      C   ...da/envs/rapids/bin/python     1029MiB |
 |    0   N/A  N/A     54198      C   ...da/envs/rapids/bin/python     1541MiB |
 |    0   N/A  N/A     54367      C   ...da/envs/rapids/bin/python     1391MiB |
 |    1   N/A  N/A     53952      C   ...da/envs/rapids/bin/python      837MiB |
 |    2   N/A  N/A     53972      C   ...da/envs/rapids/bin/python      837MiB |
 |    3   N/A  N/A     53957      C   ...da/envs/rapids/bin/python      837MiB |
 |    4   N/A  N/A     53967      C   ...da/envs/rapids/bin/python      837MiB |
 |    5   N/A  N/A     53961      C   ...da/envs/rapids/bin/python      837MiB |
 |    6   N/A  N/A     53964      C   ...da/envs/rapids/bin/python      837MiB |
 |    7   N/A  N/A     53954      C   ...da/envs/rapids/bin/python      837MiB |
 +-----------------------------------------------------------------------------+

 ***CPU***
 Architecture:        x86_64
 CPU op-mode(s):      32-bit, 64-bit
 Byte Order:          Little Endian
 CPU(s):              80
 On-line CPU(s) list: 0-79
 Thread(s) per core:  2
 Core(s) per socket:  20
 Socket(s):           2
 NUMA node(s):        2
 Vendor ID:           GenuineIntel
 CPU family:          6
 Model:               79
 Model name:          Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz
 Stepping:            1
 CPU MHz:             2372.134
 CPU max MHz:         3600.0000
 CPU min MHz:         1200.0000
 BogoMIPS:            4389.96
 Virtualization:      VT-x
 L1d cache:           32K
 L1i cache:           32K
 L2 cache:            256K
 L3 cache:            51200K
 NUMA node0 CPU(s):   0-19,40-59
 NUMA node1 CPU(s):   20-39,60-79
 Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap intel_pt xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts md_clear flush_l1d

 ***CMake***
 /usr/bin/cmake
 cmake version 3.10.2

 CMake suite maintained and supported by Kitware (kitware.com/cmake).

 ***g++***
 /usr/bin/g++
 g++ (Ubuntu 9.4.0-1ubuntu1~18.04) 9.4.0
 Copyright (C) 2019 Free Software Foundation, Inc.
 This is free software; see the source for copying conditions.  There is NO
 warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


 ***nvcc***
 /usr/local/cuda-11.5/bin/nvcc
 nvcc: NVIDIA (R) Cuda compiler driver
 Copyright (c) 2005-2021 NVIDIA Corporation
 Built on Thu_Nov_18_09:45:30_PST_2021
 Cuda compilation tools, release 11.5, V11.5.119
 Build cuda_11.5.r11.5/compiler.30672275_0

 ***Python***
 /datasets/charlesb/miniconda3/envs/cudf-22.06/bin/python
 Python 3.9.12

 ***Environment Variables***
 PATH                            : /datasets/charlesb/miniconda3/envs/cudf-22.06/bin:/datasets/charlesb/miniconda3/condabin:/home/nfs/charlesb/bin:/usr/local/cuda-11.5/bin:/usr/local/cuda/bin:/opt/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
 LD_LIBRARY_PATH                 : /usr/local/cuda-11.5/lib64
 NUMBAPRO_NVVM                   :
 NUMBAPRO_LIBDEVICE              :
 CONDA_PREFIX                    : /datasets/charlesb/miniconda3/envs/cudf-22.06
 PYTHON_PATH                     :

 ***conda packages***
 conda is /datasets/charlesb/miniconda3/condabin/conda
 /datasets/charlesb/miniconda3/condabin/conda
 # packages in environment at /datasets/charlesb/miniconda3/envs/cudf-22.06:
 #
 # Name                    Version                   Build  Channel
 _libgcc_mutex             0.1                 conda_forge    conda-forge
 _openmp_mutex             4.5                       1_gnu    conda-forge
 abseil-cpp                20210324.2           h9c3ff4c_0    conda-forge
 arrow-cpp                 7.0.0           py39hb1e3516_3_cuda    conda-forge
 arrow-cpp-proc            3.0.0                      cuda    conda-forge
 asttokens                 2.0.5              pyhd8ed1ab_0    conda-forge
 aws-c-cal                 0.5.11               h95a6274_0    conda-forge
 aws-c-common              0.6.2                h7f98852_0    conda-forge
 aws-c-event-stream        0.2.7               h3541f99_13    conda-forge
 aws-c-io                  0.10.5               hfb6a706_0    conda-forge
 aws-checksums             0.1.11               ha31a3da_7    conda-forge
 aws-sdk-cpp               1.8.186              hb4091e7_3    conda-forge
 backcall                  0.2.0              pyh9f0ad1d_0    conda-forge
 backports                 1.0                        py_2    conda-forge
 backports.functools_lru_cache 1.6.4              pyhd8ed1ab_0    conda-forge
 bzip2                     1.0.8                h7f98852_4    conda-forge
 c-ares                    1.18.1               h7f98852_0    conda-forge
 ca-certificates           2021.10.8            ha878542_0    conda-forge
 cachetools                5.0.0              pyhd8ed1ab_0    conda-forge
 cuda-python               11.6.1           py39h3fd9d12_0    nvidia
 cudatoolkit               11.5.0               h36ae40a_9    nvidia
 cudf                      22.06.00a220405 cuda_11_py39_g0aef0c1c3e_96    rapidsai-nightly
 cupy                      10.3.0           py39hc3c280e_0    conda-forge
 decorator                 5.1.1              pyhd8ed1ab_0    conda-forge
 dlpack                    0.5                  h9c3ff4c_0    conda-forge
 executing                 0.8.3              pyhd8ed1ab_0    conda-forge
 fastavro                  1.4.10           py39hb9d737c_0    conda-forge
 fastrlock                 0.8              py39h5a03fae_1    conda-forge
 fsspec                    2022.3.0           pyhd8ed1ab_0    conda-forge
 gflags                    2.2.2             he1b5a44_1004    conda-forge
 glog                      0.5.0                h48cff8f_0    conda-forge
 grpc-cpp                  1.43.2               h9e046d8_1    conda-forge
 ipython                   8.2.0            py39hf3d152e_0    conda-forge
 jedi                      0.18.1           py39hf3d152e_1    conda-forge
 keyutils                  1.6.1                h166bdaf_0    conda-forge
 krb5                      1.19.3               h3790be6_0    conda-forge
 ld_impl_linux-64          2.36.1               hea4e1c9_2    conda-forge
 libblas                   3.9.0           13_linux64_openblas    conda-forge
 libbrotlicommon           1.0.9                h166bdaf_7    conda-forge
 libbrotlidec              1.0.9                h166bdaf_7    conda-forge
 libbrotlienc              1.0.9                h166bdaf_7    conda-forge
 libcblas                  3.9.0           13_linux64_openblas    conda-forge
 libcrc32c                 1.1.2                h9c3ff4c_0    conda-forge
 libcudf                   22.06.00a220405 cuda11_g0aef0c1c3e_96    rapidsai-nightly
 libcurl                   7.82.0               h7bff187_0    conda-forge
 libedit                   3.1.20191231         he28a2e2_2    conda-forge
 libev                     4.33                 h516909a_1    conda-forge
 libevent                  2.1.10               h9b69904_4    conda-forge
 libffi                    3.4.2                h7f98852_5    conda-forge
 libgcc-ng                 11.2.0              h1d223b6_14    conda-forge
 libgfortran-ng            11.2.0              h69a702a_14    conda-forge
 libgfortran5              11.2.0              h5c6108e_14    conda-forge
 libgomp                   11.2.0              h1d223b6_14    conda-forge
 libgoogle-cloud           1.35.0               h6945097_2    conda-forge
 liblapack                 3.9.0           13_linux64_openblas    conda-forge
 libllvm11                 11.1.0               hf817b99_3    conda-forge
 libnghttp2                1.47.0               h727a467_0    conda-forge
 libnsl                    2.0.0                h7f98852_0    conda-forge
 libopenblas               0.3.18          pthreads_h8fe5266_0    conda-forge
 libprotobuf               3.19.4               h780b84a_0    conda-forge
 librmm                    22.06.00a220405 cuda11_g921d286_22    rapidsai-nightly
 libssh2                   1.10.0               ha56f1ee_2    conda-forge
 libstdcxx-ng              11.2.0              he4da1e4_14    conda-forge
 libthrift                 0.16.0               h519c5ea_1    conda-forge
 libutf8proc               2.7.0                h7f98852_0    conda-forge
 libuuid                   2.32.1            h7f98852_1000    conda-forge
 libzlib                   1.2.11            h166bdaf_1014    conda-forge
 llvmlite                  0.38.0           py39h7d9a04d_1    conda-forge
 lz4-c                     1.9.3                h9c3ff4c_1    conda-forge
 matplotlib-inline         0.1.3              pyhd8ed1ab_0    conda-forge
 ncurses                   6.3                  h9c3ff4c_0    conda-forge
 numba                     0.55.1           py39h56b8d98_0    conda-forge
 numpy                     1.21.5           py39haac66dc_0    conda-forge
 nvtx                      0.2.3            py39h3811e60_1    conda-forge
 openssl                   1.1.1n               h166bdaf_0    conda-forge
 orc                       1.7.3                h1be678f_0    conda-forge
 packaging                 21.3               pyhd8ed1ab_0    conda-forge
 pandas                    1.3.5            py39hde0f152_0    conda-forge
 parquet-cpp               1.5.1                         2    conda-forge
 parso                     0.8.3              pyhd8ed1ab_0    conda-forge
 pexpect                   4.8.0              pyh9f0ad1d_2    conda-forge
 pickleshare               0.7.5                   py_1003    conda-forge
 pip                       22.0.4             pyhd8ed1ab_0    conda-forge
 prompt-toolkit            3.0.29             pyha770c72_0    conda-forge
 protobuf                  3.19.4           py39he80948d_0    conda-forge
 ptxcompiler               0.3.0           cuda_11_py39_geed289a_9    rapidsai-nightly
 ptyprocess                0.7.0              pyhd3deb0d_0    conda-forge
 pure_eval                 0.2.2              pyhd8ed1ab_0    conda-forge
 pyarrow                   7.0.0           py39h1ed2e5d_3_cuda    conda-forge
 pygments                  2.11.2             pyhd8ed1ab_0    conda-forge
 pyparsing                 3.0.7              pyhd8ed1ab_0    conda-forge
 python                    3.9.12          h9a8a25e_1_cpython    conda-forge
 python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
 python_abi                3.9                      2_cp39    conda-forge
 pytz                      2022.1             pyhd8ed1ab_0    conda-forge
 re2                       2022.02.01           h9c3ff4c_0    conda-forge
 readline                  8.1                  h46c0cb4_0    conda-forge
 rmm                       22.06.00a220405 cuda11_py39_g921d286_22    rapidsai-nightly
 s2n                       1.0.10               h9b69904_0    conda-forge
 setuptools                59.8.0           py39hf3d152e_1    conda-forge
 six                       1.16.0             pyh6c4a22f_0    conda-forge
 snappy                    1.1.8                he1b5a44_3    conda-forge
 spdlog                    1.8.5                h4bd325d_1    conda-forge
 sqlite                    3.37.1               h4ff8645_0    conda-forge
 stack_data                0.2.0              pyhd8ed1ab_0    conda-forge
 tk                        8.6.12               h27826a3_0    conda-forge
 traitlets                 5.1.1              pyhd8ed1ab_0    conda-forge
 typing_extensions         4.1.1              pyha770c72_0    conda-forge
 tzdata                    2022a                h191b570_0    conda-forge
 wcwidth                   0.2.5              pyh9f0ad1d_2    conda-forge
 wheel                     0.37.1             pyhd8ed1ab_0    conda-forge
 xz                        5.2.5                h516909a_1    conda-forge
 zlib                      1.2.11            h166bdaf_1014    conda-forge
 zstd                      1.5.2                ha95c52a_0    conda-forge

@charlesbluca charlesbluca added bug Something isn't working Needs Triage Need team to review and classify labels Apr 5, 2022
@davidwendt
Contributor

The to_strip parameter here is expected to be a str only. When to_strip is a list, cudf.Scalar(to_strip) creates a list scalar that is later incorrectly converted into a string_scalar. I think the right behavior is to raise a TypeError rather than mirror what Pandas does.
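
As a rough illustration of the TypeError approach, the sketch below validates the argument on the host before anything would reach the GPU. The helper name `check_to_strip` is hypothetical and not part of cuDF, and it assumes `None` remains a valid value, as it is for Python's built-in `str.strip`.

```python
def check_to_strip(to_strip):
    """Hypothetical guard: accept only None or str, like built-in str.strip."""
    if to_strip is not None and not isinstance(to_strip, str):
        raise TypeError(
            f"to_strip must be None or str, got {type(to_strip).__name__}"
        )
    return to_strip


# Plain Python strings already reject a list argument with TypeError.
try:
    "hi.".strip(["."])
except TypeError as err:
    builtin_error = err

# The guard fails the same way, before any GPU work would start.
try:
    check_to_strip(["."])
except TypeError as err:
    guard_error = err

print(type(builtin_error).__name__, type(guard_error).__name__)
```

Raising here matches what built-in `str.strip` does for a list, whereas Pandas returns NaN for the same input.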

@davidwendt davidwendt self-assigned this Apr 5, 2022
@davidwendt davidwendt added the Python Affects Python cuDF API. label Apr 5, 2022
rapids-bot bot pushed a commit that referenced this issue Apr 6, 2022
…10597)

Closes #10591 

Ensures the `to_strip` parameter is a `str` type when converting it to `cudf.Scalar`. It will now raise a `TypeError` as follows:
```
    libstrings.strip(self._column, cudf.Scalar(to_strip, "str"))
  File "/conda/envs/rapids/lib/python3.8/site-packages/cudf-22.6.0a0+96.g0aef0c1c3e.dirty-py3.8-linux-x86_64.egg/cudf/core/scalar.py", line 78, in __init__
    self._host_value, self._host_dtype = self._preprocess_host_value(
  File "/conda/envs/rapids/lib/python3.8/site-packages/cudf-22.6.0a0+96.g0aef0c1c3e.dirty-py3.8-linux-x86_64.egg/cudf/core/scalar.py", line 128, in _preprocess_host_value
    raise TypeError("Lists may not be cast to a different dtype")
TypeError: Lists may not be cast to a different dtype

```
This will also prevent the _sticky_ CUDA error.

Also added the `"str"` dtype argument to other `cudf.Scalar` calls where only strings are supported.

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Ashwin Srinath (https://github.com/shwina)
  - GALI PREM SAGAR (https://github.com/galipremsagar)

URL: #10597
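
A toy model of why pinning the dtype helps, based only on the traceback above: the function `make_scalar` below is a hypothetical stand-in, not cuDF's `Scalar` implementation. It assumes that a list with no requested dtype is accepted as a list scalar, while a list combined with an explicit dtype is rejected on the host.

```python
def make_scalar(value, dtype=None):
    """Toy stand-in for a dtype-aware scalar constructor (not cuDF code)."""
    if isinstance(value, list):
        if dtype is not None:
            # Same message as in the traceback from the fix.
            raise TypeError("Lists may not be cast to a different dtype")
        # With no requested dtype, the list is accepted as a list scalar.
        return ("list", value)
    return (dtype or type(value).__name__, value)


# Before the fix: no dtype is requested, so the list slips through.
unpinned = make_scalar(["."])

# After the fix: requesting "str" makes the mismatch fail on the host.
try:
    make_scalar(["."], "str")
except TypeError as err:
    pinned_error = err

print(unpinned)
print(pinned_error)
```

In this model the error surfaces in Python, so the bad value never reaches a device kernel.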
@bdice bdice removed the Needs Triage Need team to review and classify label Mar 4, 2024
3 participants