[BUG] replace_with_backrefs hangs with some inputs #13404

Closed
andygrove opened this issue May 22, 2023 · 0 comments · Fixed by #13418

Describe the bug
replace_with_backrefs hangs in some cases.

Steps/Code to reproduce bug

>>> import cudf
>>> cudf.__version__
'23.06.00'
>>> s = cudf.Series(["one\ntwo", "three\n\n"])
>>> s.str.replace_with_backrefs('[^\n\r]*(\r|\r\n)?$', r'scala\1')

Expected behavior
I would expect this to either fail with an error or complete without hanging.
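One way to check that expectation without tying up an interactive session is to run the reproducer in a child process with a timeout. The following is only a sketch: the helper name `completes_within` and the `REPRO` string are illustrative and not part of cuDF, and it assumes that killing the child process is enough to recover from the hang.

```python
import subprocess
import sys


def completes_within(code, timeout_s):
    """Run `code` in a fresh interpreter.

    Returns True if the child exits (successfully or with an error)
    before `timeout_s` seconds, and False if it had to be killed.
    """
    try:
        subprocess.run(
            [sys.executable, "-c", code],
            timeout=timeout_s,
            check=False,  # an exception in the child still counts as "completed"
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return True
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired
        return False


# The reproducer from this issue, kept as source text for the child process.
REPRO = r'''
import cudf
s = cudf.Series(["one\ntwo", "three\n\n"])
s.str.replace_with_backrefs('[^\n\r]*(\r|\r\n)?$', r'scala\1')
'''
```

Calling `completes_within(REPRO, 60)` (the 60 second limit is an arbitrary choice) would return `False` on a build that hangs. Note that it also returns `True` when the child fails with an error, including a missing `cudf` install, which matches the "fail with an error or complete" expectation but means it is not a correctness check.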

Environment overview (please complete the following information)

  • Environment location: Local workstation
  • Method of cuDF install: conda

Environment details

 ***git***
 commit f1e88635c81ecb553957e89fcff83b26b5ff168e (HEAD -> regexp-hang, rapidsai/branch-23.06)
 Author: Lawrence Mitchell <[email protected]>
 Date:   Fri May 19 15:25:54 2023 +0100
 
 Correctly reorder and reindex scan groupbys with null keys (#13389)
 
 Scan-based groupbys are massaged back into pandas (original dataframe)
 order by a post-processing step. Previously, this did the wrong thing
 if the grouping key contained null (or nan) keys. In this situation
 dropna=True will cause libcudf to produce an output table that is
 smaller than the input frame. To mimic pandas we need to expand this
 output to the original frame size, inserting nulls in the missing rows
 and reordering correctly.
 
 Furthermore, the previous reordering code had an out-of-bounds memory
 access when there were null keys, since we were asking to group a
 column of the same length as the result, but the grouping object expects
 columns of length of the original input (which is larger with
 dropna=True and null keys).
 
 To fix these issues, compute the reordering on a column of appropriate
 size, and, if dropna is true and any of the key columns have nulls, go
 down a more expensive reordering path that inserts nulls correctly by
 reindexing the result.
 
 - Closes #13349
 - Closes #12055
 
 Authors:
 - Lawrence Mitchell (https://github.com/wence-)
 
 Approvers:
 - Ashwin Srinath (https://github.com/shwina)
 
 URL: https://github.com/rapidsai/cudf/pull/13389
 ***git submodules***
 
 ***OS Information***
 DISTRIB_ID=Ubuntu
 DISTRIB_RELEASE=22.04
 DISTRIB_CODENAME=jammy
 DISTRIB_DESCRIPTION="Ubuntu 22.04.2 LTS"
 PRETTY_NAME="Ubuntu 22.04.2 LTS"
 NAME="Ubuntu"
 VERSION_ID="22.04"
 VERSION="22.04.2 LTS (Jammy Jellyfish)"
 VERSION_CODENAME=jammy
 ID=ubuntu
 ID_LIKE=debian
 HOME_URL="https://www.ubuntu.com/"
 SUPPORT_URL="https://help.ubuntu.com/"
 BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
 PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
 UBUNTU_CODENAME=jammy
 Linux ripper 5.19.0-41-generic #42~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Apr 18 17:40:00 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
 
 ***GPU Information***
 Mon May 22 08:40:23 2023
 +---------------------------------------------------------------------------------------+
 | NVIDIA-SMI 530.30.02              Driver Version: 530.30.02    CUDA Version: 12.1     |
 |-----------------------------------------+----------------------+----------------------+
 | GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
 | Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
 |                                         |                      |               MIG M. |
 |=========================================+======================+======================|
 |   0  NVIDIA GeForce RTX 3080         On | 00000000:42:00.0 Off |                  N/A |
 | 39%   68C    P2              132W / 320W|   2356MiB / 10240MiB |    100%      Default |
 |                                         |                      |                  N/A |
 +-----------------------------------------+----------------------+----------------------+
 
 +---------------------------------------------------------------------------------------+
 | Processes:                                                                            |
 |  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
 |        ID   ID                                                             Usage      |
 |=======================================================================================|
 |    0   N/A  N/A    452399      G   /usr/lib/xorg/Xorg                          418MiB |
 |    0   N/A  N/A    452569      G   /usr/bin/gnome-shell                        156MiB |
 |    0   N/A  N/A    454700      G   ./jetbrains-toolbox                          12MiB |
 |    0   N/A  N/A    469789      G   ...irefox/2667/usr/lib/firefox/firefox      448MiB |
 |    0   N/A  N/A    675603      C   python3                                     620MiB |
 |    0   N/A  N/A    675900      G   ...,WinRetrieveSuggestionsOnlyOnDemand       76MiB |
 |    0   N/A  N/A    676964      C   python3                                     618MiB |
 +---------------------------------------------------------------------------------------+
 
 ***CPU***
 Architecture:                    x86_64
 CPU op-mode(s):                  32-bit, 64-bit
 Address sizes:                   43 bits physical, 48 bits virtual
 Byte Order:                      Little Endian
 CPU(s):                          48
 On-line CPU(s) list:             0-47
 Vendor ID:                       AuthenticAMD
 Model name:                      AMD Ryzen Threadripper 2970WX 24-Core Processor
 CPU family:                      23
 Model:                           8
 Thread(s) per core:              2
 Core(s) per socket:              24
 Socket(s):                       1
 Stepping:                        2
 Frequency boost:                 enabled
 CPU max MHz:                     3000.0000
 CPU min MHz:                     2200.0000
 BogoMIPS:                        5988.22
 Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid amd_dcm aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb hw_pstate ssbd ibpb vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 xsaves clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov succor smca sev sev_es
 Virtualization:                  AMD-V
 L1d cache:                       768 KiB (24 instances)
 L1i cache:                       1.5 MiB (24 instances)
 L2 cache:                        12 MiB (24 instances)
 L3 cache:                        64 MiB (8 instances)
 NUMA node(s):                    4
 NUMA node0 CPU(s):               0-5,24-29
 NUMA node1 CPU(s):               12-17,36-41
 NUMA node2 CPU(s):               6-11,30-35
 NUMA node3 CPU(s):               18-23,42-47
 Vulnerability Itlb multihit:     Not affected
 Vulnerability L1tf:              Not affected
 Vulnerability Mds:               Not affected
 Vulnerability Meltdown:          Not affected
 Vulnerability Mmio stale data:   Not affected
 Vulnerability Retbleed:          Mitigation; untrained return thunk; SMT vulnerable
 Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
 Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
 Vulnerability Spectre v2:        Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected
 Vulnerability Srbds:             Not affected
 Vulnerability Tsx async abort:   Not affected
 
 ***CMake***
 
 ***g++***
 /usr/bin/g++
 g++ (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0
 Copyright (C) 2021 Free Software Foundation, Inc.
 This is free software; see the source for copying conditions.  There is NO
 warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 
 
 ***nvcc***
 
 ***Python***
 /home/andy/mambaforge/envs/rapids-23.06/bin/python
 Python 3.10.11
 
 ***Environment Variables***
 PATH                            : /home/andy/mambaforge/envs/rapids-23.06/bin:/usr/lib/jvm/java-8-openjdk-amd64/bin:/home/andy/gems/bin:/home/andy/mambaforge/condabin:/home/andy/.cargo/bin:/home/andy/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/snap/bin:/home/andy/.local/share/JetBrains/Toolbox/scripts
 LD_LIBRARY_PATH                 :
 NUMBAPRO_NVVM                   :
 NUMBAPRO_LIBDEVICE              :
 CONDA_PREFIX                    : /home/andy/mambaforge/envs/rapids-23.06
 PYTHON_PATH                     :
 
 ***conda packages***
 /home/andy/mambaforge/condabin/conda
 # packages in environment at /home/andy/mambaforge/envs/rapids-23.06:
 #
 # Name                    Version                   Build  Channel
 _libgcc_mutex             0.1                 conda_forge    conda-forge
 _openmp_mutex             4.5                       2_gnu    conda-forge
 arrow-cpp                 11.0.0          ha770c72_20_cpu    conda-forge
 aws-c-auth                0.6.27               he072965_1    conda-forge
 aws-c-cal                 0.5.26               hf677bf3_1    conda-forge
 aws-c-common              0.8.19               hd590300_0    conda-forge
 aws-c-compression         0.2.16               hbad4bc6_7    conda-forge
 aws-c-event-stream        0.2.20               hb4b372c_7    conda-forge
 aws-c-http                0.7.7                h2632f9a_4    conda-forge
 aws-c-io                  0.13.21              h9fef7b8_5    conda-forge
 aws-c-mqtt                0.8.11               h2282364_1    conda-forge
 aws-c-s3                  0.3.0                hcb5a9b2_2    conda-forge
 aws-c-sdkutils            0.1.9                hbad4bc6_2    conda-forge
 aws-checksums             0.1.14               hbad4bc6_7    conda-forge
 aws-crt-cpp               0.20.1               he0fdcb3_3    conda-forge
 aws-sdk-cpp               1.10.57             hb0b1f3a_12    conda-forge
 bzip2                     1.0.8                h7f98852_4    conda-forge
 c-ares                    1.19.0               hd590300_0    conda-forge
 ca-certificates           2023.5.7             hbcca054_0    conda-forge
 cachetools                5.3.0              pyhd8ed1ab_0    conda-forge
 cubinlinker               0.2.0           py310hf09951c_1    rapidsai-nightly
 cuda-python               11.8.1          py310h01a121a_2    conda-forge
 cudatoolkit               11.8.0              h37601d7_11    conda-forge
 cudf                      23.06.00a       cuda11_py310_230519_gf1e88635c8_217    rapidsai-nightly
 cupy                      12.0.0          py310h9216885_1    conda-forge
 dlpack                    0.5                  h9c3ff4c_0    conda-forge
 fastavro                  1.7.4           py310h2372a71_0    conda-forge
 fastrlock                 0.8             py310hd8f1fbe_3    conda-forge
 fmt                       9.1.0                h924138e_0    conda-forge
 fsspec                    2023.5.0           pyh1a96a4e_0    conda-forge
 gflags                    2.2.2             he1b5a44_1004    conda-forge
 glog                      0.6.0                h6f12383_0    conda-forge
 gmock                     1.13.0               ha770c72_1    conda-forge
 gtest                     1.13.0               h00ab1b0_1    conda-forge
 keyutils                  1.6.1                h166bdaf_0    conda-forge
 krb5                      1.20.1               h81ceb04_0    conda-forge
 ld_impl_linux-64          2.40                 h41732ed_0    conda-forge
 libabseil                 20230125.2      cxx17_h59595ed_2    conda-forge
 libarrow                  11.0.0          h6564b11_20_cpu    conda-forge
 libblas                   3.9.0           16_linux64_openblas    conda-forge
 libbrotlicommon           1.0.9                h166bdaf_8    conda-forge
 libbrotlidec              1.0.9                h166bdaf_8    conda-forge
 libbrotlienc              1.0.9                h166bdaf_8    conda-forge
 libcblas                  3.9.0           16_linux64_openblas    conda-forge
 libcrc32c                 1.1.2                h9c3ff4c_0    conda-forge
 libcudf                   23.06.00a       cuda11_230519_gf1e88635c8_217    rapidsai-nightly
 libcufile                 1.4.0.31                      0    nvidia
 libcufile-dev             1.4.0.31                      0    nvidia
 libcurl                   8.1.0                h409715c_0    conda-forge
 libedit                   3.1.20191231         he28a2e2_2    conda-forge
 libev                     4.33                 h516909a_1    conda-forge
 libevent                  2.1.12               h3358134_0    conda-forge
 libffi                    3.4.2                h7f98852_5    conda-forge
 libgcc-ng                 12.2.0              h65d4601_19    conda-forge
 libgfortran-ng            12.2.0              h69a702a_19    conda-forge
 libgfortran5              12.2.0              h337968e_19    conda-forge
 libgomp                   12.2.0              h65d4601_19    conda-forge
 libgoogle-cloud           2.10.1               hac9eb74_1    conda-forge
 libgrpc                   1.54.2               hb20ce57_2    conda-forge
 libkvikio                 23.06.00a       cuda11_230512_ga771e1c_25    rapidsai-nightly
 liblapack                 3.9.0           16_linux64_openblas    conda-forge
 libllvm11                 11.1.0               he0ac6c6_5    conda-forge
 libnghttp2                1.52.0               h61bc06f_0    conda-forge
 libnsl                    2.0.0                h7f98852_0    conda-forge
 libnuma                   2.0.16               h0b41bf4_1    conda-forge
 libopenblas               0.3.21          pthreads_h78a6416_3    conda-forge
 libprotobuf               3.21.12              h3eb15da_0    conda-forge
 librmm                    23.06.00a       cuda11_230519_gc11ea8a5_19    rapidsai-nightly
 libsqlite                 3.42.0               h2797004_0    conda-forge
 libssh2                   1.10.0               hf14f497_3    conda-forge
 libstdcxx-ng              12.2.0              h46fd767_19    conda-forge
 libthrift                 0.18.1               h8fd135c_1    conda-forge
 libutf8proc               2.8.0                h166bdaf_0    conda-forge
 libuuid                   2.38.1               h0b41bf4_0    conda-forge
 libzlib                   1.2.13               h166bdaf_4    conda-forge
 llvmlite                  0.39.1          py310h58363a5_1    conda-forge
 lz4-c                     1.9.4                hcb278e6_0    conda-forge
 ncurses                   6.3                  h27087fc_1    conda-forge
 numba                     0.56.4          py310h0e39c9b_1    conda-forge
 numpy                     1.23.5          py310h53a5b5f_0    conda-forge
 nvtx                      0.2.5           py310h1fa729e_0    conda-forge
 openssl                   3.1.0                hd590300_3    conda-forge
 orc                       1.8.3                hfdbbad2_0    conda-forge
 packaging                 23.1               pyhd8ed1ab_0    conda-forge
 pandas                    1.5.3           py310h9b08913_1    conda-forge
 parquet-cpp               1.5.1                         2    conda-forge
 pip                       23.1.2             pyhd8ed1ab_0    conda-forge
 protobuf                  4.21.12         py310heca2aa9_0    conda-forge
 ptxcompiler               0.8.0           py310h01a121a_0    conda-forge
 pyarrow                   11.0.0          py310he6bfd7f_20_cpu    conda-forge
 python                    3.10.11         he550d4f_0_cpython    conda-forge
 python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
 python_abi                3.10                    3_cp310    conda-forge
 pytz                      2023.3             pyhd8ed1ab_0    conda-forge
 re2                       2023.03.02           h8c504da_0    conda-forge
 readline                  8.2                  h8228510_1    conda-forge
 rmm                       23.06.00a       cuda11_py310_230519_gc11ea8a5_19    rapidsai-nightly
 s2n                       1.3.44               h06160fa_0    conda-forge
 setuptools                67.7.2             pyhd8ed1ab_0    conda-forge
 six                       1.16.0             pyh6c4a22f_0    conda-forge
 snappy                    1.1.10               h9fff704_0    conda-forge
 spdlog                    1.11.0               h9b3ece8_1    conda-forge
 tk                        8.6.12               h27826a3_0    conda-forge
 typing_extensions         4.5.0              pyha770c72_0    conda-forge
 tzdata                    2023c                h71feb2d_0    conda-forge
 ucx                       1.14.0               h8c404fb_2    conda-forge
 wheel                     0.40.0             pyhd8ed1ab_0    conda-forge
 xz                        5.2.6                h166bdaf_0    conda-forge
 zlib                      1.2.13               h166bdaf_4    conda-forge
 zstd                      1.5.2                h3eb15da_6    conda-forge

Additional context
Plugin tracking issue: NVIDIA/spark-rapids#8323

andygrove added the "bug" (Something isn't working) and "Needs Triage" (Need team to review and classify) labels on May 22, 2023
davidwendt self-assigned this on May 22, 2023
rapids-bot bot pushed a commit that referenced this issue May 25, 2023
…13418)

Fixes a bug where `cudf::strings::replace_with_backrefs` goes into an infinite loop when a match results in an empty string. After each replacement, the logic continues to search for matches in the remainder of the string. Each new starting point must account for the previous match being empty.
Also includes a gtest for this case.

Closes #13404

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Gregory Kimball (https://github.com/GregoryKimball)
  - Nghia Truong (https://github.com/ttnghia)
  - Yunsong Wang (https://github.com/PointKernel)

URL: #13418
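
The commit message above describes the general hazard: a replace loop that restarts its search at the end of the previous match never advances if that match was empty. The sketch below illustrates the idea with Python's `re` module. It is not the libcudf implementation, and `replace_all` is a hypothetical helper written only for this illustration.

```python
import re


def replace_all(pattern, repl, text):
    """Replace every match of `pattern` in `text` with `repl` (backrefs allowed).

    Illustrative only: shows how the next starting point must account
    for an empty previous match so the loop always makes progress.
    """
    regex = re.compile(pattern)
    out = []
    pos = 0
    while pos <= len(text):
        m = regex.search(text, pos)
        if m is None:
            break
        out.append(text[pos:m.start()])  # unmatched text before the match
        out.append(m.expand(repl))       # replacement with backrefs filled in
        if m.end() == m.start():
            # Empty match: without this branch, pos would stay the same and
            # the same empty match would be found forever. Copy one character
            # through unchanged and step past it.
            out.append(text[m.start():m.start() + 1])
            pos = m.end() + 1
        else:
            pos = m.end()
    out.append(text[pos:])  # tail after the last match
    return "".join(out)


print(replace_all(r"(\d*)", r"[\1]", "a1b"))  # prints "[]a[1][]b[]"
```

The pattern from this issue, `'[^\n\r]*(\r|\r\n)?$'`, can match the empty string at the end of the input, which is the situation the guard handles. Regex semantics differ between Python's `re` and libcudf, so the output of this sketch should not be read as the expected cuDF result.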
bdice removed the "Needs Triage" (Need team to review and classify) label on Mar 4, 2024