Describe the bug
`replace_with_backrefs` hangs in some cases.
Steps/Code to reproduce bug
    >>> import cudf
    >>> cudf.__version__
    '23.06.00'
    >>> s = cudf.Series(["one\ntwo", "three\n\n"])
    >>> s.str.replace_with_backrefs('[^\n\r]*(\r|\r\n)?$', r'scala\1')
Expected behavior
I would expect this to either fail with an error or complete without hanging.
Environment overview (please complete the following information)
Environment details
**git*** commit f1e88635c81ecb553957e89fcff83b26b5ff168e (HEAD -> regexp-hang, rapidsai/branch-23.06) Author: Lawrence Mitchell <[email protected]> Date: Fri May 19 15:25:54 2023 +0100 Correctly reorder and reindex scan groupbys with null keys (#13389) Scan-based groupbys are massaged back into pandas (original dataframe) order by a post-processing step. Previously, this did the wrong thing if the grouping key contained null (or nan) keys. In this situation dropna=True will cause libcudf to produce an output table that is smaller than the input frame. To mimic pandas we need to expand this output to the original frame size, inserting nulls in the missing rows and reordering correctly. Furthermore, the previous reordering code had an out-of-bounds memory access when there were null keys, since we were asking to group a column of the same length as the result, but the grouping object expects columns of length of the original input (which is larger with dropna=True and null keys). To fix these issues, compute the reordering on a column of appropriate size, and, if dropna is true and any of the key columns have nulls, go down a more expensive reordering path that inserts nulls correctly by reindexing the result. 
- Closes #13349 - Closes #12055 Authors: - Lawrence Mitchell (https://github.com/wence-) Approvers: - Ashwin Srinath (https://github.com/shwina) URL: https://github.com/rapidsai/cudf/pull/13389 **git submodules*** ***OS Information*** DISTRIB_ID=Ubuntu DISTRIB_RELEASE=22.04 DISTRIB_CODENAME=jammy DISTRIB_DESCRIPTION="Ubuntu 22.04.2 LTS" PRETTY_NAME="Ubuntu 22.04.2 LTS" NAME="Ubuntu" VERSION_ID="22.04" VERSION="22.04.2 LTS (Jammy Jellyfish)" VERSION_CODENAME=jammy ID=ubuntu ID_LIKE=debian HOME_URL="https://www.ubuntu.com/" SUPPORT_URL="https://help.ubuntu.com/" BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/" PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy" UBUNTU_CODENAME=jammy Linux ripper 5.19.0-41-generic #42~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Apr 18 17:40:00 UTC 2 x86_64 x86_64 x86_64 GNU/Linux ***GPU Information*** Mon May 22 08:40:23 2023 +---------------------------------------------------------------------------------------+ | NVIDIA-SMI 530.30.02 Driver Version: 530.30.02 CUDA Version: 12.1 | |-----------------------------------------+----------------------+----------------------+ | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | | | | MIG M. 
| |=========================================+======================+======================| | 0 NVIDIA GeForce RTX 3080 On | 00000000:42:00.0 Off | N/A | | 39% 68C P2 132W / 320W| 2356MiB / 10240MiB | 100% Default | | | | N/A | +-----------------------------------------+----------------------+----------------------+ +---------------------------------------------------------------------------------------+ | Processes: | | GPU GI CI PID Type Process name GPU Memory | | ID ID Usage | |=======================================================================================| | 0 N/A N/A 452399 G /usr/lib/xorg/Xorg 418MiB | | 0 N/A N/A 452569 G /usr/bin/gnome-shell 156MiB | | 0 N/A N/A 454700 G ./jetbrains-toolbox 12MiB | | 0 N/A N/A 469789 G ...irefox/2667/usr/lib/firefox/firefox 448MiB | | 0 N/A N/A 675603 C python3 620MiB | | 0 N/A N/A 675900 G ...,WinRetrieveSuggestionsOnlyOnDemand 76MiB | | 0 N/A N/A 676964 C python3 618MiB | +---------------------------------------------------------------------------------------+ ***CPU*** Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Address sizes: 43 bits physical, 48 bits virtual Byte Order: Little Endian CPU(s): 48 On-line CPU(s) list: 0-47 Vendor ID: AuthenticAMD Model name: AMD Ryzen Threadripper 2970WX 24-Core Processor CPU family: 23 Model: 8 Thread(s) per core: 2 Core(s) per socket: 24 Socket(s): 1 Stepping: 2 Frequency boost: enabled CPU max MHz: 3000.0000 CPU min MHz: 2200.0000 BogoMIPS: 5988.22 Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid amd_dcm aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb hw_pstate ssbd ibpb vmmcall fsgsbase bmi1 avx2 
smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 xsaves clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov succor smca sev sev_es Virtualization: AMD-V L1d cache: 768 KiB (24 instances) L1i cache: 1.5 MiB (24 instances) L2 cache: 12 MiB (24 instances) L3 cache: 64 MiB (8 instances) NUMA node(s): 4 NUMA node0 CPU(s): 0-5,24-29 NUMA node1 CPU(s): 12-17,36-41 NUMA node2 CPU(s): 6-11,30-35 NUMA node3 CPU(s): 18-23,42-47 Vulnerability Itlb multihit: Not affected Vulnerability L1tf: Not affected Vulnerability Mds: Not affected Vulnerability Meltdown: Not affected Vulnerability Mmio stale data: Not affected Vulnerability Retbleed: Mitigation; untrained return thunk; SMT vulnerable Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization Vulnerability Spectre v2: Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected Vulnerability Srbds: Not affected Vulnerability Tsx async abort: Not affected ***CMake*** ***g++*** /usr/bin/g++ g++ (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0 Copyright (C) 2021 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
***nvcc*** ***Python*** /home/andy/mambaforge/envs/rapids-23.06/bin/python Python 3.10.11 ***Environment Variables*** PATH : /home/andy/mambaforge/envs/rapids-23.06/bin:/usr/lib/jvm/java-8-openjdk-amd64/bin:/home/andy/gems/bin:/home/andy/mambaforge/condabin:/home/andy/.cargo/bin:/home/andy/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/snap/bin:/home/andy/.local/share/JetBrains/Toolbox/scripts LD_LIBRARY_PATH : NUMBAPRO_NVVM : NUMBAPRO_LIBDEVICE : CONDA_PREFIX : /home/andy/mambaforge/envs/rapids-23.06 PYTHON_PATH : ***conda packages*** /home/andy/mambaforge/condabin/conda # packages in environment at /home/andy/mambaforge/envs/rapids-23.06: # # Name Version Build Channel _libgcc_mutex 0.1 conda_forge conda-forge _openmp_mutex 4.5 2_gnu conda-forge arrow-cpp 11.0.0 ha770c72_20_cpu conda-forge aws-c-auth 0.6.27 he072965_1 conda-forge aws-c-cal 0.5.26 hf677bf3_1 conda-forge aws-c-common 0.8.19 hd590300_0 conda-forge aws-c-compression 0.2.16 hbad4bc6_7 conda-forge aws-c-event-stream 0.2.20 hb4b372c_7 conda-forge aws-c-http 0.7.7 h2632f9a_4 conda-forge aws-c-io 0.13.21 h9fef7b8_5 conda-forge aws-c-mqtt 0.8.11 h2282364_1 conda-forge aws-c-s3 0.3.0 hcb5a9b2_2 conda-forge aws-c-sdkutils 0.1.9 hbad4bc6_2 conda-forge aws-checksums 0.1.14 hbad4bc6_7 conda-forge aws-crt-cpp 0.20.1 he0fdcb3_3 conda-forge aws-sdk-cpp 1.10.57 hb0b1f3a_12 conda-forge bzip2 1.0.8 h7f98852_4 conda-forge c-ares 1.19.0 hd590300_0 conda-forge ca-certificates 2023.5.7 hbcca054_0 conda-forge cachetools 5.3.0 pyhd8ed1ab_0 conda-forge cubinlinker 0.2.0 py310hf09951c_1 rapidsai-nightly cuda-python 11.8.1 py310h01a121a_2 conda-forge cudatoolkit 11.8.0 h37601d7_11 conda-forge cudf 23.06.00a cuda11_py310_230519_gf1e88635c8_217 rapidsai-nightly cupy 12.0.0 py310h9216885_1 conda-forge dlpack 0.5 h9c3ff4c_0 conda-forge fastavro 1.7.4 py310h2372a71_0 conda-forge fastrlock 0.8 py310hd8f1fbe_3 conda-forge fmt 9.1.0 h924138e_0 conda-forge fsspec 2023.5.0 
pyh1a96a4e_0 conda-forge gflags 2.2.2 he1b5a44_1004 conda-forge glog 0.6.0 h6f12383_0 conda-forge gmock 1.13.0 ha770c72_1 conda-forge gtest 1.13.0 h00ab1b0_1 conda-forge keyutils 1.6.1 h166bdaf_0 conda-forge krb5 1.20.1 h81ceb04_0 conda-forge ld_impl_linux-64 2.40 h41732ed_0 conda-forge libabseil 20230125.2 cxx17_h59595ed_2 conda-forge libarrow 11.0.0 h6564b11_20_cpu conda-forge libblas 3.9.0 16_linux64_openblas conda-forge libbrotlicommon 1.0.9 h166bdaf_8 conda-forge libbrotlidec 1.0.9 h166bdaf_8 conda-forge libbrotlienc 1.0.9 h166bdaf_8 conda-forge libcblas 3.9.0 16_linux64_openblas conda-forge libcrc32c 1.1.2 h9c3ff4c_0 conda-forge libcudf 23.06.00a cuda11_230519_gf1e88635c8_217 rapidsai-nightly libcufile 1.4.0.31 0 nvidia libcufile-dev 1.4.0.31 0 nvidia libcurl 8.1.0 h409715c_0 conda-forge libedit 3.1.20191231 he28a2e2_2 conda-forge libev 4.33 h516909a_1 conda-forge libevent 2.1.12 h3358134_0 conda-forge libffi 3.4.2 h7f98852_5 conda-forge libgcc-ng 12.2.0 h65d4601_19 conda-forge libgfortran-ng 12.2.0 h69a702a_19 conda-forge libgfortran5 12.2.0 h337968e_19 conda-forge libgomp 12.2.0 h65d4601_19 conda-forge libgoogle-cloud 2.10.1 hac9eb74_1 conda-forge libgrpc 1.54.2 hb20ce57_2 conda-forge libkvikio 23.06.00a cuda11_230512_ga771e1c_25 rapidsai-nightly liblapack 3.9.0 16_linux64_openblas conda-forge libllvm11 11.1.0 he0ac6c6_5 conda-forge libnghttp2 1.52.0 h61bc06f_0 conda-forge libnsl 2.0.0 h7f98852_0 conda-forge libnuma 2.0.16 h0b41bf4_1 conda-forge libopenblas 0.3.21 pthreads_h78a6416_3 conda-forge libprotobuf 3.21.12 h3eb15da_0 conda-forge librmm 23.06.00a cuda11_230519_gc11ea8a5_19 rapidsai-nightly libsqlite 3.42.0 h2797004_0 conda-forge libssh2 1.10.0 hf14f497_3 conda-forge libstdcxx-ng 12.2.0 h46fd767_19 conda-forge libthrift 0.18.1 h8fd135c_1 conda-forge libutf8proc 2.8.0 h166bdaf_0 conda-forge libuuid 2.38.1 h0b41bf4_0 conda-forge libzlib 1.2.13 h166bdaf_4 conda-forge llvmlite 0.39.1 py310h58363a5_1 conda-forge lz4-c 1.9.4 hcb278e6_0 conda-forge ncurses 
6.3 h27087fc_1 conda-forge numba 0.56.4 py310h0e39c9b_1 conda-forge numpy 1.23.5 py310h53a5b5f_0 conda-forge nvtx 0.2.5 py310h1fa729e_0 conda-forge openssl 3.1.0 hd590300_3 conda-forge orc 1.8.3 hfdbbad2_0 conda-forge packaging 23.1 pyhd8ed1ab_0 conda-forge pandas 1.5.3 py310h9b08913_1 conda-forge parquet-cpp 1.5.1 2 conda-forge pip 23.1.2 pyhd8ed1ab_0 conda-forge protobuf 4.21.12 py310heca2aa9_0 conda-forge ptxcompiler 0.8.0 py310h01a121a_0 conda-forge pyarrow 11.0.0 py310he6bfd7f_20_cpu conda-forge python 3.10.11 he550d4f_0_cpython conda-forge python-dateutil 2.8.2 pyhd8ed1ab_0 conda-forge python_abi 3.10 3_cp310 conda-forge pytz 2023.3 pyhd8ed1ab_0 conda-forge re2 2023.03.02 h8c504da_0 conda-forge readline 8.2 h8228510_1 conda-forge rmm 23.06.00a cuda11_py310_230519_gc11ea8a5_19 rapidsai-nightly s2n 1.3.44 h06160fa_0 conda-forge setuptools 67.7.2 pyhd8ed1ab_0 conda-forge six 1.16.0 pyh6c4a22f_0 conda-forge snappy 1.1.10 h9fff704_0 conda-forge spdlog 1.11.0 h9b3ece8_1 conda-forge tk 8.6.12 h27826a3_0 conda-forge typing_extensions 4.5.0 pyha770c72_0 conda-forge tzdata 2023c h71feb2d_0 conda-forge ucx 1.14.0 h8c404fb_2 conda-forge wheel 0.40.0 pyhd8ed1ab_0 conda-forge xz 5.2.6 h166bdaf_0 conda-forge zlib 1.2.13 h166bdaf_4 conda-forge zstd 1.5.2 h3eb15da_6 conda-forge
Additional context
Plugin tracking issue: NVIDIA/spark-rapids#8323
Fix cudf::strings::replace_with_backrefs hang on empty match result (#13418)
960cc42
Fixes a bug where `cudf::strings::replace_with_backrefs` goes into an infinite loop when a match results in an empty string. After each replacement, the logic continues to search for matches in the remainder of the string, so each new starting point must account for the previous match being empty. Also includes a gtest for this case.
Closes #13404
Authors:
- David Wendt (https://github.com/davidwendt)
Approvers:
- Gregory Kimball (https://github.com/GregoryKimball)
- Nghia Truong (https://github.com/ttnghia)
- Yunsong Wang (https://github.com/PointKernel)
URL: #13418
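To illustrate the kind of loop the commit message describes, here is a minimal CPU-side sketch in Python using the standard `re` module. It is not the libcudf implementation: the function name `replace_with_backrefs_sketch` and its structure are assumptions made for illustration, and Python's handling of `$` and of unmatched groups may differ from libcudf's. The point it shows is that when a match is empty, the next search has to start one character further along, otherwise the same empty match is found forever.

```python
import re


def replace_with_backrefs_sketch(text, pattern, template):
    """Hypothetical helper: replace each match of `pattern` in `text`,
    expanding backreferences such as \\1 in `template`."""
    regex = re.compile(pattern)
    out = []
    pos = 0
    while pos <= len(text):
        m = regex.search(text, pos)
        if m is None:
            break
        # Copy the unmatched text before this match, then the expanded template.
        out.append(text[pos:m.start()])
        out.append(m.expand(template))
        if m.end() == m.start():
            # Empty match: copy one character through and step past it.
            # Without this step the search would restart at the same
            # position and find the same empty match again (the hang).
            out.append(text[m.start():m.start() + 1])
            pos = m.end() + 1
        else:
            pos = m.end()
    # Copy whatever remains after the last match.
    out.append(text[pos:])
    return "".join(out)


if __name__ == "__main__":
    # Same inputs and pattern as the reproducer above, run through Python's re.
    for s in ["one\ntwo", "three\n\n"]:
        print(repr(replace_with_backrefs_sketch(s, r"[^\n\r]*(\r|\r\n)?$", r"scala\1")))
```

With the pattern from the reproducer, every string ends in an empty match at the end of input, which is the case that needs the extra one-character advance in this sketch.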