[BUG] `str.character_ngrams` produces <NA> with strings < ngram length #14684

Vortexx2 · 2023-12-29T10:26:09Z

Describe the bug
The `str.character_ngrams` function produces an `<NA>` token for any string shorter than the provided `n` (for example, with bigrams, a single-character string yields `<NA>`).

I have debugged this and, as far as I understand it, the cause is an empty list returned by the `libstrings.generate_character_ngrams` function for such strings. When the problematic function explodes the result, that empty list becomes an `<NA>` entry in the output.
This issue causes several bugs in downstream tasks (for example, when using cuML's `CountVectorizer`).
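
To illustrate the mechanism described above, here is a minimal pure-Python sketch. It does not call cuDF; `character_ngrams` and `explode` are hypothetical stand-ins, and the assumption that an empty row explodes to a single null placeholder is taken from my debugging, not from the library source.

```python
def character_ngrams(text, n):
    """Return the character n-grams of text (an empty list if len(text) < n)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def explode(rows):
    """Flatten a list of lists; an empty row becomes one None placeholder."""
    flat = []
    for row in rows:
        if row:
            flat.extend(row)
        else:
            flat.append(None)  # assumption: stands in for <NA>
    return flat


per_string = [character_ngrams(s, 2) for s in ["1744", "4"]]
exploded = explode(per_string)
print(per_string)  # the second entry is an empty list
print(exploded)    # the empty list has turned into a null token
```

The short string contributes no n-grams, yet it still contributes one row to the exploded output, which is where the spurious token comes from.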

Steps/Code to reproduce bug
Minimum code required to reproduce the bug:

import cudf
str_series = cudf.Series(['1744', '4'])
str_series.str.character_ngrams(2)

Expected behavior
`<NA>` should not be part of the output. Its presence causes several downstream tasks to fail, because `<NA>` is not a valid token in the actual input string series.
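
As a reference for the expected result, here is a small pure-Python sketch. The helper `expected_character_ngrams` is hypothetical and only models the output I would expect: strings shorter than `n` contribute nothing.

```python
def expected_character_ngrams(strings, n):
    """Flat list of character n-grams; strings shorter than n add no tokens."""
    tokens = []
    for s in strings:
        tokens.extend(s[i:i + n] for i in range(len(s) - n + 1))
    return tokens


tokens = expected_character_ngrams(["1744", "4"], 2)
print(tokens)
```

Until this is fixed, a possible workaround (untested beyond the example above) is to drop strings shorter than `n` before calling the function, e.g. `str_series[str_series.str.len() >= 2]`, or to drop nulls from the result.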

Environment overview (please complete the following information)

Environment location: Cloud GCP
Method of cuDF install: pip

Environment details

***git***
     Not inside a git repository
     
     ***OS Information***
     PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
     NAME="Debian GNU/Linux"
     VERSION_ID="11"
     VERSION="11 (bullseye)"
     VERSION_CODENAME=bullseye
     ID=debian
     HOME_URL="https://www.debian.org/"
     SUPPORT_URL="https://www.debian.org/support"
     BUG_REPORT_URL="https://bugs.debian.org/"
     Linux janmey-gpu-c2 5.10.0-26-cloud-amd64 #1 SMP Debian 5.10.197-1 (2023-09-29) x86_64 GNU/Linux
     
     ***GPU Information***
     Fri Dec 29 10:21:54 2023
     +-----------------------------------------------------------------------------+
     | NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
     |-------------------------------+----------------------+----------------------+
     | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
     | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
     |                               |                      |               MIG M. |
     |===============================+======================+======================|
     |   0  Tesla T4            On   | 00000000:00:04.0 Off |                    0 |
     | N/A   70C    P0    33W /  70W |    459MiB / 15360MiB |      0%      Default |
     |                               |                      |                  N/A |
     +-------------------------------+----------------------+----------------------+
     
     +-----------------------------------------------------------------------------+
     | Processes:                                                                  |
     |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
     |        ID   ID                                                   Usage      |
     |=============================================================================|
     |    0   N/A  N/A    316341      C   ..._log_ner/.venv/bin/python      454MiB |
     +-----------------------------------------------------------------------------+
     
     ***CPU***
     Architecture:                       x86_64
     CPU op-mode(s):                     32-bit, 64-bit
     Byte Order:                         Little Endian
     Address sizes:                      46 bits physical, 48 bits virtual
     CPU(s):                             16
     On-line CPU(s) list:                0-15
     Thread(s) per core:                 2
     Core(s) per socket:                 8
     Socket(s):                          1
     NUMA node(s):                       1
     Vendor ID:                          GenuineIntel
     CPU family:                         6
     Model:                              79
     Model name:                         Intel(R) Xeon(R) CPU @ 2.20GHz
     Stepping:                           0
     CPU MHz:                            2199.998
     BogoMIPS:                           4399.99
     Hypervisor vendor:                  KVM
     Virtualization type:                full
     L1d cache:                          256 KiB
     L1i cache:                          256 KiB
     L2 cache:                           2 MiB
     L3 cache:                           55 MiB
     NUMA node0 CPU(s):                  0-15
     Vulnerability Gather data sampling: Not affected
     Vulnerability Itlb multihit:        Not affected
     Vulnerability L1tf:                 Mitigation; PTE Inversion
     Vulnerability Mds:                  Mitigation; Clear CPU buffers; SMT Host state unknown
     Vulnerability Meltdown:             Mitigation; PTI
     Vulnerability Mmio stale data:      Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
     Vulnerability Retbleed:             Mitigation; IBRS
     Vulnerability Spec rstack overflow: Not affected
     Vulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl and seccomp
     Vulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization
     Vulnerability Spectre v2:           Mitigation; IBRS, IBPB conditional, STIBP conditional, RSB filling, PBRSB-eIBRS Not affected
     Vulnerability Srbds:                Not affected
     Vulnerability Tsx async abort:      Mitigation; Clear CPU buffers; SMT Host state unknown
     Flags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt arat md_clear arch_capabilities
     
     ***CMake***
     /usr/bin/cmake
     cmake version 3.18.4
     
     CMake suite maintained and supported by Kitware (kitware.com/cmake).
     
     ***g++***
     /usr/bin/g++
     g++ (Debian 10.2.1-6) 10.2.1 20210110
     Copyright (C) 2020 Free Software Foundation, Inc.
     This is free software; see the source for copying conditions.  There is NO
     warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
     
     
     ***nvcc***
     /usr/local/cuda/bin/nvcc
     nvcc: NVIDIA (R) Cuda compiler driver
     Copyright (c) 2005-2022 NVIDIA Corporation
     Built on Wed_Sep_21_10:33:58_PDT_2022
     Cuda compilation tools, release 11.8, V11.8.89
     Build cuda_11.8.r11.8/compiler.31833905_0
     
     ***Python***
     /home/janmeysandeepshukla/datasci/transaction_log_ner/.venv/bin/python
     Python 3.10.13
     
     ***Environment Variables***
     PATH                            : /home/janmeysandeepshukla/datasci/transaction_log_ner/.venv/bin:/home/janmeysandeepshukla/.vscode-server/bin/0ee08df0cf4527e40edc9aa28f4b5bd38bbff2b2/bin/remote-cli:/usr/local/cuda/bin:/opt/conda/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
     LD_LIBRARY_PATH                 : /usr/local/cuda/lib64:/usr/local/nccl2/lib:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/lib64:/usr/local/nccl2/lib:/usr/local/cuda/extras/CUPTI/lib64
     NUMBAPRO_NVVM                   :
     NUMBAPRO_LIBDEVICE              :
     CONDA_PREFIX                    : /opt/conda
     PYTHON_PATH                     :
     
     ***conda packages***
     /opt/conda/bin/conda
     # packages in environment at /opt/conda:
     #
     # Name                    Version                   Build  Channel
     _libgcc_mutex             0.1                 conda_forge    conda-forge
     _openmp_mutex             4.5                       2_gnu    conda-forge
     absl-py                   2.0.0                    pypi_0    pypi
     aiofiles                  22.1.0                   pypi_0    pypi
     aiohttp                   3.9.1                    pypi_0    pypi
     aiohttp-cors              0.7.0                    pypi_0    pypi
     aiorwlock                 1.3.0                    pypi_0    pypi
     aiosignal                 1.3.1                    pypi_0    pypi
     aiosqlite                 0.19.0                   pypi_0    pypi
     anyio                     3.7.1                    pypi_0    pypi
     archspec                  0.2.2              pyhd8ed1ab_0    conda-forge
     argon2-cffi               23.1.0             pyhd8ed1ab_0    conda-forge
     argon2-cffi-bindings      21.2.0          py310h2372a71_4    conda-forge
     arrow                     1.3.0              pyhd8ed1ab_0    conda-forge
     asttokens                 2.4.1              pyhd8ed1ab_0    conda-forge
     async-lru                 2.0.4              pyhd8ed1ab_0    conda-forge
     async-timeout             4.0.3                    pypi_0    pypi
     attrs                     23.1.0             pyh71513ae_1    conda-forge
     babel                     2.13.1             pyhd8ed1ab_0    conda-forge
     backoff                   2.2.1                    pypi_0    pypi
     beatrix-jupyterlab        2023.128.151533          pypi_0    pypi
     beautifulsoup4            4.12.2             pyha770c72_0    conda-forge
     bleach                    6.1.0              pyhd8ed1ab_0    conda-forge
     blessed                   1.20.0                   pypi_0    pypi
     boltons                   23.0.0             pyhd8ed1ab_0    conda-forge
     brotli-python             1.1.0           py310hc6cd4ac_1    conda-forge
     bzip2                     1.0.8                hd590300_5    conda-forge
     c-ares                    1.23.0               hd590300_0    conda-forge
     ca-certificates           2023.11.17           hbcca054_0    conda-forge
     cached-property           1.5.2                hd8ed1ab_1    conda-forge
     cached_property           1.5.2              pyha770c72_1    conda-forge
     cachetools                5.3.2                    pypi_0    pypi
     certifi                   2023.11.17         pyhd8ed1ab_0    conda-forge
     cffi                      1.16.0          py310h2fee648_0    conda-forge
     charset-normalizer        3.3.2              pyhd8ed1ab_0    conda-forge
     click                     8.1.7                    pypi_0    pypi
     cloud-tpu-client          0.10                     pypi_0    pypi
     cloudpickle               3.0.0                    pypi_0    pypi
     colorama                  0.4.6              pyhd8ed1ab_0    conda-forge
     colorful                  0.5.5                    pypi_0    pypi
     comm                      0.2.0                    pypi_0    pypi
     conda                     23.11.0         py310hff52083_1    conda-forge
     conda-libmamba-solver     23.11.1            pyhd8ed1ab_0    conda-forge
     conda-package-handling    2.2.0              pyh38be061_0    conda-forge
     conda-package-streaming   0.9.0              pyhd8ed1ab_0    conda-forge
     contourpy                 1.2.0                    pypi_0    pypi
     cryptography              41.0.7                   pypi_0    pypi
     cycler                    0.12.1                   pypi_0    pypi
     cython                    3.0.6                    pypi_0    pypi
     dacite                    1.8.1                    pypi_0    pypi
     dataproc-jupyter-plugin   0.1.59                   pypi_0    pypi
     db-dtypes                 1.1.1                    pypi_0    pypi
     debugpy                   1.8.0           py310hc6cd4ac_1    conda-forge
     decorator                 5.1.1              pyhd8ed1ab_0    conda-forge
     defusedxml                0.7.1              pyhd8ed1ab_0    conda-forge
     deprecated                1.2.14                   pypi_0    pypi
     distlib                   0.3.7                    pypi_0    pypi
     distro                    1.8.0              pyhd8ed1ab_0    conda-forge
     dlenv-base                1.0.20231210            py310_0    file:///tmp/conda-pkgs
     dm-tree                   0.1.8                    pypi_0    pypi
     docker                    7.0.0                    pypi_0    pypi
     docstring-parser          0.15                     pypi_0    pypi
     entrypoints               0.4                pyhd8ed1ab_0    conda-forge
     exceptiongroup            1.2.0              pyhd8ed1ab_0    conda-forge
     executing                 2.0.1              pyhd8ed1ab_0    conda-forge
     farama-notifications      0.0.4                    pypi_0    pypi
     fastapi                   0.104.1                  pypi_0    pypi
     filelock                  3.13.1                   pypi_0    pypi
     fmt                       10.1.1               h00ab1b0_1    conda-forge
     fonttools                 4.46.0                   pypi_0    pypi
     fqdn                      1.5.1              pyhd8ed1ab_0    conda-forge
     frozenlist                1.4.0                    pypi_0    pypi
     fsspec                    2023.12.1                pypi_0    pypi
     gcsfs                     2023.12.1                pypi_0    pypi
     gitdb                     4.0.11                   pypi_0    pypi
     gitpython                 3.1.40                   pypi_0    pypi
     google-api-core           1.34.0                   pypi_0    pypi
     google-api-python-client  1.8.0                    pypi_0    pypi
     google-auth               2.25.2                   pypi_0    pypi
     google-auth-httplib2      0.1.1                    pypi_0    pypi
     google-auth-oauthlib      1.1.0                    pypi_0    pypi
     google-cloud-aiplatform   1.37.0                   pypi_0    pypi
     google-cloud-artifact-registry 1.10.0                   pypi_0    pypi
     google-cloud-bigquery     3.13.0                   pypi_0    pypi
     google-cloud-bigquery-storage 2.23.0                   pypi_0    pypi
     google-cloud-core         2.4.1                    pypi_0    pypi
     google-cloud-datastore    1.15.5                   pypi_0    pypi
     google-cloud-jupyter-config 0.0.5                    pypi_0    pypi
     google-cloud-language     2.12.0                   pypi_0    pypi
     google-cloud-monitoring   2.17.0                   pypi_0    pypi
     google-cloud-resource-manager 1.11.0                   pypi_0    pypi
     google-cloud-storage      2.13.0                   pypi_0    pypi
     google-crc32c             1.5.0                    pypi_0    pypi
     google-resumable-media    2.6.0                    pypi_0    pypi
     googleapis-common-protos  1.62.0                   pypi_0    pypi
     gpustat                   1.0.0                    pypi_0    pypi
     greenlet                  3.0.2                    pypi_0    pypi
     grpc-google-iam-v1        0.13.0                   pypi_0    pypi
     grpcio                    1.60.0                   pypi_0    pypi
     grpcio-status             1.48.2                   pypi_0    pypi
     gymnasium                 0.28.1                   pypi_0    pypi
     h11                       0.14.0                   pypi_0    pypi
     htmlmin                   0.1.12                   pypi_0    pypi
     httplib2                  0.22.0                   pypi_0    pypi
     httptools                 0.6.1                    pypi_0    pypi
     icu                       73.2                 h59595ed_0    conda-forge
     idna                      3.6                pyhd8ed1ab_0    conda-forge
     imagehash                 4.3.1                    pypi_0    pypi
     imageio                   2.33.0                   pypi_0    pypi
     importlib-metadata        6.11.0                   pypi_0    pypi
     importlib_metadata        7.0.0                hd8ed1ab_0    conda-forge
     importlib_resources       6.1.1              pyhd8ed1ab_0    conda-forge
     ipykernel                 6.27.1                   pypi_0    pypi
     ipython                   8.18.1             pyh707e725_3    conda-forge
     ipython-genutils          0.2.0                    pypi_0    pypi
     ipython-sql               0.5.0                    pypi_0    pypi
     ipywidgets                8.1.1                    pypi_0    pypi
     isoduration               20.11.0            pyhd8ed1ab_0    conda-forge
     jaraco-classes            3.3.0                    pypi_0    pypi
     jax-jumpy                 1.0.0                    pypi_0    pypi
     jedi                      0.19.1             pyhd8ed1ab_0    conda-forge
     jeepney                   0.8.0                    pypi_0    pypi
     jinja2                    3.1.2              pyhd8ed1ab_1    conda-forge
     joblib                    1.3.2                    pypi_0    pypi
     json5                     0.9.14             pyhd8ed1ab_0    conda-forge
     jsonpatch                 1.33               pyhd8ed1ab_0    conda-forge
     jsonpointer               2.4             py310hff52083_3    conda-forge
     jsonschema                4.20.0             pyhd8ed1ab_0    conda-forge
     jsonschema-specifications 2023.11.2          pyhd8ed1ab_0    conda-forge
     jsonschema-with-format-nongpl 4.20.0             pyhd8ed1ab_0    conda-forge
     jupyter-client            7.4.9                    pypi_0    pypi
     jupyter-http-over-ws      0.0.8                    pypi_0    pypi
     jupyter-lsp               2.2.1              pyhd8ed1ab_0    conda-forge
     jupyter-server-fileid     0.9.0                    pypi_0    pypi
     jupyter-server-mathjax    0.2.6                    pypi_0    pypi
     jupyter-server-proxy      4.1.0                    pypi_0    pypi
     jupyter-server-ydoc       0.8.0                    pypi_0    pypi
     jupyter-ydoc              0.2.5                    pypi_0    pypi
     jupyter_client            8.6.0              pyhd8ed1ab_0    conda-forge
     jupyter_core              5.5.0           py310hff52083_0    conda-forge
     jupyter_events            0.9.0              pyhd8ed1ab_0    conda-forge
     jupyter_server            2.12.1             pyhd8ed1ab_0    conda-forge
     jupyter_server_terminals  0.4.4              pyhd8ed1ab_1    conda-forge
     jupyterlab                3.6.6                    pypi_0    pypi
     jupyterlab-git            0.44.0                   pypi_0    pypi
     jupyterlab-widgets        3.0.9                    pypi_0    pypi
     jupyterlab_pygments       0.3.0              pyhd8ed1ab_0    conda-forge
     jupyterlab_server         2.25.2             pyhd8ed1ab_0    conda-forge
     jupytext                  1.16.0                   pypi_0    pypi
     kernels-mixer             0.0.7                    pypi_0    pypi
     keyring                   24.3.0                   pypi_0    pypi
     keyrings-google-artifactregistry-auth 1.1.2                    pypi_0    pypi
     keyutils                  1.6.1                h166bdaf_0    conda-forge
     kfp                       2.4.0                    pypi_0    pypi
     kfp-pipeline-spec         0.2.2                    pypi_0    pypi
     kfp-server-api            2.0.5                    pypi_0    pypi
     kiwisolver                1.4.5                    pypi_0    pypi
     krb5                      1.21.2               h659d440_0    conda-forge
     kubernetes                26.1.0                   pypi_0    pypi
     lazy-loader               0.3                      pypi_0    pypi
     ld_impl_linux-64          2.40                 h41732ed_0    conda-forge
     libarchive                3.7.2                h2aa1ff5_1    conda-forge
     libcurl                   8.5.0                hca28451_0    conda-forge
     libedit                   3.1.20191231         he28a2e2_2    conda-forge
     libev                     4.33                 hd590300_2    conda-forge
     libffi                    3.4.2                h7f98852_5    conda-forge
     libgcc-ng                 13.2.0               h807b86a_3    conda-forge
     libgomp                   13.2.0               h807b86a_3    conda-forge
     libiconv                  1.17                 h166bdaf_0    conda-forge
     libmamba                  1.5.4                had39da4_0    conda-forge
     libmambapy                1.5.4           py310h39ff949_0    conda-forge
     libnghttp2                1.58.0               h47da74e_1    conda-forge
     libnsl                    2.0.1                hd590300_0    conda-forge
     libsodium                 1.0.18               h36c2ea0_1    conda-forge
     libsolv                   0.7.27               hfc55251_0    conda-forge
     libsqlite                 3.44.2               h2797004_0    conda-forge
     libssh2                   1.11.0               h0841786_0    conda-forge
     libstdcxx-ng              13.2.0               h7e041cc_3    conda-forge
     libuuid                   2.38.1               h0b41bf4_0    conda-forge
     libuv                     1.46.0               hd590300_0    conda-forge
     libxml2                   2.12.2               h232c23b_0    conda-forge
     libzlib                   1.2.13               hd590300_5    conda-forge
     llvmlite                  0.41.1                   pypi_0    pypi
     lz4                       4.3.2                    pypi_0    pypi
     lz4-c                     1.9.4                hcb278e6_0    conda-forge
     lzo                       2.10              h516909a_1000    conda-forge
     markdown-it-py            3.0.0                    pypi_0    pypi
     markupsafe                2.1.3           py310h2372a71_1    conda-forge
     matplotlib                3.7.3                    pypi_0    pypi
     matplotlib-inline         0.1.6              pyhd8ed1ab_0    conda-forge
     mdit-py-plugins           0.4.0                    pypi_0    pypi
     mdurl                     0.1.2                    pypi_0    pypi
     menuinst                  2.0.0           py310hff52083_1    conda-forge
     mistune                   3.0.2              pyhd8ed1ab_0    conda-forge
     more-itertools            10.1.0                   pypi_0    pypi
     msgpack                   1.0.7                    pypi_0    pypi
     multidict                 6.0.4                    pypi_0    pypi
     multimethod               1.10                     pypi_0    pypi
     nb_conda                  2.2.1                    unix_6    conda-forge
     nb_conda_kernels          2.3.1              pyhd8ed1ab_3    conda-forge
     nbclassic                 1.0.0                    pypi_0    pypi
     nbclient                  0.9.0                    pypi_0    pypi
     nbconvert-core            7.12.0             pyhd8ed1ab_0    conda-forge
     nbdime                    3.2.0                    pypi_0    pypi
     nbformat                  5.9.2              pyhd8ed1ab_0    conda-forge
     ncurses                   6.4                  h59595ed_2    conda-forge
     nest-asyncio              1.5.8              pyhd8ed1ab_0    conda-forge
     networkx                  3.2.1                    pypi_0    pypi
     nodejs                    20.9.0               hb753e55_0    conda-forge
     notebook                  6.5.6                    pypi_0    pypi
     notebook-executor         0.2                      pypi_0    pypi
     notebook-shim             0.2.3              pyhd8ed1ab_0    conda-forge
     numba                     0.58.1                   pypi_0    pypi
     numpy                     1.25.2                   pypi_0    pypi
     nvidia-ml-py              11.495.46                pypi_0    pypi
     oauth2client              4.1.3                    pypi_0    pypi
     oauthlib                  3.2.2                    pypi_0    pypi
     opencensus                0.11.3                   pypi_0    pypi
     opencensus-context        0.1.3                    pypi_0    pypi
     openssl                   3.2.0                hd590300_1    conda-forge
     opentelemetry-api         1.21.0                   pypi_0    pypi
     opentelemetry-exporter-otlp 1.21.0                   pypi_0    pypi
     opentelemetry-exporter-otlp-proto-common 1.21.0                   pypi_0    pypi
     opentelemetry-exporter-otlp-proto-grpc 1.21.0                   pypi_0    pypi
     opentelemetry-exporter-otlp-proto-http 1.21.0                   pypi_0    pypi
     opentelemetry-proto       1.21.0                   pypi_0    pypi
     opentelemetry-sdk         1.21.0                   pypi_0    pypi
     opentelemetry-semantic-conventions 0.42b0                   pypi_0    pypi
     overrides                 7.4.0              pyhd8ed1ab_0    conda-forge
     packaging                 23.2               pyhd8ed1ab_0    conda-forge
     pandas                    2.0.3                    pypi_0    pypi
     pandas-profiling          3.6.6                    pypi_0    pypi
     pandocfilters             1.5.0              pyhd8ed1ab_0    conda-forge
     papermill                 2.5.0                    pypi_0    pypi
     parso                     0.8.3              pyhd8ed1ab_0    conda-forge
     patsy                     0.5.4                    pypi_0    pypi
     pexpect                   4.9.0                    pypi_0    pypi
     phik                      0.12.3                   pypi_0    pypi
     pickleshare               0.7.5                   py_1003    conda-forge
     pillow                    10.1.0                   pypi_0    pypi
     pip                       23.3.1             pyhd8ed1ab_0    conda-forge
     pkgutil-resolve-name      1.3.10             pyhd8ed1ab_1    conda-forge
     platformdirs              3.11.0                   pypi_0    pypi
     plotly                    5.18.0                   pypi_0    pypi
     pluggy                    1.3.0              pyhd8ed1ab_0    conda-forge
     prettytable               3.9.0                    pypi_0    pypi
     prometheus_client         0.19.0             pyhd8ed1ab_0    conda-forge
     prompt-toolkit            3.0.41             pyha770c72_0    conda-forge
     proto-plus                1.23.0                   pypi_0    pypi
     protobuf                  3.20.3                   pypi_0    pypi
     psutil                    5.9.3                    pypi_0    pypi
     ptyprocess                0.7.0              pyhd3deb0d_0    conda-forge
     pure_eval                 0.2.2              pyhd8ed1ab_0    conda-forge
     py-spy                    0.3.14                   pypi_0    pypi
     pyarrow                   14.0.1                   pypi_0    pypi
     pyasn1                    0.5.1                    pypi_0    pypi
     pyasn1-modules            0.3.0                    pypi_0    pypi
     pybind11-abi              4                    hd8ed1ab_3    conda-forge
     pycosat                   0.6.6           py310h2372a71_0    conda-forge
     pycparser                 2.21               pyhd8ed1ab_0    conda-forge
     pydantic                  1.10.13                  pypi_0    pypi
     pygments                  2.17.2             pyhd8ed1ab_0    conda-forge
     pyjwt                     2.8.0                    pypi_0    pypi
     pyparsing                 3.1.1                    pypi_0    pypi
     pysocks                   1.7.1              pyha2e5f31_6    conda-forge
     python                    3.10.13         hd12c33a_0_cpython    conda-forge
     python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
     python-dotenv             1.0.0                    pypi_0    pypi
     python-fastjsonschema     2.19.0             pyhd8ed1ab_0    conda-forge
     python-json-logger        2.0.7              pyhd8ed1ab_0    conda-forge
     python_abi                3.10                    4_cp310    conda-forge
     pytz                      2023.3.post1       pyhd8ed1ab_0    conda-forge
     pywavelets                1.5.0                    pypi_0    pypi
     pyyaml                    6.0.1           py310h2372a71_1    conda-forge
     pyzmq                     24.0.1                   pypi_0    pypi
     ray                       2.8.1                    pypi_0    pypi
     ray-cpp                   2.8.1                    pypi_0    pypi
     readline                  8.2                  h8228510_1    conda-forge
     referencing               0.32.0             pyhd8ed1ab_0    conda-forge
     reproc                    14.2.4.post0         hd590300_1    conda-forge
     reproc-cpp                14.2.4.post0         h59595ed_1    conda-forge
     requests                  2.31.0             pyhd8ed1ab_0    conda-forge
     requests-oauthlib         1.3.1                    pypi_0    pypi
     requests-toolbelt         0.10.1                   pypi_0    pypi
     retrying                  1.3.4                    pypi_0    pypi
     rfc3339-validator         0.1.4              pyhd8ed1ab_0    conda-forge
     rfc3986-validator         0.1.1              pyh9f0ad1d_0    conda-forge
     rich                      13.7.0                   pypi_0    pypi
     rpds-py                   0.13.2          py310hcb5633a_0    conda-forge
     ruamel.yaml               0.18.5          py310h2372a71_0    conda-forge
     ruamel.yaml.clib          0.2.7           py310h2372a71_2    conda-forge
     scikit-image              0.22.0                   pypi_0    pypi
     scikit-learn              1.3.2                    pypi_0    pypi
     scipy                     1.11.4                   pypi_0    pypi
     seaborn                   0.12.2                   pypi_0    pypi
     secretstorage             3.3.3                    pypi_0    pypi
     send2trash                1.8.2              pyh41d4057_0    conda-forge
     setuptools                68.2.2             pyhd8ed1ab_0    conda-forge
     shapely                   2.0.2                    pypi_0    pypi
     simpervisor               1.0.0                    pypi_0    pypi
     six                       1.16.0             pyh6c4a22f_0    conda-forge
     smart-open                6.4.0                    pypi_0    pypi
     smmap                     5.0.1                    pypi_0    pypi
     sniffio                   1.3.0              pyhd8ed1ab_0    conda-forge
     soupsieve                 2.5                pyhd8ed1ab_1    conda-forge
     sqlalchemy                2.0.23                   pypi_0    pypi
     sqlparse                  0.4.4                    pypi_0    pypi
     stack-data                0.6.3                    pypi_0    pypi
     stack_data                0.6.2              pyhd8ed1ab_0    conda-forge
     starlette                 0.27.0                   pypi_0    pypi
     statsmodels               0.14.0                   pypi_0    pypi
     tabulate                  0.9.0                    pypi_0    pypi
     tangled-up-in-unicode     0.2.0                    pypi_0    pypi
     tenacity                  8.2.3                    pypi_0    pypi
     tensorboardx              2.6.2.2                  pypi_0    pypi
     terminado                 0.18.0             pyh0d859eb_0    conda-forge
     threadpoolctl             3.2.0                    pypi_0    pypi
     tifffile                  2023.12.9                pypi_0    pypi
     tinycss2                  1.2.1              pyhd8ed1ab_0    conda-forge
     tk                        8.6.13          noxft_h4845f30_101    conda-forge
     toml                      0.10.2                   pypi_0    pypi
     tomli                     2.0.1              pyhd8ed1ab_0    conda-forge
     tornado                   6.3.3           py310h2372a71_1    conda-forge
     tqdm                      4.66.1             pyhd8ed1ab_0    conda-forge
     traitlets                 5.14.0             pyhd8ed1ab_0    conda-forge
     truststore                0.8.0              pyhd8ed1ab_0    conda-forge
     typeguard                 4.1.5                    pypi_0    pypi
     typer                     0.9.0                    pypi_0    pypi
     types-python-dateutil     2.8.19.14          pyhd8ed1ab_0    conda-forge
     typing-extensions         4.8.0                hd8ed1ab_0    conda-forge
     typing_extensions         4.8.0              pyha770c72_0    conda-forge
     typing_utils              0.1.0              pyhd8ed1ab_0    conda-forge
     tzdata                    2023.3                   pypi_0    pypi
     uri-template              1.3.0              pyhd8ed1ab_0    conda-forge
     uritemplate               3.0.1                    pypi_0    pypi
     urllib3                   1.26.18                  pypi_0    pypi
     uvicorn                   0.24.0.post1             pypi_0    pypi
     uvloop                    0.19.0                   pypi_0    pypi
     virtualenv                20.21.0                  pypi_0    pypi
     visions                   0.7.5                    pypi_0    pypi
     watchfiles                0.21.0                   pypi_0    pypi
     wcwidth                   0.2.12             pyhd8ed1ab_0    conda-forge
     webcolors                 1.13               pyhd8ed1ab_0    conda-forge
     webencodings              0.5.1              pyhd8ed1ab_2    conda-forge
     websocket-client          1.7.0              pyhd8ed1ab_0    conda-forge
     websockets                12.0                     pypi_0    pypi
     wheel                     0.42.0             pyhd8ed1ab_0    conda-forge
     widgetsnbextension        4.0.9                    pypi_0    pypi
     wordcloud                 1.9.3                    pypi_0    pypi
     wrapt                     1.16.0                   pypi_0    pypi
     xz                        5.2.6                h166bdaf_0    conda-forge
     y-py                      0.6.2                    pypi_0    pypi
     yaml                      0.2.5                h7f98852_2    conda-forge
     yaml-cpp                  0.8.0                h59595ed_0    conda-forge
     yarl                      1.9.4                    pypi_0    pypi
     ydata-profiling           4.6.0                    pypi_0    pypi
     ypy-websocket             0.8.4                    pypi_0    pypi
     zeromq                    4.3.5                h59595ed_0    conda-forge
     zipp                      3.17.0             pyhd8ed1ab_0    conda-forge
     zlib                      1.2.13               hd590300_5    conda-forge
     zstandard                 0.22.0          py310h1275a96_0    conda-forge
     zstd                      1.5.5                hfc55251_0    conda-forge
     ```

Vortexx2 · 2023-12-29T10:40:13Z

This PR should fix the above issue.

vyasr · 2024-01-11T03:17:37Z

Thanks for the report and the fix!

davidwendt · 2024-12-13T19:58:25Z

Closed by #15371

Vortexx2 added the "bug" and "Needs Triage" labels Dec 29, 2023

Vortexx2 mentioned this issue Dec 29, 2023

Fix issue with below limit strings in ngram calculation #14685

Closed

Vortexx2 mentioned this issue Dec 29, 2023

[BUG] CountVectorizer Vocabulary Length Mismatch rapidsai/cuml#5709

Closed

bdice removed the "Needs Triage" label Mar 4, 2024

davidwendt closed this as completed Dec 13, 2024
