[BUG] Raise appropriate strings error when concatenating strings column #8228

Closed
VibhuJawa opened this issue May 12, 2021 · 0 comments · Fixed by #8290
Labels
bug Something isn't working Python Affects Python cuDF API.

VibhuJawa commented May 12, 2021

Describe the bug
When concatenating two large string columns, we raise "Total number of concatenated rows exceeds size_type range" instead of "total size of output strings is too large for a cudf column".

We should catch this in the Python layer so that the user sees the actual cause of the failure rather than the generic row-count error.
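
A minimal sketch of what such a Python-side check could look like, assuming cudf's size_type is a signed 32-bit integer. The helper name check_concat_sizes and its arguments are hypothetical and are not taken from cuDF or from the fix in #8290; it works on plain integers so it runs without a GPU.

```python
# Assumption: cudf's size_type is a signed 32-bit integer.
SIZE_TYPE_MAX = 2**31 - 1


def check_concat_sizes(row_counts, char_byte_counts=None):
    """Hypothetical pre-check to run before calling the libcudf concat.

    row_counts: number of rows in each input column.
    char_byte_counts: for string columns, the number of character bytes
        in each input column; None for non-string columns.
    """
    if char_byte_counts is not None:
        # Check the string payload first so the more specific error wins.
        total_bytes = sum(char_byte_counts)
        if total_bytes > SIZE_TYPE_MAX:
            raise OverflowError(
                "total size of output strings is too large for a cudf column"
            )
    total_rows = sum(row_counts)
    if total_rows > SIZE_TYPE_MAX:
        raise OverflowError(
            "Total number of concatenated rows exceeds size_type range"
        )
```

With the sizes from the reproducer in this issue (two columns of 1,000,000 rows and 1,700,000,000 character bytes each), this sketch raises the string-size error rather than the row-count one.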

Steps/Code to reproduce bug

import cudf

num_strings = 1_000_000
string_scale_f = 100

s_1 = cudf.Series(['very long string '* string_scale_f]*num_strings)
s_2 = cudf.Series(['very long string '* string_scale_f]*num_strings)

s_3 = cudf.concat([s_1,s_2])
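
The arithmetic behind the reproducer, as a rough check that needs no GPU. It assumes size_type is a signed 32-bit integer and that the limit applies to the total character bytes of the output column; the variable names other than num_strings and string_scale_f are illustrative.

```python
# Assumption: cudf's size_type is a signed 32-bit integer.
SIZE_TYPE_MAX = 2**31 - 1

num_strings = 1_000_000
string_scale_f = 100

# 'very long string ' is 17 ASCII characters, so one element is 1700 bytes.
bytes_per_string = len(('very long string ' * string_scale_f).encode('utf-8'))
bytes_per_series = bytes_per_string * num_strings  # 1_700_000_000
total_rows = 2 * num_strings                       # 2_000_000
total_bytes = 2 * bytes_per_series                 # 3_400_000_000

# Each input fits on its own, and the row count is nowhere near the limit.
assert bytes_per_series <= SIZE_TYPE_MAX
assert total_rows <= SIZE_TYPE_MAX
# Only the combined character data overflows.
assert total_bytes > SIZE_TYPE_MAX
```

So the failure comes from the size of the string data, not from the number of rows, which is why the row-count message is misleading here.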

Expected behavior

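I expect that concatenating string columns whose combined string data is too large will raise the appropriate error, i.e. one about the total size of the output strings rather than the number of rows.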

Environment overview

  • Environment location: Bare-metal
  • Method of cuDF install: conda

Environment details

cudf                      0.20.0a210504   cuda_11.2_py38_gd56428abfc_260    rapidsai-nightly
dask-cudf                 0.20.0a210504   py38_gd56428abfc_260    rapidsai-nightly
libcudf                   0.20.0a210504   cuda11.2_gd56428abfc_260    rapidsai-nightly
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-11-05d6937199ad> in <module>
      7 s_2 = cudf.Series(['very long string '* string_scale_f]*num_strings)
      8 
----> 9 s_3 = cudf.concat([s_1,s_2])

/nvme/0/vjawa/conda/envs/rapids-0.20/lib/python3.8/site-packages/cudf/core/reshape.py in concat(objs, axis, join, ignore_index, sort)
    381             return objs[0]
    382         else:
--> 383             return cudf.Series._concat(
    384                 objs, axis=axis, index=None if ignore_index else True
    385             )

/nvme/0/vjawa/conda/envs/rapids-0.20/lib/python3.8/site-packages/cudf/core/series.py in _concat(cls, objs, axis, index)
   2592                     objs = numeric_normalize_types(*objs)
   2593 
-> 2594         col = ColumnBase._concat([o._column for o in objs])
   2595 
   2596         if isinstance(col, cudf.core.column.DecimalColumn):

/nvme/0/vjawa/conda/envs/rapids-0.20/lib/python3.8/site-packages/cudf/core/column/column.py in _concat(cls, objs, dtype)
    283         # Perform the actual concatenation
    284         if newsize > 0:
--> 285             col = libcudf.concat.concat_columns(objs)
    286         else:
    287             col = column_empty(0, head.dtype, masked=True)

cudf/_lib/concat.pyx in cudf._lib.concat.concat_columns()

cudf/_lib/concat.pyx in cudf._lib.concat.concat_columns()

RuntimeError: cuDF failure at: ../src/copying/concatenate.cu:365: Total number of concatenated rows exceeds size_type range
VibhuJawa added the bug and Needs Triage labels on May 12, 2021
kkraus14 added the Python label and removed the Needs Triage label on May 14, 2021
skirui-source self-assigned this on May 18, 2021
rapids-bot closed this as completed in #8290 on Jun 1, 2021
rapids-bot pushed a commit that referenced this issue on Jun 1, 2021