[BUG] cudf::repeat crashes with SIGFPE when count == 0 #13458

Closed
jlowe opened this issue May 26, 2023 · 1 comment · Fixed by #13459
Labels: bug (Something isn't working), libcudf (Affects libcudf (C++/CUDA) code.), Spark (Functionality that helps Spark RAPIDS)

Comments

@jlowe
Member

jlowe commented May 26, 2023

Describe the bug
After #13323, calling cudf::repeat with a scalar count of 0 crashes the process with SIGFPE. That change added an assertion that divides by count, but count can be zero at that point, so the check itself performs an integer divide by zero.
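
To illustrate the failure mode, here is a minimal sketch of an overflow check that divides by count, with a guard for zero placed ahead of the division. The `repeated_row_count` helper and the `size_type` alias are hypothetical stand-ins for illustration only; the actual check added in #13323 may be structured differently.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Assumption: row counts are 32-bit signed integers.
using size_type = std::int32_t;

// Hypothetical helper: number of output rows when each of `input_rows` rows
// is repeated `count` times. Throws if the result would overflow size_type.
inline size_type repeated_row_count(size_type input_rows, size_type count)
{
  assert(input_rows >= 0);
  if (count < 0) { throw std::invalid_argument("count must be non-negative"); }

  // Guard before dividing: with count == 0 the division below would be an
  // integer divide by zero, which is undefined behavior in C++ and typically
  // raises SIGFPE on x86 Linux.
  if (count == 0 || input_rows == 0) { return 0; }

  // Overflow check written as a division so the multiplication cannot wrap.
  if (input_rows > std::numeric_limits<size_type>::max() / count) {
    throw std::overflow_error("repeated row count exceeds size_type limit");
  }
  return input_rows * count;
}
```

With this ordering, `repeated_row_count(5, 0)` returns 0 without ever reaching the division, while oversized requests still throw.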

Steps/Code to reproduce bug
Call cudf::repeat with a scalar count of zero.

Expected behavior
The process should not crash and instead an empty table should be returned when count == 0.
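
A sketch of that expected behavior, using `std::vector` as a stand-in for a table column. `repeat_rows` is a hypothetical helper, not cudf::repeat itself, and it assumes each row is repeated count times consecutively.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for repeating the rows of a single column.
template <typename T>
std::vector<T> repeat_rows(std::vector<T> const& input, std::size_t count)
{
  std::vector<T> out;
  // count == 0 (or an empty input) yields an empty result, with no division.
  if (count == 0 || input.empty()) { return out; }

  out.reserve(input.size() * count);
  for (auto const& value : input) {
    out.insert(out.end(), count, value);  // append `count` copies of this row
  }
  assert(out.size() == input.size() * count);
  return out;
}
```

Here `repeat_rows(std::vector<int>{1, 2}, 0)` returns an empty vector, and a count of 2 gives `{1, 1, 2, 2}`.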

jlowe added the bug, Needs Triage, libcudf, and Spark labels May 26, 2023
@jlowe
Member Author

jlowe commented May 26, 2023

Looks like subword_tokenize can also do this if max_rows_tensor == 0, which is probably a nonsensical input, but it would still be nice to avoid a nasty SIGFPE. Side note: it doesn't look like the max_rows_tensor value is otherwise used. It's passed to the wordpiece_tokenizer constructor, which seems to just ignore it.

davidwendt self-assigned this May 26, 2023
rapids-bot bot pushed a commit that referenced this issue Jun 1, 2023
Removes the `max_rows_tensor` parameter from the `nvtext::subword_tokenize` API since it is no longer required. The parameter was intended to size the temporary working memory for the internal functions. After some general rework it was no longer used, but it was never removed from the API.
Also updates the Python/Cython calls which had been hard-coding a default value anyway.

This problem was found through reference issue #13458.

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Divye Gala (https://github.com/divyegala)
  - Vyas Ramasubramani (https://github.com/vyasr)
  - Matthew Roeschke (https://github.com/mroeschke)

URL: #13463
bdice removed the Needs Triage label Mar 4, 2024