Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Throw std::overflow exception when an output column size exceeds cudf limit #12925

Closed
ttnghia opened this issue Mar 10, 2023 · 4 comments · Fixed by #13323
Closed

[FEA] Throw std::overflow exception when an output column size exceeds cudf limit #12925

ttnghia opened this issue Mar 10, 2023 · 4 comments · Fixed by #13323
Assignees
Labels
0 - Backlog In queue waiting for assignment feature request New feature or request libcudf Affects libcudf (C++/CUDA) code.

Comments

@ttnghia
Copy link
Contributor

ttnghia commented Mar 10, 2023

In Spark, we want to handle the cases when the libcudf APIs throw exception due to their output column having sizes exceeding cudf limit. We want to differentiate if the thrown exception was really due to such reason (so we can split the input by half and try again), or due to something else (such as invalid input type or inconsistent data type etc). Unfortunately, currently only one type of exception (cudf::logic_error) is being thrown.

We want to have a new specific exception type with a name like cudf::output_size_exceed_limit. This exception type will be used to throw in the case mentioned above. Adding this new exception type is similar to #12426 which introduced a new exception type to throw when the input data type is invalid.

@vyasr
Copy link
Contributor

vyasr commented Mar 15, 2023

Please raise this issue for discussion on Slack or in the next team meeting. It would be great to consider whether there should be different exceptions for compound (strings) columns than for other columns too. I am not necessarily advocating this, but it's something to consider in case there are different ways that a caller could attempt to mitigate one scenario than the other (CC @davidwendt).

Depending on what decision is made, please update #12885 accordingly.

@ttnghia
Copy link
Contributor Author

ttnghia commented Mar 15, 2023

I think it is relevant not just for strings column. For example, we have the concatenate and interleave APIs that appends columns one after another, or interleaves rows of the input columns. If the total size of the input exceeds cudf limit, we can also throw the proposed exception above, indicating the exact failure reason. The interleave API can still split retry again with half size input.

@davidwendt
Copy link
Contributor

The cudf::concatenate API already uses std::overflow_error for this:
https://docs.rapids.ai/api/libcudf/nightly/group__copy__concatenate.html#ga8589afe8526e0ba8c4a149ab6cb58453

CUDF_EXPECTS(total_row_count <= static_cast<size_t>(std::numeric_limits<size_type>::max()),
"Total number of concatenated rows exceeds size_type range",
std::overflow_error);

I would prefer we use std::overflow_error exception for the output column size limit check rather than create a new cudf exception.

@ttnghia
Copy link
Contributor Author

ttnghia commented Mar 28, 2023

That can be fine, only still a minor issue if std::overflow_error is thrown by other code rather than our check for column size. But I assume that such issue will not show up very soon so let's use that std exception.

@davidwendt davidwendt self-assigned this Mar 28, 2023
@GregoryKimball GregoryKimball added 0 - Backlog In queue waiting for assignment libcudf Affects libcudf (C++/CUDA) code. and removed Needs Triage Need team to review and classify labels Apr 2, 2023
@GregoryKimball GregoryKimball changed the title [FEA] Add new exception type to throw when an output colunn size exceeds cudf limit [FEA] Add new exception type to throw when an output column size exceeds cudf limit Apr 3, 2023
@GregoryKimball GregoryKimball changed the title [FEA] Add new exception type to throw when an output column size exceeds cudf limit [FEA] Throw std::overflow exception when an output column size exceeds cudf limit Apr 3, 2023
rapids-bot bot pushed a commit that referenced this issue May 24, 2023
…13323)

Replaces generic `cudf::logic_error` exception with `std::overflow_error` where appropriate in libcudf.
Since this changes what is thrown in certain APIs, I think this technically is a breaking change.

Closes #12925

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Divye Gala (https://github.com/divyegala)
  - Nghia Truong (https://github.com/ttnghia)
  - Bradley Dice (https://github.com/bdice)

URL: #13323
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
0 - Backlog In queue waiting for assignment feature request New feature or request libcudf Affects libcudf (C++/CUDA) code.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants