Produce useful guidance on overflow error in to_csv
#12705
Partial progress towards rapidsai#10200. This enables catching the overflow error in to_csv and re-raising it with a useful message when the requested dataframe write cannot be converted to a single string column without overflow.
Codecov Report
Coverage for #12705 against branch-23.04 (no base values reported):
Coverage: 85.81%
Files: 158
Lines: 25146
Branches: 0
Hits: 21578
Misses: 3568
Partials: 0
C++ LGTM
Python changes look good!
LGTM 👍
/merge
There is one more test to fix as found here:
Thanks @davidwendt, will do!
Description
Writing to CSV files is implemented by converting all columns in a dataframe to strings and then concatenating those columns. As a result, when we attempt to write a large dataframe to CSV without specifying a chunk size, we can easily overflow the maximum column size.
Currently the error message is rather inscrutable: it only says that the requested size of a string column exceeds the column size limit. To help the user, catch this error and provide a useful error message that points them towards setting the `chunksize` argument.
So that we don't produce false-positive advice, tighten the scope by only catching `OverflowError`. To do this, make partial progress towards resolving #10200 by throwing `std::overflow_error` when checking for overflow of string column lengths.
Closes #12690.
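The catch-and-re-raise pattern described above can be sketched in plain Python. This is only an illustration, not the code in this PR: `MAX_COLUMN_CHARS`, `rows_to_csv_string`, and `write_csv` are hypothetical names, and the size limit is set artificially low so the example runs instantly.

```python
import io

# Hypothetical stand-in for the real column size limit (assumption: kept tiny
# here so that the overflow is easy to trigger).
MAX_COLUMN_CHARS = 64


def rows_to_csv_string(rows):
    """Convert rows to one CSV string; raise OverflowError past the limit."""
    text = "".join(",".join(str(v) for v in row) + "\n" for row in rows)
    if len(text) > MAX_COLUMN_CHARS:
        raise OverflowError(
            "requested string column size exceeds the column size limit"
        )
    return text


def write_csv(rows, buffer, chunksize=None):
    """Write rows to buffer, converting at most `chunksize` rows at a time."""
    step = chunksize if chunksize is not None else max(len(rows), 1)
    try:
        for start in range(0, len(rows), step):
            buffer.write(rows_to_csv_string(rows[start:start + step]))
    except OverflowError as err:
        # Catch only OverflowError so unrelated failures do not get
        # misleading advice, then re-raise with actionable guidance.
        raise OverflowError(
            "Writing CSV failed because the rows could not be converted to a "
            "single string without overflow; consider setting the chunksize "
            "argument to write fewer rows at a time."
        ) from err


rows = [(i, i * i) for i in range(40)]
```

Calling `write_csv(rows, io.StringIO())` raises the more helpful `OverflowError`, while `write_csv(rows, io.StringIO(), chunksize=5)` succeeds because each chunk stays under the limit.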