Is your feature request related to a problem? Please describe.
The libcudf CSV writer throws a `std::overflow_error` when trying to write a dataframe or a chunk that exceeds 2.1B characters.
Describe the solution you'd like
There are a few options:
- Fix the overflow error in the CSV writer implementation. This would be a libcudf change. Since the CSV writer is fairly simple and relies on the libcudf strings API, there may be a straightforward solution.
- Change the default chunksize in libcudf or cuDF from `std::numeric_limits<size_type>::max()` to something based on the dataframe size, e.g. `chunksize = len(df) // (df.memory_usage(deep=True).sum() / 500_000_000.)` (see the sketch after this list).
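For the second option, here is a minimal sketch of how such a size-based default could be computed in the Python layer; the `default_chunksize` helper name is an illustrative assumption, not an existing cuDF API:

```python
# Illustrative helper (not an existing cuDF API): choose a row-count
# chunksize that keeps each chunk near `target_bytes` of input data.
def default_chunksize(df, target_bytes=500_000_000):
    total_bytes = int(df.memory_usage(deep=True).sum())
    if total_bytes <= target_bytes:
        return len(df)  # the whole dataframe fits in a single chunk
    # rows per chunk = total rows / number of ~target_bytes-sized chunks
    return max(int(len(df) // (total_bytes / target_bytes)), 1)
```

A default along these lines should keep each chunk's rendered output well below the 2.1B-character limit for typical row widths, while still writing large dataframes in a handful of passes.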
Additional context
As far as fixing the root cause, here is a quick repro:
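(A hedged sketch rather than the exact original snippet; the 800-character strings and 3M rows are illustrative values chosen so that a single chunk renders to more than ~2.1B characters.)

```python
import cudf

# Roughly 800 characters per row times 3M rows is ~2.4B output
# characters, past the int32 size_type limit.
df = cudf.DataFrame({"a": ["x" * 800] * 3_000_000})
df.to_csv("out.csv", chunksize=3_000_000)
```

which throws: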
File "csv.pyx", line 547, in cudf._lib.csv.write_csv
OverflowError: Writing CSV file with chunksize=3000000 failed. Consider providing a smaller chunksize argument.
However, I couldn't immediately find where the `std::overflow_error` is getting thrown in `io/csv/writer_impl.cu`.
As for chunksize, I did some analysis, and using smaller chunks of ~100 MB shouldn't have a significant performance impact.
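For reference, a hedged sketch of that kind of comparison (the input file and candidate chunk sizes are illustrative, not the original benchmark):

```python
import time

import cudf

df = cudf.read_parquet("data.parquet")  # illustrative input

# Time to_csv for a few candidate chunk sizes (in rows) and compare.
for rows_per_chunk in (100_000, 1_000_000, 10_000_000):
    start = time.perf_counter()
    df.to_csv("out.csv", chunksize=rows_per_chunk)
    print(rows_per_chunk, time.perf_counter() - start)
```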