-
Notifications
You must be signed in to change notification settings - Fork 915
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix missing trailing comma in json writer #12688
Fix missing trailing comma in json writer #12688
Conversation
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## branch-23.04 #12688 +/- ##
===============================================
Coverage ? 85.81%
===============================================
Files ? 158
Lines ? 25153
Branches ? 0
===============================================
Hits ? 21586
Misses ? 3567
Partials ? 0 Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Do you have performance data for different chunk sizes?
@karthikeyann Is there a reason we don't use a larger chunk size? It appears that chunk size 512k performance is better than chunk size 256k. Are there other constraints like memory usage that balance against runtime performance? (Are "chunks" needed at all?) |
@bdice Yes. Memory usage is a major concern. chunk is not neeed. If we don't pass chunk size, it will convert entire table into string and will incur high memory usage (and may fail). Just to be safe, I used 256K. For example: In these perf measurements, I used a 2 column dataframe. Using 512K would be good too. I am trying to keep the |
Interesting! Let's make sure that there is guidance in the docs about this: |
See also this related request for CSV writing: #12690 (comment) |
json_writer_options const& options, | ||
rmm::cuda_stream_view stream, | ||
rmm::mr::device_memory_resource* mr) | ||
{ | ||
CUDF_EXPECTS(str_column_view.size() > 0, "Unexpected empty strings column."); | ||
|
||
string_scalar d_line_terminator{line_terminator}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this was unused before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was used before too. Instead of passing std::string
as argument, now string_scalar
is passed.
/merge |
Description
Fix missing trailing comma in json writer for non-Lines json format.
Updates default rows_per_chunk because
8
is too small and writer is slow.closes #12687
Checklist