-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-40860: [GLib][Parquet] Add gparquet_arrow_file_writer_write_record_batch()
#44001
Conversation
|
…_record_batch()` The following APIs are also added: * `gparquet_arrow_file_writer_get_schema()` * Parquet::ArrowFileWriter#write` in Ruby
9f31b2d
to
bab0ba3
Compare
+1 |
{ | ||
auto parquet_arrow_file_writer = gparquet_arrow_file_writer_get_raw(writer); | ||
auto arrow_record_batch = garrow_record_batch_get_raw(record_batch).get(); | ||
auto status = parquet_arrow_file_writer->WriteRecordBatch(*arrow_record_batch); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for using this!
The writeRecordBatch is added by @wgtmac . Notice that this will not flush the RowGroup ( WriteTable will switch rg )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I didn't notice it.
Should we also add the bindings of parquet::arrow::FileWriter::NewRowGroup()
(and NewBufferedRowGroup()
?) for it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't matter, currently we can control the row-group flush by size, just a notice that they're different. NewRowGroup()/NewBufferedRowGroup is also ok in this scenerio
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, OK.
I should have read the WriteRecordBatch()
implementation more before I asked.
It calls NewBufferedRowGroup()
automatically:
arrow/cpp/src/parquet/arrow/writer.cc
Line 425 in d88dd19
RETURN_NOT_OK(NewBufferedRowGroup()); |
arrow/cpp/src/parquet/arrow/writer.cc
Line 468 in d88dd19
RETURN_NOT_OK(NewBufferedRowGroup()); |
Anyway, I'll add the bindings of them for advanced use cases: GH-44006, GH-44007
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah the difference is that
arrow/cpp/src/parquet/arrow/writer.cc
Line 382 in d88dd19
RETURN_NOT_OK(NewRowGroup(size)); |
The writeTable will forcing NewRowGroup, but writeBatch will not...
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit d88dd19. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. |
…_record_batch()` (apache#44001) ### Rationale for this change We don't need to create a `GArrowTable` only for writing a `GArrowRecordBatch`. ### What changes are included in this PR? The following APIs are also added: * `gparquet_arrow_file_writer_get_schema()` * Parquet::ArrowFileWriter#write` in Ruby ### Are these changes tested? Yes. ### Are there any user-facing changes? Yes. * GitHub Issue: apache#40860 Authored-by: Sutou Kouhei <[email protected]> Signed-off-by: Sutou Kouhei <[email protected]>
Rationale for this change
We don't need to create a
GArrowTable
only for writing aGArrowRecordBatch
.What changes are included in this PR?
The following APIs are also added:
gparquet_arrow_file_writer_get_schema()
Are these changes tested?
Yes.
Are there any user-facing changes?
Yes.
gparquet_arrow_file_writer_write_record_batch()
#40860