GH-40860: [GLib][Parquet] Add gparquet_arrow_file_writer_write_record_batch() #44001

Merged
merged 2 commits into from
Sep 9, 2024

Conversation

kou
Member

@kou kou commented Sep 7, 2024

Rationale for this change

We don't need to create a GArrowTable only for writing a GArrowRecordBatch.

What changes are included in this PR?

The following APIs are also added:

  • gparquet_arrow_file_writer_get_schema()
  • Parquet::ArrowFileWriter#write in Ruby

Are these changes tested?

Yes.

Are there any user-facing changes?

Yes.

github-actions bot commented Sep 7, 2024

⚠️ GitHub issue #40860 has been automatically assigned in GitHub to PR creator.

…_record_batch()`

The following APIs are also added:
* `gparquet_arrow_file_writer_get_schema()`
* `Parquet::ArrowFileWriter#write` in Ruby
@kou kou force-pushed the glib-parquet-write-record-batch branch from 9f31b2d to bab0ba3 September 7, 2024 22:40
Member Author

kou commented Sep 9, 2024

+1

@kou kou merged commit d88dd19 into apache:main Sep 9, 2024
9 checks passed
@kou kou deleted the glib-parquet-write-record-batch branch September 9, 2024 00:45
@kou kou removed the awaiting committer review label Sep 9, 2024
{
auto parquet_arrow_file_writer = gparquet_arrow_file_writer_get_raw(writer);
auto arrow_record_batch = garrow_record_batch_get_raw(record_batch).get();
auto status = parquet_arrow_file_writer->WriteRecordBatch(*arrow_record_batch);
Member

Thanks for using this!

WriteRecordBatch() was added by @wgtmac. Note that it will not flush the RowGroup (WriteTable() will switch to a new row group).

Member Author

Oh, I didn't notice it.
Should we also add the bindings of parquet::arrow::FileWriter::NewRowGroup() (and NewBufferedRowGroup()?) for it?

Member

It doesn't matter. Currently we can control the row-group flush by size; this is just a note that the two behave differently. NewRowGroup()/NewBufferedRowGroup() is also OK in this scenario.

Member Author

Ah, OK.
I should have read the WriteRecordBatch() implementation more before I asked.
It calls NewBufferedRowGroup() automatically:

RETURN_NOT_OK(NewBufferedRowGroup());

RETURN_NOT_OK(NewBufferedRowGroup());

Anyway, I'll add the bindings of them for advanced use cases: GH-44006, GH-44007

Member

Yeah the difference is that

RETURN_NOT_OK(NewRowGroup(size));

WriteTable() forces a NewRowGroup(), but WriteRecordBatch() will not...
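
To make the difference discussed in this thread concrete, here is a minimal toy model. It is not the Arrow implementation: `ToyRowGroupWriter` and its members are hypothetical names, and it only tracks row counts per row group, under the assumption (taken from the comments above) that the table path starts a new row group for every chunk while the record-batch path keeps filling the current buffered row group until a size limit is reached.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model only (hypothetical class): it records how many rows land in each
// row group and writes no real Parquet data.
class ToyRowGroupWriter {
 public:
  explicit ToyRowGroupWriter(int64_t max_row_group_length)
      : max_row_group_length_(max_row_group_length) {
    assert(max_row_group_length > 0);
  }

  // Models the WriteTable() path: every chunk starts a fresh row group,
  // standing in for NewRowGroup(size). Simplification: an empty table adds
  // nothing here.
  void WriteTable(int64_t num_rows, int64_t chunk_size) {
    assert(num_rows >= 0 && chunk_size > 0);
    chunk_size = std::min(chunk_size, max_row_group_length_);
    for (int64_t offset = 0; offset < num_rows; offset += chunk_size) {
      row_groups_.push_back(std::min(chunk_size, num_rows - offset));
      current_is_buffered_ = false;
    }
  }

  // Models the WriteRecordBatch() path: rows are appended to the current
  // buffered row group, and a new one is opened only when none exists, the
  // current one is not buffered, or the size limit has been reached.
  void WriteRecordBatch(int64_t num_rows) {
    assert(num_rows >= 0);
    int64_t offset = 0;
    while (offset < num_rows) {
      if (row_groups_.empty() || !current_is_buffered_ ||
          row_groups_.back() >= max_row_group_length_) {
        row_groups_.push_back(0);  // stands in for NewBufferedRowGroup()
        current_is_buffered_ = true;
      }
      const int64_t n = std::min(max_row_group_length_ - row_groups_.back(),
                                 num_rows - offset);
      row_groups_.back() += n;
      offset += n;
    }
  }

  // Row counts of the row groups produced so far, in order.
  const std::vector<int64_t>& row_groups() const { return row_groups_; }

 private:
  int64_t max_row_group_length_;
  std::vector<int64_t> row_groups_;
  bool current_is_buffered_ = false;
};
```

In this model, with a limit of 100 rows, three `WriteRecordBatch(30)` calls end up in a single row group of 90 rows, while three `WriteTable(30, 100)` calls produce three row groups of 30 rows each. That is the practical consequence of the note above: callers who need a row group boundary per batch would have to request it explicitly.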

@github-actions github-actions bot added the awaiting committer review and awaiting changes labels and removed the awaiting committer review label Sep 9, 2024

After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit d88dd19.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

khwilson pushed a commit to khwilson/arrow that referenced this pull request Sep 14, 2024
…_record_batch()` (apache#44001)

### Rationale for this change

We don't need to create a `GArrowTable` only for writing a `GArrowRecordBatch`.

### What changes are included in this PR?

The following APIs are also added:
* `gparquet_arrow_file_writer_get_schema()`
* `Parquet::ArrowFileWriter#write` in Ruby

### Are these changes tested?

Yes.

### Are there any user-facing changes?

Yes.
* GitHub Issue: apache#40860

Authored-by: Sutou Kouhei <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>