Optimize Parquet file writing with WriteBatch function #1028

Merged (1 commit) on Jan 18, 2022

Conversation

bmcdonald3
Contributor

This PR replaces the simple WriteTable function provided by the Parquet API with the lower-level WriteBatch function, which allows greater control over writes. This optimization to Parquet writing in Arkouda is very similar to the reading optimization from #1014.

Performance numbers collected on chapcs:

| Current | This Branch |
| --- | --- |
| 0.11 GiB/s | 1.09 GiB/s |

So we are seeing about a 10x improvement for writing Parquet files on chapcs.

Collaborator

@reuster986 reuster986 left a comment


I don't speak C++ very well, but I think I understand the logic. Great performance improvement!

@mhmerrill mhmerrill merged commit f68e9e2 into Bears-R-Us:master Jan 18, 2022
This was linked to issues Jan 20, 2022
@bmcdonald3 bmcdonald3 deleted the pq-write branch January 27, 2022 20:27
@bmcdonald3 bmcdonald3 mentioned this pull request Feb 10, 2022
20 tasks
Linked issues: Tracking Parquet follow up work; Apache Parquet Support
5 participants