Add partitioning support to Parquet chunked writer (#10000)
The chunked writer (`class ParquetWriter`) now accepts a `partition_cols` argument. On each call to `write_table(df)`, `df` is partitioned and each part is appended to the file for its partition in the dataset directory.

This is useful when partitioning is desired but one wants to avoid creating many small files in each subdirectory. For example, instead of repeatedly calling `write_to_dataset`:

```python
write_to_dataset(df1, root_path, partition_cols=['group'])
write_to_dataset(df2, root_path, partition_cols=['group'])
...
```

which yields the following structure:

```
root_dir/
  group=value1/
    <uuid1>.parquet
    <uuid2>.parquet
    ...
  group=value2/
    <uuid1>.parquet
    <uuid2>.parquet
    ...
  ...
```

one can write:

```python
pw = ParquetWriter(root_path, partition_cols=['group'])
pw.write_table(df1)
pw.write_table(df2)
pw.close()
```

to get the structure:

```
root_dir/
  group=value1/
    <uuid1>.parquet
  group=value2/
    <uuid1>.parquet
  ...
```

Closes #7196

Also provides workaround fixes:

fixes #9216
fixes #7011

TODO:
- [x] Tests

Authors:
  - Devavret Makkar (https://github.com/devavret)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)
  - Ashwin Srinath (https://github.com/shwina)

URL: #10000
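To make the difference in file layout concrete, here is a minimal pure-Python sketch that models both approaches in memory. It is not cudf code and writes no real Parquet files: rows are plain dicts, a "file" is a list of rows keyed by its path, and the names `partition_rows`, `write_to_dataset_sim`, and `ChunkedPartitionedWriterSim` are hypothetical stand-ins, not part of cudf's API.

```python
import uuid
from collections import defaultdict


def partition_rows(rows, partition_cols):
    """Group row dicts by their partition-column values (hypothetical helper).

    Returns {"col=value/...": [rows with the partition columns dropped]}.
    """
    parts = defaultdict(list)
    for row in rows:
        key = "/".join(f"{col}={row[col]}" for col in partition_cols)
        parts[key].append({k: v for k, v in row.items() if k not in partition_cols})
    return dict(parts)


def write_to_dataset_sim(dataset, rows, partition_cols):
    """Models one write_to_dataset call: a NEW file per partition on every call."""
    for part_dir, part_rows in partition_rows(rows, partition_cols).items():
        dataset[f"{part_dir}/{uuid.uuid4()}.parquet"] = list(part_rows)


class ChunkedPartitionedWriterSim:
    """Models the chunked writer: one file per partition, appended to on each write."""

    def __init__(self, partition_cols):
        self.partition_cols = partition_cols
        self.dataset = {}   # file path -> rows written so far
        self._files = {}    # partition dir -> file path (stands in for an open file handle)
        self.closed = False

    def write_table(self, rows):
        if self.closed:
            raise ValueError("writer is closed")
        for part_dir, part_rows in partition_rows(rows, self.partition_cols).items():
            # Create a file only the first time a partition is seen; reuse it afterwards.
            if part_dir not in self._files:
                path = f"{part_dir}/{uuid.uuid4()}.parquet"
                self._files[part_dir] = path
                self.dataset[path] = []
            self.dataset[self._files[part_dir]].extend(part_rows)

    def close(self):
        # A real writer would flush buffered data and finalize each file here.
        self.closed = True


df1 = [{"group": "value1", "x": 1}, {"group": "value2", "x": 2}]
df2 = [{"group": "value1", "x": 3}, {"group": "value2", "x": 4}]

many = {}
write_to_dataset_sim(many, df1, ["group"])
write_to_dataset_sim(many, df2, ["group"])
print(len(many))  # 4: two files in each group directory

pw = ChunkedPartitionedWriterSim(["group"])
pw.write_table(df1)
pw.write_table(df2)
pw.close()
print(len(pw.dataset))  # 2: one file in each group directory
```

The key design choice being modeled is that the chunked writer keeps one open output per partition for its whole lifetime, so the number of files grows with the number of distinct partition values rather than with the number of `write_table` calls.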
Showing 4 changed files with 350 additions and 46 deletions.