[FEA] Retry support for chunked Parquet writer #13042

Closed
ttnghia opened this issue Mar 30, 2023 · 0 comments · Fixed by #13076
Labels: cuIO, feature request, libcudf, Spark

Comments

ttnghia commented Mar 30, 2023

Similar to #12792, we need to support retry in the Parquet writer. The changes to the Parquet writer should closely follow what was done for the ORC writer in #12949.

ttnghia added the feature request, libcudf, 0 - Blocked, cuIO, and Spark labels Mar 30, 2023
ttnghia self-assigned this Mar 30, 2023
ttnghia added this to libcudf Mar 30, 2023
ttnghia removed the 0 - Blocked label Mar 31, 2023
ttnghia linked a pull request Apr 6, 2023 that will close this issue
rapids-bot bot pushed a commit that referenced this issue May 1, 2023
Similar to #12949, this refactors the Parquet writer to support a retry mechanism. The internal `writer::impl::write()` function is rewritten so that it is split into separate pieces:
 * A free function that compresses and encodes the input table into intermediate results. These intermediate results are completely independent of the writer.
 * A follow-up step that applies those intermediate results to the output data sink, which is where the actual data writing happens.
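The split described above can be sketched roughly as follows. This is only an illustration of the general "encode first, commit afterwards" pattern, not the actual libcudf code: `encoded_chunk`, `sink_state`, `encode_table`, `commit_chunk`, and `write_with_retry` are hypothetical names, the "encoding" is a trivial byte cast, and the out-of-memory failure is simulated with a flag. The point it tries to show is that a failure during encoding leaves the sink untouched, so the same call can simply be retried.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// Hypothetical stand-ins for illustration; none of these are libcudf types.
struct encoded_chunk {
  std::vector<std::uint8_t> bytes;  // encoded payload (intermediate result)
  std::size_t num_rows = 0;
};

struct sink_state {
  std::vector<std::uint8_t> data;  // bytes already written to the output
  std::size_t rows_written = 0;
};

// Phase 1: a free function that only produces intermediate results.
// It never touches the sink, so throwing here leaves the output unchanged.
// `simulate_oom` is an assumption used to mimic an allocation failure.
encoded_chunk encode_table(std::vector<int> const& rows, bool simulate_oom)
{
  if (simulate_oom) { throw std::bad_alloc{}; }
  encoded_chunk out;
  out.num_rows = rows.size();
  for (int v : rows) { out.bytes.push_back(static_cast<std::uint8_t>(v & 0xFF)); }
  return out;
}

// Phase 2: apply the finished intermediate results to the sink.
void commit_chunk(sink_state& sink, encoded_chunk const& chunk)
{
  assert(chunk.bytes.size() == chunk.num_rows);  // one byte per row in this toy encoding
  sink.data.insert(sink.data.end(), chunk.bytes.begin(), chunk.bytes.end());
  sink.rows_written += chunk.num_rows;
}

// Caller-side retry loop: the first `failures_before_success` attempts fail
// during encoding; each failed attempt leaves `sink` exactly as it was.
bool write_with_retry(sink_state& sink,
                      std::vector<int> const& rows,
                      int failures_before_success,
                      int max_attempts)
{
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    try {
      auto const chunk = encode_table(rows, attempt < failures_before_success);
      commit_chunk(sink, chunk);
      return true;
    } catch (std::bad_alloc const&) {
      // Nothing was committed, so it is safe to try again.
    }
  }
  return false;
}
```

In this sketch, calling `write_with_retry(sink, rows, 2, 3)` fails twice in the encoding phase and succeeds on the third attempt, while a call that exhausts its attempts returns `false` with the sink still empty.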

Closes: 
 * #13042

Depends on:
 * #13206

Authors:
  - Nghia Truong (https://github.com/ttnghia)

Approvers:
  - Vukasin Milovanovic (https://github.com/vuule)
  - https://github.com/nvdbaranec

URL: #13076
@GregoryKimball GregoryKimball removed this from libcudf Jun 28, 2023