Cleanup Parquet chunked writer #13094

Merged
merged 6 commits into from
Apr 12, 2023

Contributor

@ttnghia ttnghia commented Apr 7, 2023

Similar to #13091, this PR cleans up the internal member variables of the Parquet chunked writer:

  • Rename them to consistently use a _ prefix.
  • Add the const qualifier to some variables that are writer parameters.
  • Regroup them.

No new functionality is added. However, the unused parameter mr is removed from the writer's interface, so this PR is flagged as a breaking change.
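
As a rough illustration of the cleanup described above, the following sketch contrasts a "before" and "after" shape of a writer class. All type and accessor names here (memory_resource_sketch, writer_impl_sketch, page_bytes, dict_bytes) are hypothetical and only loosely modeled on the member names visible in this PR; this is not the actual libcudf code.

```cpp
#include <cstddef>

// Hypothetical stand-in for a device memory resource; not the real rmm type.
struct memory_resource_sketch {};

namespace before_cleanup {
// Mixed naming (no affix vs. `_` suffix), mutable parameters,
// and an `mr` argument that is accepted but never used.
class writer_impl_sketch {
 public:
  writer_impl_sketch(std::size_t page_bytes,
                     std::size_t dict_bytes,
                     memory_resource_sketch* /* mr: unused */)
    : max_page_size_bytes(page_bytes), max_dictionary_size_(dict_bytes)
  {
  }

  std::size_t page_bytes() const { return max_page_size_bytes; }
  std::size_t dict_bytes() const { return max_dictionary_size_; }

 private:
  std::size_t max_page_size_bytes;   // no affix
  std::size_t max_dictionary_size_;  // `_` suffix
};
}  // namespace before_cleanup

namespace after_cleanup {
// Consistent `_` prefix, const writer parameters grouped together,
// and no `mr` in the constructor.
class writer_impl_sketch {
 public:
  writer_impl_sketch(std::size_t page_bytes, std::size_t dict_bytes)
    : _max_page_size_bytes(page_bytes), _max_dictionary_size(dict_bytes)
  {
  }

  std::size_t page_bytes() const { return _max_page_size_bytes; }
  std::size_t dict_bytes() const { return _max_dictionary_size; }

 private:
  // Writer parameters: set once at construction, never modified.
  std::size_t const _max_page_size_bytes;
  std::size_t const _max_dictionary_size;
};
}  // namespace after_cleanup
```

In this sketch, a caller that still passes a third mr argument to the "after" constructor no longer compiles, which is why removing even an unused parameter counts as a breaking change.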

Closes:

@ttnghia ttnghia added the labels 3 - Ready for Review (Ready for review by team), code quality, libcudf (Affects libcudf (C++/CUDA) code), cuIO (cuIO issue), improvement (Improvement / enhancement to an existing function), and breaking (Breaking change) on Apr 7, 2023
@ttnghia ttnghia self-assigned this Apr 7, 2023
@ttnghia ttnghia requested a review from a team as a code owner April 7, 2023 21:23
@ttnghia ttnghia requested a review from nvdbaranec April 7, 2023 21:24
Comment on lines -225 to -234
Compression compression_ = Compression::UNCOMPRESSED;
size_t max_row_group_size = default_row_group_size_bytes;
size_type max_row_group_rows = default_row_group_size_rows;
size_t max_page_size_bytes = default_max_page_size_bytes;
size_type max_page_size_rows = default_max_page_size_rows;
statistics_freq stats_granularity_ = statistics_freq::STATISTICS_NONE;
dictionary_policy dict_policy_ = dictionary_policy::ALWAYS;
size_t max_dictionary_size_ = default_max_dictionary_size;
bool int96_timestamps = false;
int32_t column_index_truncate_length = default_column_index_truncate_length;
Contributor

Are these defaults defined again in the writer options?

Contributor Author

These variables are assigned from the options passed in by the user and are never changed after the writer is constructed, so default values are not needed here. The defaults are still used when constructing the writer options.
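
A minimal sketch of the pattern described in this reply, assuming a simplified options struct: the defaults live only in the options type, and the writer copies them into const members at construction. The names writer_options_sketch and chunked_writer_sketch are hypothetical, and the default values are placeholders rather than the real libcudf constants.

```cpp
#include <cstddef>
#include <cstdint>

// Placeholder values for illustration only; the real libcudf defaults may differ.
constexpr std::size_t default_max_page_size_bytes           = 512 * 1024;
constexpr std::int32_t default_column_index_truncate_length = 64;

// The options type is the single place where defaults are declared.
struct writer_options_sketch {
  std::size_t max_page_size_bytes           = default_max_page_size_bytes;
  bool int96_timestamps                     = false;
  std::int32_t column_index_truncate_length = default_column_index_truncate_length;
};

class chunked_writer_sketch {
 public:
  // Every parameter is taken from the user-supplied options.
  explicit chunked_writer_sketch(writer_options_sketch const& options)
    : _max_page_size_bytes(options.max_page_size_bytes),
      _int96_timestamps(options.int96_timestamps),
      _column_index_truncate_length(options.column_index_truncate_length)
  {
  }

  std::size_t max_page_size_bytes() const { return _max_page_size_bytes; }
  bool int96_timestamps() const { return _int96_timestamps; }
  std::int32_t column_index_truncate_length() const { return _column_index_truncate_length; }

 private:
  // No in-class default initializers: these are always set by the
  // constructor and, being const, cannot change afterwards.
  std::size_t const _max_page_size_bytes;
  bool const _int96_timestamps;
  std::int32_t const _column_index_truncate_length;
};
```

With this layout, a default-constructed writer_options_sketch yields a writer that uses the defaults, while any field the user overrides on the options is picked up unchanged by the writer.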

Contributor Author

@ttnghia ttnghia Apr 11, 2023

@ttnghia ttnghia requested a review from karthikeyann April 11, 2023 22:43
Contributor

@karthikeyann karthikeyann left a comment

LGTM 👍

@ttnghia
Contributor Author

ttnghia commented Apr 12, 2023

/merge

@rapids-bot rapids-bot bot merged commit 2bf0b44 into rapidsai:branch-23.06 Apr 12, 2023
@ttnghia ttnghia deleted the cleanup_parquet_writer branch April 12, 2023 20:16
Labels
3 - Ready for Review (Ready for review by team), breaking (Breaking change), cuIO (cuIO issue), improvement (Improvement / enhancement to an existing function), libcudf (Affects libcudf (C++/CUDA) code)
Development

Successfully merging this pull request may close these issues.

[FEA] Consistently use prefix/suffix for member variables of cudf::io::detail::parquet::writer::impl