Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for encodings listed in the Parquet column chunk metadata #13907

Merged
merged 21 commits into from
Aug 26, 2023

Conversation

etseidl
Copy link
Contributor

@etseidl etseidl commented Aug 17, 2023

Description

With the addition of V2 page headers, the encodings used have also changed. This PR correctly determines the encodings used in each column chunk and writes that information to the column chunk metadata.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@etseidl etseidl requested a review from a team as a code owner August 17, 2023 21:06
@rapids-bot
Copy link

rapids-bot bot commented Aug 17, 2023

Pull requests from external contributors require approval from a rapidsai organization member with write permissions or greater before CI can begin.

@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Aug 17, 2023
@GregoryKimball GregoryKimball added the cuIO cuIO issue label Aug 17, 2023
@vuule vuule added non-breaking Non-breaking change bug Something isn't working labels Aug 18, 2023
@vuule
Copy link
Contributor

vuule commented Aug 18, 2023

/ok to test

Copy link
Contributor

@vuule vuule left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Big improvement on the way the encodings are handled 👍
What does this fix impact?

cpp/src/io/parquet/page_enc.cu Outdated Show resolved Hide resolved
cpp/tests/io/parquet_test.cpp Outdated Show resolved Hide resolved
cpp/src/io/parquet/page_enc.cu Outdated Show resolved Hide resolved
cpp/src/io/parquet/parquet_gpu.hpp Show resolved Hide resolved
cpp/src/io/parquet/writer_impl.cu Show resolved Hide resolved
cpp/src/io/parquet/writer_impl.cu Show resolved Hide resolved
cpp/tests/io/parquet_test.cpp Show resolved Hide resolved
Co-authored-by: Vukasin Milovanovic <[email protected]>
@etseidl
Copy link
Contributor Author

etseidl commented Aug 18, 2023

What does this fix impact?

Just correctness. As I dump files it's annoying to see encodings listed that aren't used. And this will help when the delta encodings come online.

Copy link
Contributor

@vuule vuule left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, love the version without EncodingMask!

@vuule
Copy link
Contributor

vuule commented Aug 18, 2023

/ok to test

zhuoxunyi referenced this pull request Aug 23, 2023
Fixes: #13864 

This PR fixes an issue with `loc` indexer where some special handling needs to be done when `columns` is of type `MultiIndex`.

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Lawrence Mitchell (https://github.com/wence-)

URL: #13929
@ttnghia
Copy link
Contributor

ttnghia commented Aug 26, 2023

/ok to test

@vuule
Copy link
Contributor

vuule commented Aug 26, 2023

/merge

@rapids-bot rapids-bot bot merged commit a025db5 into rapidsai:branch-23.10 Aug 26, 2023
@etseidl etseidl deleted the feature/fix_encodings branch August 26, 2023 18:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working cuIO cuIO issue libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants