Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor Parquet reader handling of V2 page header info #13775

Merged
merged 12 commits into from
Aug 15, 2023

Conversation

etseidl
Copy link
Contributor

@etseidl etseidl commented Jul 26, 2023

Description

This PR replaces the def_lvl_bytes and rep_lvl_bytes fields of the gpu::PageInfo struct with an array indexed by gpu::level_type (as is done with the lvl_decode_buf array). This allows for some streamlining in InitLevelSection(), removing some redundant code and improving readability.

See this comment for context.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@etseidl etseidl requested a review from a team as a code owner July 26, 2023 23:08
@rapids-bot
Copy link

rapids-bot bot commented Jul 26, 2023

Pull requests from external contributors require approval from a rapidsai organization member with write permissions or greater before CI can begin.

@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Jul 26, 2023
} else {
init_rle(cur, cur + len);
}
init_rle(cur, cur + len);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we have such big change here? Will this a breaking change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's because now I set and test len in the if, so len is guaranteed non-zero and I can get rid of the redundant branching for the len==0 case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, do we now hit the else branch at 915 instead? I see similar code with the branch that's removed here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly. The original fix from #13707 created a duplicate path to get to 0 bits for the level RLE encoding. This refactor gets rid of the duplicate.

@ttnghia
Copy link
Contributor

ttnghia commented Aug 10, 2023

/ok to test

@ttnghia ttnghia added 3 - Ready for Review Ready for review by team cuIO cuIO issue non-breaking Non-breaking change labels Aug 10, 2023
@davidwendt davidwendt added the improvement Improvement / enhancement to an existing function label Aug 10, 2023
@ttnghia
Copy link
Contributor

ttnghia commented Aug 11, 2023

/ok to test

@vuule
Copy link
Contributor

vuule commented Aug 15, 2023

/ok to test

@vuule vuule added the 5 - Ready to Merge Testing and reviews complete, ready to merge label Aug 15, 2023
@vuule
Copy link
Contributor

vuule commented Aug 15, 2023

/merge

@rapids-bot rapids-bot bot merged commit 20c3aab into rapidsai:branch-23.10 Aug 15, 2023
@etseidl etseidl deleted the feature/page_refactor branch August 15, 2023 21:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team 5 - Ready to Merge Testing and reviews complete, ready to merge cuIO cuIO issue improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants