Fix a row index entry error in ORC writer issue (#10989) #11014

Merged

@vuule vuule commented May 31, 2022

Issue #10755

Backporting the fix to 22.06

Fixes an issue in the protobuf writer where the length of a row index entry was written as a single byte. This caused errors whenever the entry size exceeded 127, the largest value that fits in a one-byte protobuf varint.
The issue was uncovered when row group statistics were added: string statistics contain copies of the min/max strings, so the entry size is unbounded.
This PR changes the protobuf writer to write the entry size as a generic uint, allowing larger entries.
Also fixes `start_row` in the row group info array in the reader (unrelated).
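
For readers unfamiliar with the encoding, here is a minimal Python sketch of why a one-byte length breaks above 127. It is not the libcudf code: `encode_varint`, `decode_varint`, and `single_byte_length` are hypothetical names, and `single_byte_length` is only an assumed model of the old behavior. Protobuf lengths are base-128 varints, so any byte with its high bit set tells the reader that more length bytes follow.

```python
from typing import Tuple


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only covers unsigned values")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)         # high bit clear: last byte
            return bytes(out)


def decode_varint(buf: bytes) -> Tuple[int, int]:
    """Decode a varint from the start of buf; return (value, bytes_consumed)."""
    value = 0
    shift = 0
    for i, byte in enumerate(buf):
        value |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return value, i + 1
        shift += 7
    raise ValueError("truncated varint")


def single_byte_length(value: int) -> bytes:
    """Hypothetical model of the bug: the length is forced into one byte."""
    return bytes([value & 0xFF])


if __name__ == "__main__":
    # Sizes up to 127 happen to be identical under both schemes.
    print(single_byte_length(100) == encode_varint(100))

    # A 200-byte entry needs two varint bytes.
    print(encode_varint(200).hex())

    # Written as one byte, 200 (0xC8) has its high bit set, so a reader
    # consumes the next payload byte (0x0A here) as part of the length.
    print(decode_varint(single_byte_length(200) + b"\x0a"))
```

In this model the reader decodes a length of 1352 instead of 200 and consumes one byte of the payload. That is consistent with the errors appearing only once string min/max statistics made entries larger than 127 bytes.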

Authors:
  - Vukasin Milovanovic (https://github.com/vuule)

Approvers:
  - Ram (Ramakrishna Prabhu) (https://github.com/rgsl888prabhu)
  - David Wendt (https://github.com/davidwendt)
  - GALI PREM SAGAR (https://github.com/galipremsagar)

URL: rapidsai#10989
@vuule vuule added bug Something isn't working cuIO cuIO issue non-breaking Non-breaking change labels May 31, 2022
@vuule vuule self-assigned this May 31, 2022
@github-actions github-actions bot added Python Affects Python cuDF API. libcudf Affects libcudf (C++/CUDA) code. labels May 31, 2022
codecov bot commented May 31, 2022

Codecov Report

Merging #11014 (69fc6aa) into branch-22.06 (d0b4e30) will not change coverage.
The diff coverage is n/a.

@@              Coverage Diff              @@
##           branch-22.06   #11014   +/-   ##
=============================================
  Coverage         86.32%   86.32%           
=============================================
  Files               144      144           
  Lines             22688    22688           
=============================================
  Hits              19585    19585           
  Misses             3103     3103           

@ajschmidt8 ajschmidt8 marked this pull request as ready for review May 31, 2022 22:50
@ajschmidt8 ajschmidt8 requested review from a team as code owners May 31, 2022 22:50
@ajschmidt8 ajschmidt8 requested review from shwina, brandon-b-miller, codereport and rgsl888prabhu and removed request for a team May 31, 2022 22:50
@ajschmidt8 ajschmidt8 merged commit 82c062a into rapidsai:branch-22.06 May 31, 2022
@vuule vuule deleted the bug-orc-index-entry-size-backport branch June 1, 2022 01:11