Fix skiprows issue with ORC Reader #7359

Merged

rgsl888prabhu
Contributor

@rgsl888prabhu rgsl888prabhu commented Feb 9, 2021

closes #7343

The validity bits in the streams are stored MSB to LSB within each byte: [True, False, True, False, True, True, True, False] -> 10101110.
So when the bits are processed as a 32-bit chunk, the mask can't be applied directly, which caused this issue. __brev(__byte_perm(bits, 0, 0x0123)) fixes that by reversing the bit order within each byte (the byte order itself is unchanged), so the bits are arranged as the mask expects.
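
To illustrate the rearrangement, here is a minimal host-side sketch in Python. The helpers `byte_perm`, `brev`, and `msb_first_to_lsb_first` are hypothetical names that emulate the CUDA intrinsics for illustration only; they are not code from this PR, and the actual kernel may apply the result differently.

```python
def byte_perm(x: int, y: int, selector: int) -> int:
    """Emulate CUDA's __byte_perm: each selector nibble (low 3 bits)
    picks one of the 8 bytes of the 64-bit value formed by y:x."""
    source = (x & 0xFFFFFFFF) | ((y & 0xFFFFFFFF) << 32)
    result = 0
    for i in range(4):
        index = (selector >> (4 * i)) & 0x7
        byte = (source >> (8 * index)) & 0xFF
        result |= byte << (8 * i)
    return result


def brev(x: int) -> int:
    """Emulate CUDA's __brev: reverse the order of all 32 bits."""
    return int(format(x & 0xFFFFFFFF, "032b")[::-1], 2)


def msb_first_to_lsb_first(bits: int) -> int:
    """Reverse the byte order, then reverse all 32 bits.
    Net effect: bits are reversed inside each byte, bytes stay in place."""
    return brev(byte_perm(bits, 0, 0x0123))


# The byte from the description: rows [T, F, T, F, T, T, T, F] -> 0b10101110.
packed = 0b10101110
fixed = msb_first_to_lsb_first(packed)

# After the fix, row i of the byte is simply bit i, so a plain mask works.
rows = [bool((fixed >> i) & 1) for i in range(8)]
print(hex(fixed))  # 0x75
print(rows)        # [True, False, True, False, True, True, True, False]
```

With the bits in LSB-first order, skipping `n` rows amounts to shifting or masking off the low `n` bits of the chunk, which is not possible while each byte is still MSB-first.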

@rgsl888prabhu rgsl888prabhu added the bug, 3 - Ready for Review, Python, 4 - Needs cuDF (Python) Reviewer, cuIO, and non-breaking labels Feb 9, 2021
@rgsl888prabhu rgsl888prabhu requested a review from a team as a code owner February 9, 2021 22:43
@rgsl888prabhu rgsl888prabhu self-assigned this Feb 9, 2021
@rgsl888prabhu rgsl888prabhu requested a review from a team as a code owner February 9, 2021 22:43
@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Feb 9, 2021
python/cudf/cudf/tests/test_orc.py (outdated, resolved)
Contributor

@galipremsagar galipremsagar left a comment

LGTM! Thanks @rgsl888prabhu for the fix!

@rgsl888prabhu rgsl888prabhu requested a review from vuule February 10, 2021 21:56
cpp/src/io/orc/stripe_data.cu (outdated, resolved)
@@ -8,6 +8,7 @@
import pandas as pd
import pyarrow as pa
import pyarrow.orc
import pyorc
Contributor

CI fails because of this import:

ModuleNotFoundError: No module named 'pyorc'

Contributor Author

Created a PR for it: rapidsai/integration#215

@rgsl888prabhu rgsl888prabhu requested a review from vuule February 11, 2021 16:24
Contributor

@vuule vuule left a comment

Approved once more, for good measure :)

@rgsl888prabhu
Contributor Author

rerun tests

@codecov

codecov bot commented Feb 15, 2021

Codecov Report

❗ No coverage uploaded for pull request base (branch-0.19@eb8dc88).
The diff coverage is n/a.


@@              Coverage Diff               @@
##             branch-0.19    #7359   +/-   ##
==============================================
  Coverage               ?   82.19%           
==============================================
  Files                  ?      100           
  Lines                  ?    16968           
  Branches               ?        0           
==============================================
  Hits                   ?    13947           
  Misses                 ?     3021           
  Partials               ?        0           

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update eb8dc88...92e3e0c.

@harrism
Member

harrism commented Feb 15, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit a08ec0e into rapidsai:branch-0.19 Feb 15, 2021
@vyasr vyasr added 4 - Needs Review Waiting for reviewer to review or respond and removed 4 - Needs cuIO Reviewer labels Feb 23, 2024
Labels
3 - Ready for Review: Ready for review by team
4 - Needs Review: Waiting for reviewer to review or respond
bug: Something isn't working
cuIO: cuIO issue
libcudf: Affects libcudf (C++/CUDA) code.
non-breaking: Non-breaking change
Python: Affects Python cuDF API.
5 participants