Verify compression type in Parquet reader #10610
Codecov Report
@@ Coverage Diff @@
## branch-22.06 #10610 +/- ##
================================================
+ Coverage 86.30% 86.34% +0.03%
================================================
Files 140 140
Lines 22255 22280 +25
================================================
+ Hits 19207 19237 +30
+ Misses 3048 3043 -5
Continue to review full report at Codecov.
This is a fairly minor comment, but we usually don't propagate libcudf errors directly to the user from Python. Is there a way for the reader code to query the codec before it reaches libcudf and raise the error there?
The current design looks fine to me. To my knowledge, we pass through quite a few low-level errors from libcudf, perhaps especially in I/O code. I would avoid re-implementing this codec check in Python when we can let it fail in C++ and bubble up through Cython's exception handling. The exception would be cases that need to be stopped preemptively at a higher layer (Python), but that doesn't seem to apply here.
Thank you for the comments! Unfortunately, there's no cheap way to catch this error in Python, so I don't think we can avoid propagating the C++ exception here.
@gpucibot merge
Closes #10602
This PR adds a compression type check for each chunk in the input file.
The reader throws if an unsupported compression type is used.
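A minimal sketch of a per-chunk compression check, written as standalone C++ rather than the actual libcudf code. The `Codec` values follow the `CompressionCodec` enum of the Parquet format specification. `ColumnChunkMeta`, `is_supported`, `codec_name`, and `verify_compression` are hypothetical names, and the supported set (uncompressed and Snappy only) is an assumption chosen for illustration, not the reader's real list. The sketch throws `std::invalid_argument`, which Cython's `except +` translates into a Python `ValueError`; libcudf itself may throw a different exception type.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Codec ids as defined by the Parquet format's CompressionCodec enum.
enum class Codec : int32_t {
  UNCOMPRESSED = 0, SNAPPY = 1, GZIP = 2, LZO = 3,
  BROTLI = 4, LZ4 = 5, ZSTD = 6, LZ4_RAW = 7
};

// Hypothetical, simplified view of one column chunk's metadata.
struct ColumnChunkMeta {
  std::string path;  // column path, used only in the error message
  Codec codec;       // codec recorded in the chunk's metadata
};

// Hypothetical support set, for illustration only.
bool is_supported(Codec c) {
  switch (c) {
    case Codec::UNCOMPRESSED:
    case Codec::SNAPPY: return true;
    default: return false;
  }
}

std::string codec_name(Codec c) {
  switch (c) {
    case Codec::UNCOMPRESSED: return "UNCOMPRESSED";
    case Codec::SNAPPY: return "SNAPPY";
    case Codec::GZIP: return "GZIP";
    case Codec::LZO: return "LZO";
    case Codec::BROTLI: return "BROTLI";
    case Codec::LZ4: return "LZ4";
    case Codec::ZSTD: return "ZSTD";
    case Codec::LZ4_RAW: return "LZ4_RAW";
    default: return "UNKNOWN(" + std::to_string(static_cast<int32_t>(c)) + ")";
  }
}

// Check every chunk before any decompression work starts and throw on the
// first unsupported codec, naming both the codec and the column.
void verify_compression(const std::vector<ColumnChunkMeta>& chunks) {
  for (const auto& chunk : chunks) {
    if (!is_supported(chunk.codec)) {
      throw std::invalid_argument("Unsupported compression type " +
                                  codec_name(chunk.codec) +
                                  " in column chunk '" + chunk.path + "'");
    }
  }
}
```

Because the check only reads metadata, it can run before any data is decompressed. The error then carries the codec name instead of surfacing later as a decoding failure.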