-
Notifications
You must be signed in to change notification settings - Fork 912
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Detect and report errors in Parquet header parsing #14237
Detect and report errors in Parquet header parsing #14237
Conversation
Note: we could also detect unsupported encodings earlier, but then that would make addressing #14209 a bit harder. |
/ok to test |
…detect_header_overrun
Co-authored-by: Vukasin Milovanovic <[email protected]>
/ok to test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a utility is born!
Looks good, just a few minor suggestions.
Co-authored-by: Vukasin Milovanovic <[email protected]>
/ok to test |
/ok to test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this. I have a few relatively small requests.
cpp/src/io/parquet/page_hdr.cu
Outdated
int lane_id = threadIdx.x % 32; | ||
int chunk = (blockIdx.x * 4) + (threadIdx.x / 32); | ||
byte_stream_s* const bs = &bs_g[threadIdx.x / 32]; | ||
int warp_id = threadIdx.x / 32; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While refactoring this, let's move to use cudf::detail::warp_size
instead of hardcoding 32
.
cpp/src/io/parquet/page_hdr.cu
Outdated
{ | ||
gpuParsePageHeader parse_page_header; | ||
__shared__ byte_stream_s bs_g[4]; | ||
|
||
int error[4] = {0}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to be explicit about int
vs. int32_t
, for the purpose of aligning our conventions? I'd like to use the same type for the error
value as the type we use in the parameters of set_error
.
Co-authored-by: Bradley Dice <[email protected]>
…detect_header_overrun
/ok to test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! Just a couple casts needed, I think.
Co-authored-by: Bradley Dice <[email protected]>
/ok to test |
/ok to test |
/ok to test |
/merge |
Description
Fixes #13656. Uses the error reporting introduced in #14167 to report errors in header parsing.
Checklist