-
Notifications
You must be signed in to change notification settings - Fork 920
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix out-of-bounds access in ORC writer #7902
Conversation
Codecov Report
@@ Coverage Diff @@
## branch-0.20 #7902 +/- ##
===============================================
+ Coverage 82.30% 82.72% +0.42%
===============================================
Files 101 103 +2
Lines 17053 17705 +652
===============================================
+ Hits 14035 14647 +612
- Misses 3018 3058 +40
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of suggestions to secure things a bit tighter.
cpp/src/io/orc/orc.h
Outdated
private: | ||
// ORC column id (different from column index in the table!) | ||
// Zero means no corresponding column in the table | ||
uint32_t _column_id = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this break any logic below that may have been expecting ~0
? This stuff:
if (stream.column >= orc2gdf.size()) {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modified the logic to separate "0" column id from no column id, should be consistent with the original logic
ff.types[0].subtypes[column.id()] = 1 + column.id(); | ||
ff.types[0].fieldNames[column.id()] = column.orc_name(); | ||
ff.types[column.id()].kind = column.orc_kind(); | ||
ff.types[0].subtypes[column.index()] = column.id(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change looks confusing, because id()
has been renamed to index()
, and id()
has then been added (returns index+1).
@nvdbaranec thank you for the quick reviews! |
@gpucibot merge |
Avoid out-of-bounds access due to streams holding column ids as
index + 1
, and the first index stream using zero for its column id. In a few places the corresponding column is accessed as[column_id - 1]
, even when the id is zero.Other changes:
Small refactoring of ORC streams creation. and stream offset computation.