Fix ORC writer output corruption with string columns #7565
Codecov Report
@@             Coverage Diff              @@
##           branch-0.19    #7565   +/-   ##
============================================
+ Coverage      81.86%   82.38%   +0.51%
============================================
  Files            101      101
  Lines          16884    17339    +455
============================================
+ Hits           13822    14284    +462
+ Misses          3062     3055      -7
Nice find.
Very cool bug in a bad way. Glad this one was found and fixed. Thanks!
Not requesting changes because I believe this can be better addressed when we start using column_device_view for ORC instead of the current column_data_base and valid_map_base pair of pointers.
  }
- s->scratch_red[t] = v;
+ s->scratch_red[t] = valid;
This whole block looks to be over-optimized. It's pre-loading an offset mask word into scratch_red only to have individual threads treat this shared memory as the new mask, and then doing an exclusive sum over the validity values.
This could be achieved using an iterator for validity that accesses the mask itself directly. That iterator could be given an appropriate offset, which I believe would be the column offset + this dictionary chunk's start row.
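To make the suggestion concrete, here is a minimal host-side C++ sketch of a validity accessor that reads the mask directly at a fixed bit offset, followed by an exclusive sum over the validity values. It is only an illustration under stated assumptions: `validity_reader` and `exclusive_valid_scan` are hypothetical names, not cuDF APIs, and the real kernel would do this per thread on the device.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical accessor (not a cuDF type): answers "is this row valid?"
// by indexing the bitmask directly, with a fixed bit offset applied.
struct validity_reader {
  const std::uint32_t* mask;  // nullptr is treated as "all rows valid"
  std::size_t offset;         // assumed: column offset + chunk start row

  bool operator()(std::size_t row) const
  {
    if (mask == nullptr) { return true; }
    const std::size_t bit = offset + row;
    // Bit i of word w holds the validity of absolute row 32 * w + i.
    return ((mask[bit / 32] >> (bit % 32)) & 1u) != 0;
  }
};

// Exclusive sum over validity: positions[r] is the number of valid rows
// that precede row r within the chunk.
std::vector<std::uint32_t> exclusive_valid_scan(const validity_reader& is_valid,
                                                std::size_t num_rows)
{
  std::vector<std::uint32_t> positions(num_rows);
  std::uint32_t running = 0;
  for (std::size_t r = 0; r < num_rows; ++r) {
    positions[r] = running;
    running += is_valid(r) ? 1u : 0u;
  }
  return positions;
}
```

Because the accessor computes the word index and bit index per row, no second-word special case is needed when the offset is not a multiple of 32.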
Agreed. I also had a few ideas on what can be simplified around this code, but wanted to keep such changes separate from the actual fix.
@gpucibot merge
Closes: rapidsai#7346

Fixes an issue in the ORC writer where null counting would not read the mask for every row.

The issue occurs when the column offset is not divisible by 32. In that case two words must always be read to get 32 bits of mask: each read is effectively offset by the column's offset, so reading the mask for 32 rows requires two words to account for the offset. However, the second word was not read when the row was closer than 32 to the end of the chunk. This condition is incorrect for most column offsets, as the current row is not really the first bit of the mask word.

The fix is to adjust the condition under which the second mask word is read (assuming that the mask is padded to a multiple of 32).

Authors:
- Vukasin Milovanovic (@vuule)

Approvers:
- @nvdbaranec
- Mike Wilson (@hyperbolic2346)
- Devavret Makkar (@devavret)

URL: rapidsai#7565
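The failure mode described above can be reproduced with a small host-side C++ sketch. This is not the cuDF kernel: `load_mask_bits`, `count_valid`, and the exact form of both conditions are assumptions made for illustration. The row-based condition skips the second word near the end of the chunk, while the bit-based condition reads it whenever the requested bits straddle a word boundary.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns up to 32 validity bits starting at absolute
// bit position first_bit. The caller decides whether the next word is read.
std::uint32_t load_mask_bits(const std::vector<std::uint32_t>& mask,
                             std::size_t first_bit,
                             bool read_second_word)
{
  const std::size_t word = first_bit / 32;
  const unsigned shift   = static_cast<unsigned>(first_bit % 32);
  assert(word < mask.size());
  std::uint32_t bits = mask[word] >> shift;
  // shift == 0 means the read is word-aligned (and avoids a shift by 32).
  if (shift != 0 && read_second_word && word + 1 < mask.size()) {
    bits |= mask[word + 1] << (32 - shift);
  }
  return bits;
}

// Counts valid rows in a chunk whose first row sits at bit column_offset.
// use_row_based_condition = true mimics the flawed check (an assumption
// about its shape); false uses a check based on the actual bit position.
std::size_t count_valid(const std::vector<std::uint32_t>& mask,
                        std::size_t column_offset,
                        std::size_t chunk_rows,
                        bool use_row_based_condition)
{
  std::size_t valid = 0;
  for (std::size_t row = 0; row < chunk_rows; row += 32) {
    const std::size_t rows_left = chunk_rows - row;
    const std::size_t n         = rows_left < 32 ? rows_left : 32;
    const std::size_t shift     = (column_offset + row) % 32;
    const bool read_second =
      use_row_based_condition ? (rows_left >= 32) : (shift + n > 32);
    std::uint32_t bits = load_mask_bits(mask, column_offset + row, read_second);
    if (n < 32) { bits &= (std::uint32_t{1} << n) - 1; }  // drop rows past the chunk
    valid += std::bitset<32>(bits).count();
  }
  return valid;
}
```

With an all-valid two-word mask, a column offset of 20, and a 30-row chunk, the bit-based check counts all 30 rows, whereas the row-based check sees only the 12 bits left in the first word and treats the remaining 18 rows as null.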