Fix nullmask offset handling in parquet and orc writer #6889
}
bitmask_type mask =
  __funnelshift_r(current_mask_word, next_mask_word, current_valid_offset);
valid = 0xff & mask;
Would it be possible to use the existing function
cudf/cpp/src/bitmask/null_mask.cu
Lines 233 to 247 in 2498c58
__device__ bitmask_type get_mask_offset_word(bitmask_type const *__restrict__ source,
                                             size_type destination_word_index,
                                             size_type source_begin_bit,
                                             size_type source_end_bit)
{
  size_type source_word_index = destination_word_index + word_index(source_begin_bit);
  bitmask_type curr_word      = source[source_word_index];
  bitmask_type next_word      = 0;
  if (word_index(source_end_bit) >
      word_index(source_begin_bit +
                 destination_word_index * detail::size_in_bits<bitmask_type>())) {
    next_word = source[source_word_index + 1];
  }
  return __funnelshift_r(curr_word, next_word, source_begin_bit);
}
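For anyone following along, here is a rough host-side C++ sketch of what this helper computes. It is only an illustration, not the cudf code: `funnelshift_r_host` and `get_mask_offset_word_host` are hypothetical names, `bitmask_type` is assumed to be a 32-bit word, and the intrinsic is emulated with a 64-bit shift.

```cpp
#include <cassert>
#include <cstdint>

using bitmask_type = std::uint32_t;  // assumption: 32-bit mask words
using size_type    = std::int32_t;

constexpr size_type bits_per_word = 32;

// Index of the word that holds a given bit.
inline size_type word_index(size_type bit) { return bit / bits_per_word; }

// Host emulation of __funnelshift_r: concatenate hi:lo, shift right by
// (shift & 31), and keep the low 32 bits.
inline bitmask_type funnelshift_r_host(bitmask_type lo, bitmask_type hi, std::uint32_t shift)
{
  std::uint64_t const wide = (static_cast<std::uint64_t>(hi) << 32) | lo;
  return static_cast<bitmask_type>(wide >> (shift & 31u));
}

// Hypothetical host mirror of get_mask_offset_word: returns the
// destination_word_index-th word of the mask as seen from source_begin_bit.
inline bitmask_type get_mask_offset_word_host(bitmask_type const* source,
                                              size_type destination_word_index,
                                              size_type source_begin_bit,
                                              size_type source_end_bit)
{
  size_type const source_word_index = destination_word_index + word_index(source_begin_bit);
  bitmask_type const curr_word      = source[source_word_index];
  bitmask_type next_word            = 0;
  // Only read the following word when the requested range reaches into it.
  if (word_index(source_end_bit) >
      word_index(source_begin_bit + destination_word_index * bits_per_word)) {
    next_word = source[source_word_index + 1];
  }
  return funnelshift_r_host(curr_word, next_word, static_cast<std::uint32_t>(source_begin_bit));
}
```

With a two-word mask `{0xF0000000, 0x0000000F}` and a range starting at bit 28, the helper stitches the top four bits of the first word onto the bottom four bits of the second, which is the same thing the writer kernel is doing by hand with `current_mask_word` and `next_mask_word`.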
This could be reused but it should be changed to an inline function and moved to a cuh file.
If it doesn't affect performance, we can reuse it.
Moved the get_mask_offset_word function to column_device_view.cuh and inlined it so that it can be reused here.
The logic of the change looks good to me, for the most part. Some minor suggestions, mostly related to cleaning up the existing code.
cpp/src/io/orc/dict_enc.cu
  (row + 32 < s->chunk.start_row + s->chunk.num_rows) ? valid_map[(row >> 5) + 1] : 0;
uint32_t v1 = (row + 32 < s->chunk.start_row + s->chunk.num_rows)
                ? valid_map[((row + s->chunk.column_offset) >> 5) + 1]
                : 0;
v = __funnelshift_r(v, v1, row & 0x1f);
@devavret Shouldn't this be changed to
v = __funnelshift_r(v, v1, (row + s->chunk.column_offset) & 0x1f);
It appears so. Also, it seems the wrapping (& 0x1f) is redundant?
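To make both points concrete, here is a small host-side C++ sketch. It assumes, as the CUDA documentation describes, that `__funnelshift_r` already wraps the shift amount to its low five bits (the clamping variant is `__funnelshift_rc`). `funnelshift_r_host` and `load_valid_bits` are hypothetical names used only for illustration, and `has_next_word` stands in for the `row + 32 < start_row + num_rows` check.

```cpp
#include <cassert>
#include <cstdint>

// Host emulation of __funnelshift_r: concatenate hi:lo, shift right by
// (shift & 31), and keep the low 32 bits.
inline std::uint32_t funnelshift_r_host(std::uint32_t lo, std::uint32_t hi, std::uint32_t shift)
{
  std::uint64_t const wide = (static_cast<std::uint64_t>(hi) << 32) | lo;
  return static_cast<std::uint32_t>(wide >> (shift & 31u));
}

// Hypothetical helper mirroring the dict_enc.cu snippet: fetch 32 validity
// bits starting at (row + column_offset).
inline std::uint32_t load_valid_bits(std::uint32_t const* valid_map,
                                     std::uint32_t row,
                                     std::uint32_t column_offset,
                                     bool has_next_word)
{
  std::uint32_t const bit = row + column_offset;  // absolute bit position
  std::uint32_t const v   = valid_map[bit >> 5];
  std::uint32_t const v1  = has_next_word ? valid_map[(bit >> 5) + 1] : 0u;
  // The shift uses the same offset as the word lookup; no explicit & 0x1f
  // is needed because the funnel shift wraps the amount itself.
  return funnelshift_r_host(v, v1, bit);
}
```

With `valid_map = {0xFFFFFF00, 0x000000FF}`, `row = 0` and `column_offset = 8`, shifting by `row` alone would return `0xFFFFFF00`, while shifting by `row + column_offset` returns the intended `0xFFFFFFFF`.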
Codecov Report
@@ Coverage Diff @@
## branch-0.18 #6889 +/- ##
===============================================
+ Coverage 81.58% 82.46% +0.87%
===============================================
Files 96 96
Lines 15920 16969 +1049
===============================================
+ Hits 12989 13994 +1005
- Misses 2931 2975 +44
Continue to review full report at Codecov.
Closes #5247. Adds `agg` function for DataFrame. Authors: Sheilah Kirui, Michael Wang, skirui-source, galipremsagar, GALI PREM SAGAR, Keith Kraus, Ashwin Srinath. Approvers: Michael Wang, Keith Kraus. URL: #6483
Please update the changelog in order to start CI tests. View the gpuCI docs here.
Looks like something went wrong with the merge: dataframe.py and test_dataframe.py come from another PR.
rerun tests
Rest looks good to me, just had a small query.
Fixes #6642