Fix string length in stripe dictionary building (#7744)
In PR #7676, the length of the current string was always left at 0 when incrementing the dictionary character count of a StripeDictionary while building stripe dictionaries. This led to corrupted strings when dictionary encoding was used, as noted in issue #7741. This PR fixes it by setting the length to the string's size in bytes.

Fixes #7741

Authors:
  - Kumar Aatish (@kaatish)

Approvers:
  - Vukasin Milovanovic (@vuule)
  - Nghia Truong (@ttnghia)

URL: #7744
kaatish authored Mar 27, 2021
1 parent b0586c4 commit add4b45
Showing 1 changed file with 4 additions and 1 deletion.
5 changes: 4 additions & 1 deletion cpp/src/io/orc/dict_enc.cu
@@ -396,7 +396,10 @@ __global__ void __launch_bounds__(block_size)
 uint32_t cur = (i + t < num_strings) ? dict_data[i + t] : 0;
 uint32_t cur_len = 0;
 bool is_dupe = false;
-if (i + t < num_strings) { current_string = s->stripe.leaf_column->element<string_view>(cur); }
+if (i + t < num_strings) {
+  current_string = s->stripe.leaf_column->element<string_view>(cur);
+  cur_len = current_string.size_bytes();
+}
 if (i + t != 0 && i + t < num_strings) {
   uint32_t prev = dict_data[i + t - 1];
   is_dupe = (current_string == (s->stripe.leaf_column->element<string_view>(prev)));
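To illustrate why a `cur_len` that stays at 0 matters, here is a minimal host-side C++ sketch of the accumulation, not the kernel itself. It assumes, since the hunk does not show it, that the character count adds `cur_len` for every string that is not a duplicate of its predecessor. The function `dict_char_count` and its `set_length` flag are hypothetical names introduced only for this illustration, and `std::string_view` stands in for the device-side `string_view`.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// Hypothetical host-side stand-in for the per-stripe accumulation.
// dict_data holds indices into column and is assumed to be ordered so that
// equal strings are adjacent (so a duplicate always follows its twin).
// set_length == false mimics the bug: cur_len is never updated from 0.
// set_length == true mimics the fix: cur_len is the string's size in bytes.
uint32_t dict_char_count(const std::vector<std::string_view>& column,
                         const std::vector<uint32_t>& dict_data,
                         bool set_length)
{
  uint32_t total = 0;
  for (std::size_t i = 0; i < dict_data.size(); ++i) {
    std::string_view current_string = column[dict_data[i]];
    uint32_t cur_len = 0;
    if (set_length) { cur_len = static_cast<uint32_t>(current_string.size()); }

    // A string is a duplicate if it equals the previous dictionary entry.
    bool is_dupe = (i != 0) && (current_string == column[dict_data[i - 1]]);

    // Assumption: only non-duplicate strings contribute to the count.
    if (!is_dupe) { total += cur_len; }
  }
  return total;
}
```

For a column `{"pear", "apple", "pear", "fig"}` with `dict_data` `{1, 3, 0, 2}`, the sketch returns 12 when `set_length` is true ("apple", "fig", and one "pear") and 0 when it is false. A character count of 0 would describe a dictionary with no character data at all, which is consistent with the corrupted strings reported in issue #7741.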
