Use offsetalator in nvtext::tokenize_with_vocabulary (#15878)
Updates the `token_counts_fn` kernel in `nvtext::tokenize_with_vocabulary` to read the strings column's offsets through the offsetalator instead of assuming a hardcoded `size_type` offsets element type.

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Karthikeyan (https://github.com/karthikeyann)

URL: #15878
davidwendt authored May 30, 2024
1 parent bab0d80 commit 789cbfd
Showing 1 changed file with 4 additions and 4 deletions.
8 changes: 4 additions & 4 deletions cpp/src/text/vocabulary_tokenize.cu
@@ -240,10 +240,10 @@ CUDF_KERNEL void token_counts_fn(cudf::column_device_view const d_strings,
     return;
   }

-  auto const offsets =
-    d_strings.child(cudf::strings_column_view::offsets_column_index).data<cudf::size_type>();
-  auto const offset = offsets[str_idx + d_strings.offset()] - offsets[d_strings.offset()];
-  auto const chars_begin = d_strings.data<char>() + offsets[d_strings.offset()];
+  auto const offsets = d_strings.child(cudf::strings_column_view::offsets_column_index);
+  auto const offsets_itr = cudf::detail::input_offsetalator(offsets.head(), offsets.type());
+  auto const offset = offsets_itr[str_idx + d_strings.offset()] - offsets_itr[d_strings.offset()];
+  auto const chars_begin = d_strings.data<char>() + offsets_itr[d_strings.offset()];

   auto const begin = d_str.data();
   auto const end = begin + d_str.size_bytes();
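
For readers unfamiliar with the pattern, the change above can be pictured with a minimal host-only sketch. It is not the `cudf::detail::input_offsetalator` implementation; the names `offset_width`, `offset_reader`, and `relative_offset` are hypothetical and exist only for illustration. The sketch assumes the offsets buffer holds either 32-bit or 64-bit integers, and it widens every value to 64 bits on access so the calling code does not need to know which width is stored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical tag for the stored offsets width; not a cudf type.
enum class offset_width { int32, int64 };

// Hypothetical illustration: read offsets through an untyped pointer and
// return each value widened to 64 bits, whatever the stored width is.
class offset_reader {
 public:
  offset_reader(void const* data, offset_width width) : data_{data}, width_{width}
  {
    assert(data != nullptr);  // an offsets buffer is required
  }

  std::int64_t operator[](std::size_t idx) const
  {
    if (width_ == offset_width::int32) {
      return static_cast<std::int32_t const*>(data_)[idx];
    }
    return static_cast<std::int64_t const*>(data_)[idx];
  }

 private:
  void const* data_;
  offset_width width_;
};

// Same arithmetic as the hunk: the byte position of string `str_idx`
// relative to the first string of a column sliced at `column_offset`.
inline std::int64_t relative_offset(offset_reader const& offsets,
                                    std::size_t str_idx,
                                    std::size_t column_offset)
{
  return offsets[str_idx + column_offset] - offsets[column_offset];
}
```

With a reader like this, the subtraction in `relative_offset` is written once and works for both widths, which is the property the commit relies on when it replaces `data<cudf::size_type>()` with an iterator built from `offsets.head()` and `offsets.type()`.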