Use offsetalator in nvtext::tokenize_with_vocabulary #15878

davidwendt · 2024-05-29T15:40:33Z

Description

Updates the token_counts_fn kernel in nvtext::tokenize_with_vocabulary to use the offsetalator, instead of a hardcoded size_type, when accessing the strings offsets.
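The sketch below shows the general idea behind reading offsets through an offset-normalizing accessor rather than a hardcoded size_type. It assumes that strings offsets may be stored as either 32-bit or 64-bit integers, and that every read is widened to int64_t so the calling code does not depend on the stored width. The names `offset_reader` and `string_size` are hypothetical and are not the libcudf offsetalator API, which is a device iterator used inside CUDA kernels. This host-only version just illustrates the concept.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration only (not libcudf's offsetalator):
// a read-only accessor over an offsets buffer holding either
// int32_t or int64_t values. Every read is widened to int64_t.
class offset_reader {
 public:
  offset_reader(void const* data, bool is_64bit) : data_(data), is_64bit_(is_64bit) {}

  int64_t operator[](std::size_t i) const
  {
    return is_64bit_ ? static_cast<int64_t const*>(data_)[i]
                     : static_cast<int64_t>(static_cast<int32_t const*>(data_)[i]);
  }

 private:
  void const* data_;  // raw offsets buffer
  bool is_64bit_;     // true if the buffer stores int64_t offsets
};

// Byte length of string `row`, computed identically for either offset width.
int64_t string_size(offset_reader const& offsets, std::size_t row)
{
  assert(offsets[row + 1] >= offsets[row]);  // offsets must be non-decreasing
  return offsets[row + 1] - offsets[row];
}
```

A kernel written against such an accessor works unchanged whether the column uses 32-bit or 64-bit offsets, which is the motivation for replacing a hardcoded size_type.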

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

davidwendt · 2024-05-30T19:02:33Z

/merge

Use offsetalator in nvtext::tokenize_with_vocabulary

f6a1855

davidwendt added the 3 - Ready for Review, libcudf, strings, improvement, and non-breaking labels May 29, 2024

davidwendt self-assigned this May 29, 2024

davidwendt requested a review from a team as a code owner May 29, 2024 15:40

davidwendt requested review from karthikeyann and vuule May 29, 2024 15:40

bdice approved these changes May 30, 2024


karthikeyann approved these changes May 30, 2024


rapids-bot bot merged commit 789cbfd into rapidsai:branch-24.08 May 30, 2024
70 checks passed

davidwendt deleted the vocab-offsetalator branch May 30, 2024 19:02