Add gbenchmark for nvtext replace-tokens function #7708
Codecov Report
@@ Coverage Diff @@
## branch-0.19 #7708 +/- ##
===============================================
+ Coverage 81.86% 82.50% +0.64%
===============================================
Files 101 101
Lines 16884 17441 +557
===============================================
+ Hits 13822 14390 +568
+ Misses 3062 3051 -11
std::string row;  // build a row of random tokens
while (static_cast<int>(row.size()) < n_length) row += words[tokens_dist(generator)];

std::uniform_int_distribution<int> position_dist(0, 16);
I'm curious, why 16? Is this a good size for the test? Do we need to benchmark the case where each string is around 1000 characters long?
Just an arbitrary power of two less than 32, chosen to force some amount of warp divergence.
This benchmark tests string lengths ranging from 32 to 8K and row counts from 4K to 16M (within the limits of column size boundaries).
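To make the role of the 0 to 16 offset concrete, here is a minimal host-side sketch of how such input could be generated. It assumes the random position is used as a starting offset into the shared row of tokens, so neighboring rows begin at different points; the helper names `make_token_row` and `make_rows` and the short word list are hypothetical and are not taken from the PR.

```cpp
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Hypothetical helper: build one long row of random tokens that is at least
// n_length characters long (mirrors the loop in the diff above).
std::string make_token_row(int n_length, std::default_random_engine& generator)
{
  // Illustrative word list only; the benchmark's actual list is not shown here.
  std::vector<std::string> const words{"one ", "two ", "three ", "four ", "five "};
  std::uniform_int_distribution<int> tokens_dist(0, static_cast<int>(words.size()) - 1);
  std::string row;
  while (static_cast<int>(row.size()) < n_length) row += words[tokens_dist(generator)];
  return row;
}

// Hypothetical helper: derive n_rows strings from the shared row, each starting
// at a random offset in [0, 16]. Assumption: this is how the offset is applied.
std::vector<std::string> make_rows(std::string const& row,
                                   int n_rows,
                                   std::default_random_engine& generator)
{
  std::uniform_int_distribution<int> position_dist(0, 16);
  std::vector<std::string> rows;
  rows.reserve(static_cast<std::size_t>(n_rows));
  for (int i = 0; i < n_rows; ++i) {
    auto const pos = static_cast<std::size_t>(position_dist(generator));
    // Guard against rows shorter than the offset.
    rows.push_back(pos < row.size() ? row.substr(pos) : std::string{});
  }
  return rows;
}
```

Under that assumption, every generated string is a suffix of the same row and differs in length by at most 16 characters, so threads in a warp see token boundaries at different positions.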
@gpucibot merge
Reference #5696
Creates gbenchmarks for the nvtext::replace_tokens() function. The benchmarks measure various string lengths and numbers of rows with the default whitespace delimiter and 4 hardcoded tokens.
This API already uses the make_strings_children utility.
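For readers unfamiliar with the operation being measured, the following is a rough CPU-only illustration of whitespace-delimited token replacement. It is not the libcudf implementation: the function name `replace_tokens_cpu` is hypothetical, and the sketch assumes that whitespace means any character at or below ' ' and that delimiters are copied to the output unchanged.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical CPU reference: replace whole tokens that match a key in
// `replacements`, copying whitespace through unchanged.
std::string replace_tokens_cpu(std::string const& input,
                               std::unordered_map<std::string, std::string> const& replacements)
{
  std::string output;
  std::size_t i = 0;
  while (i < input.size()) {
    // Assumption: any character <= ' ' is treated as a delimiter.
    if (static_cast<unsigned char>(input[i]) <= ' ') {
      output += input[i++];
      continue;
    }
    // Find the end of the current token.
    std::size_t j = i;
    while (j < input.size() && static_cast<unsigned char>(input[j]) > ' ') ++j;
    std::string const token = input.substr(i, j - i);
    auto const it = replacements.find(token);
    output += (it != replacements.end()) ? it->second : token;
    i = j;
  }
  return output;
}
```

For example, with the replacements {"one" -> "1", "three" -> "3"}, the input "one two  three" becomes "1 two  3", while "onetwo" is left alone because only whole tokens are matched.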