Add multiple rows to subword tokenizer benchmark #10767

davidwendt · 2022-05-02T13:12:01Z

When the subword tokenizer code was ported from CLX, the benchmark was not updated to measure multiple rows. This PR updates the benchmark to test a range of row counts and adds the missing cuda_event_timer.
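As a rough illustration of what a row test range means for a benchmark, the sketch below sweeps a toy tokenizer over geometrically increasing row counts and records the best time per count. It is not the libcudf benchmark: `make_rows`, `toy_tokenize`, `row_range`, and `run_benchmark` are hypothetical names, the tokenizer is a plain whitespace split standing in for the real subword tokenizer, and the start, stop, and multiplier values are assumptions chosen only for the example. Host wall-clock timing is used because the stand-in runs on the CPU; timing GPU work needs device-aware timing, which is the role cuda_event_timer plays in the actual benchmark.

```python
import time


def make_rows(num_rows, text="This is a test sentence for the tokenizer"):
    """Build an input column of `num_rows` identical strings (hypothetical data)."""
    return [text] * num_rows


def toy_tokenize(rows):
    """Stand-in for the tokenizer under test: a plain whitespace split per row."""
    return [row.split() for row in rows]


def row_range(start, stop, multiplier):
    """Yield row counts start, start*multiplier, ... while they do not exceed stop."""
    n = start
    while n <= stop:
        yield n
        n *= multiplier


def run_benchmark(start=512, stop=32768, multiplier=4, repeats=3):
    """Time the toy tokenizer once per row count; keep the best of `repeats` runs."""
    results = {}
    for num_rows in row_range(start, stop, multiplier):
        rows = make_rows(num_rows)
        best = float("inf")
        for _ in range(repeats):
            t0 = time.perf_counter()
            toy_tokenize(rows)
            best = min(best, time.perf_counter() - t0)
        results[num_rows] = best
    return results


if __name__ == "__main__":
    for num_rows, seconds in run_benchmark().items():
        print(f"{num_rows:>8} rows: {seconds * 1e3:.3f} ms")
```

Measuring several row counts rather than a single row shows how the cost scales with input size, which a one-row measurement cannot reveal.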

codecov · 2022-05-02T14:23:32Z

Codecov Report

Merging #10767 (54aceed) into branch-22.06 (027c34a) will increase coverage by 0.02%.
The diff coverage is n/a.

@@               Coverage Diff                @@
##           branch-22.06   #10767      +/-   ##
================================================
+ Coverage         86.40%   86.42%   +0.02%     
================================================
  Files               143      143              
  Lines             22444    22444              
================================================
+ Hits              19393    19398       +5     
+ Misses             3051     3046       -5

| Impacted Files | Coverage Δ | |
|---|---|---|
| python/cudf/cudf/core/column/numerical.py | `95.88% <0.00%> (-0.30%)` | ⬇️ |
| python/cudf/cudf/core/dataframe.py | `93.74% <0.00%> (+0.04%)` | ⬆️ |
| python/cudf/cudf/core/column/string.py | `89.21% <0.00%> (+0.12%)` | ⬆️ |
| python/cudf/cudf/core/groupby/groupby.py | `91.79% <0.00%> (+0.22%)` | ⬆️ |
| python/cudf/cudf/core/tools/datetimes.py | `84.49% <0.00%> (+0.30%)` | ⬆️ |
| python/cudf/cudf/core/column/lists.py | `92.91% <0.00%> (+0.83%)` | ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 6128e0d...54aceed.

codereport

lgtm

karthikeyann

Looks good.
This PR could also be an opportunity to move from gbench to NVBench.

davidwendt · 2022-05-02T18:06:17Z

> Looks good. This PR could also be an opportunity to move from gbench to NVBench.

Yes, I considered that, but I would rather convert all of the text benchmarks at the same time.

karthikeyann

LGTM 👍

davidwendt · 2022-05-03T11:39:48Z

@gpucibot merge

Add multiple rows to subword tokenizer benchmark

54aceed

davidwendt added the 2 - In Progress, libcudf, improvement, and non-breaking labels May 2, 2022

davidwendt self-assigned this May 2, 2022

davidwendt added the 3 - Ready for Review label and removed the 2 - In Progress label May 2, 2022

davidwendt marked this pull request as ready for review May 2, 2022 15:39

davidwendt requested a review from a team as a code owner May 2, 2022 15:39

davidwendt requested review from cwharris and codereport May 2, 2022 15:39

codereport approved these changes May 2, 2022

karthikeyann reviewed May 2, 2022

karthikeyann approved these changes May 2, 2022

rapids-bot bot merged commit 0e32624 into rapidsai:branch-22.06 May 3, 2022

davidwendt deleted the subword-benchmark branch May 3, 2022 11:39
