Use cudf_test temp_directory class for nvtext::subword_tokenize gbenchmark (#14558)

Changes the creation of a temporary subword hash file to use the `temp_directory` class from `cudf_test/file_utilities.hpp`.
This is part of an overall effort to consolidate and document libcudf environment variables.

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Mark Harris (https://github.com/harrism)
  - Mike Wilson (https://github.com/hyperbolic2346)

URL: #14558
davidwendt authored Dec 6, 2023
1 parent ecfb939 commit c0538f1
Showing 1 changed file with 5 additions and 3 deletions.
8 changes: 5 additions & 3 deletions cpp/benchmarks/text/subword.cpp
@@ -18,8 +18,10 @@
#include <benchmarks/synchronization/synchronization.hpp>

#include <cudf_test/column_wrapper.hpp>
+#include <cudf_test/file_utilities.hpp>

#include <cudf/strings/strings_column_view.hpp>

#include <nvtext/subword_tokenize.hpp>

#include <filesystem>
@@ -29,8 +31,8 @@

static std::string create_hash_vocab_file()
{
-  std::string dir_template{std::filesystem::temp_directory_path().string()};
-  if (char const* env_p = std::getenv("WORKSPACE")) dir_template = env_p;
+  static temp_directory const subword_tmpdir{"cudf_gbench"};
+  auto dir_template = subword_tmpdir.path();
   std::string hash_file = dir_template + "/hash_vocab.txt";
   // create a fake hashed vocab text file for this test
   // this only works with words in the strings in the benchmark code below
@@ -57,7 +59,7 @@ static void BM_subword_tokenizer(benchmark::State& state)
   auto const nrows = static_cast<cudf::size_type>(state.range(0));
   std::vector<char const*> h_strings(nrows, "This is a test ");
   cudf::test::strings_column_wrapper strings(h_strings.begin(), h_strings.end());
-  std::string hash_file = create_hash_vocab_file();
+  static std::string hash_file = create_hash_vocab_file();
   std::vector<uint32_t> offsets{14};
   uint32_t max_sequence_length = 64;
   uint32_t stride = 48;
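
For readers unfamiliar with the pattern this change relies on, the following is a minimal sketch of an RAII temporary directory held in a function-local static. It is not the `temp_directory` class from `cudf_test/file_utilities.hpp`, whose implementation is not shown in this commit. The names `scoped_temp_dir`, `create_scratch_file`, `example_gbench`, and `scratch.txt` are hypothetical and exist only for illustration, and the sketch assumes C++17 `std::filesystem` is available.

```cpp
#include <filesystem>
#include <fstream>
#include <random>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical stand-in for a temp-directory helper; NOT the libcudf class.
class scoped_temp_dir {
 public:
  explicit scoped_temp_dir(std::string const& base_name)
  {
    std::random_device rd;
    // create_directory returns false if the path already exists, so retry
    // with a new random suffix until a fresh directory is actually created.
    do {
      _path = fs::temp_directory_path() / (base_name + "." + std::to_string(rd()));
    } while (!fs::create_directory(_path));
  }

  scoped_temp_dir(scoped_temp_dir const&)            = delete;
  scoped_temp_dir& operator=(scoped_temp_dir const&) = delete;

  ~scoped_temp_dir()
  {
    std::error_code ec;  // non-throwing overload, since destructors should not throw
    fs::remove_all(_path, ec);
  }

  [[nodiscard]] std::string path() const { return _path.string(); }

 private:
  fs::path _path;
};

// Same shape as create_hash_vocab_file() in the diff: the directory object is a
// function-local static, so it is created on the first call and destroyed
// (removing the directory) during normal program exit.
std::string create_scratch_file()
{
  static scoped_temp_dir const tmpdir{"example_gbench"};
  std::string file = tmpdir.path() + "/scratch.txt";
  std::ofstream out(file);
  out << "placeholder\n";
  return file;
}
```

Under these assumptions, repeated calls to `create_scratch_file()` return the same path because the static directory is constructed only once. Making the caller's `hash_file` static, as the diff does, additionally avoids rewriting the file each time the benchmark function is invoked.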