Generate benchmark data with correct run length regardless of cardinality (rapidsai#11205)

Issue rapidsai#11204 

The new GPU-accelerated data generator does not account for run length when cardinality is not set.
This PR changes the logic so that columns are generated with the correct average run length even when no cardinality is specified.
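
The decision this change introduces can be sketched as two small host-side helpers. This is a minimal illustration, not code from cudf: the names `effective_cardinality` and `can_generate_directly` are hypothetical, and `std::size_t` is used as a stand-in for the integer types in the real profile. It assumes, as the patch does, that a cardinality of 0 means "not set".

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of cudf): number of distinct samples to draw.
// A profile cardinality of 0 is treated as "unset", and any value is capped at num_rows.
inline std::size_t effective_cardinality(std::size_t profile_cardinality, std::size_t num_rows)
{
  return (profile_cardinality == 0 || profile_cardinality > num_rows) ? num_rows
                                                                      : profile_cardinality;
}

// Hypothetical helper: true when every row can be drawn independently,
// i.e. no cardinality limit and no runs longer than one element on average.
inline bool can_generate_directly(std::size_t profile_cardinality, std::size_t avg_run_len)
{
  assert(avg_run_len >= 1);
  return profile_cardinality == 0 && avg_run_len == 1;
}
```

With an unset cardinality and an average run length of 4, `can_generate_directly(0, 4)` is false, so the sample-and-gather path is taken with `effective_cardinality(0, num_rows) == num_rows` samples. That is the case the old `cardinality == 0` check skipped.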

Authors:
  - Vukasin Milovanovic (https://github.com/vuule)

Approvers:
  - Karthikeyan (https://github.com/karthikeyann)
  - David Wendt (https://github.com/davidwendt)
  - Nghia Truong (https://github.com/ttnghia)
  - https://github.com/nvdbaranec

URL: rapidsai#11205
vuule authored Jul 7, 2022
1 parent 008f1d5 commit 58f46a6
Showing 1 changed file with 7 additions and 5 deletions.
12 changes: 7 additions & 5 deletions cpp/benchmarks/common/generate_input.cu
@@ -397,19 +397,21 @@ std::unique_ptr<cudf::column> create_random_column(data_profile const& profile,
     random_value_fn<bool>(distribution_params<bool>{1. - profile.get_null_frequency().value_or(0)});
   auto value_dist = random_value_fn<T>{profile.get_distribution_params<T>()};

-  auto const cardinality = std::min(num_rows, profile.get_cardinality());
-  rmm::device_uvector<bool> samples_null_mask = valid_dist(engine, cardinality);
-  rmm::device_uvector<T> samples = value_dist(engine, cardinality);
-
   // Distribution for picking elements from the array of samples
   auto const avg_run_len = profile.get_avg_run_length();
   rmm::device_uvector<T> data(0, cudf::default_stream_value);
   rmm::device_uvector<bool> null_mask(0, cudf::default_stream_value);

-  if (cardinality == 0) {
+  if (profile.get_cardinality() == 0 and avg_run_len == 1) {
     data = value_dist(engine, num_rows);
     null_mask = valid_dist(engine, num_rows);
   } else {
+    auto const cardinality = [profile_cardinality = profile.get_cardinality(), num_rows] {
+      return (profile_cardinality == 0 or profile_cardinality > num_rows) ? num_rows
+                                                                          : profile_cardinality;
+    }();
+    rmm::device_uvector<bool> samples_null_mask = valid_dist(engine, cardinality);
+    rmm::device_uvector<T> samples = value_dist(engine, cardinality);
     // generate n samples and gather.
     auto const sample_indices =
       sample_indices_with_run_length(avg_run_len, cardinality, num_rows, engine);
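
The body of `sample_indices_with_run_length` is not part of this diff. As a rough CPU analogue of the "generate n samples and gather" step, the sketch below assumes one plausible scheme: each row starts a new run with probability `1 / avg_run_len` and otherwise repeats the previous row's sample index. The names `sample_indices_cpu` and `gather` are hypothetical, and the real GPU implementation may work differently.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical CPU stand-in for sample_indices_with_run_length (assumed scheme,
// not the cudf implementation): returns one sample index per output row.
inline std::vector<std::size_t> sample_indices_cpu(std::size_t avg_run_len,
                                                   std::size_t cardinality,
                                                   std::size_t num_rows,
                                                   std::mt19937& engine)
{
  assert(avg_run_len >= 1 && cardinality >= 1);
  std::bernoulli_distribution starts_new_run(1.0 / static_cast<double>(avg_run_len));
  std::uniform_int_distribution<std::size_t> pick_sample(0, cardinality - 1);

  std::vector<std::size_t> indices(num_rows);
  for (std::size_t row = 0; row < num_rows; ++row) {
    // Row 0 always starts a run; later rows start one with probability 1/avg_run_len.
    bool const new_run = (row == 0) || starts_new_run(engine);
    indices[row]       = new_run ? pick_sample(engine) : indices[row - 1];
  }
  return indices;
}

// Gather: output row i takes the value samples[indices[i]].
template <typename T>
std::vector<T> gather(std::vector<T> const& samples, std::vector<std::size_t> const& indices)
{
  std::vector<T> out;
  out.reserve(indices.size());
  for (auto const idx : indices) { out.push_back(samples[idx]); }
  return out;
}
```

Under this scheme, two consecutive runs can pick the same sample, so observed runs may be slightly longer than `avg_run_len` when the cardinality is small.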