From 65b1cbdeda9cab57243d0a98e646c860ef86039e Mon Sep 17 00:00:00 2001
From: Karthikeyan <6488848+karthikeyann@users.noreply.github.com>
Date: Wed, 20 Apr 2022 03:24:43 +0530
Subject: [PATCH] add data generation to benchmark documentation (#10677)

add device data generation to benchmark documentation

Authors:
  - Karthikeyan (https://github.com/karthikeyann)

Approvers:
  - Conor Hoekstra (https://github.com/codereport)
  - Nghia Truong (https://github.com/ttnghia)

URL: https://github.com/rapidsai/cudf/pull/10677
---
 cpp/docs/BENCHMARKING.md | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/cpp/docs/BENCHMARKING.md b/cpp/docs/BENCHMARKING.md
index 8794c90d1db..270e7a87e85 100644
--- a/cpp/docs/BENCHMARKING.md
+++ b/cpp/docs/BENCHMARKING.md
@@ -35,6 +35,12 @@ provided in `cpp/benchmarks/synchronization/synchronization.hpp` to help with th
 can also optionally clear the GPU L2 cache in order to ensure cache hits do not artificially inflate
 performance in repeated iterations.
 
+## Data generation
+
+For generating benchmark input data, helper functions are available at [cpp/benchmarks/common/generate_input.hpp](/cpp/benchmarks/common/generate_input.hpp). The input data generation happens on device, in contrast to any `column_wrapper` where data generation happens on the host.
+* `create_sequence_table` can generate sequence columns starting with value 0 in first row and increasing by 1 in subsequent rows.
+* `create_random_table` can generate a table filled with random data. The random data parameters are configurable.
+
 ## What should we benchmark?
 
 In general, we should benchmark all features over a range of data sizes and types, so that we can