Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update benchmarking guide to use NVBench. #10093

Merged
merged 3 commits into from
Jan 24, 2022
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
20 changes: 13 additions & 7 deletions cpp/docs/BENCHMARKING.md
Original file line number Diff line number Diff line change
@@ -1,19 +1,25 @@
# Unit Benchmarking in libcudf

Unit benchmarks in libcudf are written using [Google Benchmark](https://github.com/google/benchmark).
Unit benchmarks in libcudf are written using [NVBench](https://github.com/NVIDIA/nvbench).
While many existing benchmarks are written using
[Google Benchmark](https://github.com/google/benchmark), new benchmarks should use NVBench.

Google Benchmark provides many options for specifying ranges of parameters to benchmarks to test
with varying parameters, as well as to control the time unit reported, among other options. Refer to
other benchmarks in `cpp/benchmarks` to understand the options.
The NVBench library is similar to Google Benchmark, but has several quality of life improvements
when doing GPU benchmarking such as displaying the fraction of peak memory bandwidth achieved and
details about the GPU hardware.

Both NVBench and Google Benchmark provide many options for specifying ranges of parameters to
benchmark, as well as to control the time unit reported, among other options. Refer to existing
benchmarks in `cpp/benchmarks` to understand the options.

## Directory and File Naming

The naming of unit benchmark directories and source files should be consistent with the feature
being benchmarked. For example, the benchmarks for APIs in `copying.hpp` should live in
`cudf/cpp/benchmarks/copying`. Each feature (or set of related features) should have its own
`cpp/benchmarks/copying`. Each feature (or set of related features) should have its own
benchmark source file named `<feature>_benchmark.cu/cpp`. For example,
`cudf/cpp/src/copying/scatter.cu` has benchmarks in
`cudf/cpp/benchmarks/copying/scatter_benchmark.cu`.
`cpp/src/copying/scatter.cu` has benchmarks in
`cpp/benchmarks/copying/scatter_benchmark.cu`.
Copy link
Member

@PointKernel PointKernel Jan 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been naming nvbench files with _nvbench.cu/cpp suffix as opposed to _benchmark.cu/cpp in gbench and executable with <feature>_NVBENCH (instead of <feature>_BENCH) to distinguish them from gbench. Not sure it's the right way to go for all nvbench benchmarks though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer no suffix on the source files. As nvbench becomes the primary benchmark it becomes just noise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@robertmaynard Does "no suffix" mean "remove _benchmark" as well?

I find _benchmark helpful for finding files by name, but I agree that _nvbench is too specific.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recall I decided to do so because of CMake:

  • ConfigureBench to build _benchmark.cu/cpp
  • ConfigureNVBench to build _nvbench.cu/cpp

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@robertmaynard Does "no suffix" mean "remove _benchmark" as well?

I find _benchmark helpful for finding files by name, but I agree that _nvbench is too specific.

I was thinking of tool specific suffix. So benchmark seems okay to me. The purist in me says that we have benchmarks in the path to the file already so no future distinction is required ( we don't use scatter_header.h ).

I recall I decided to do so because of CMake:

  • ConfigureBench to build _benchmark.cu/cpp
  • ConfigureNVBench to build _nvbench.cu/cpp

ConfigureBench and ConfigureNVBench are given a set of souce files to compile, and don't care how those files are named. So they would have no problem with scatter_badly_named.cu :)

The primary issue is that we have join_nvbench.cu and join_benchmark.cu, where the nvbench version tests the left_join, full_join, and inner_join algorithms, and google benchmarks tests the semi and anti joins. If we had gone with file names that matched the algorithms ( anti_join_benchmark.cu, left_join_benchmark.cu ) we wouldn't need to have anything end with _nvbench.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So do we agree that no change is needed in the guide? (Perhaps some changes are needed to rename _nvbench.cu/cpp files.)

Copy link
Contributor

@karthikeyann karthikeyann Jan 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to ideally lose suffixes.
Once our benchmarks become large enough to need to split across many files, we might need to add suffixes based on context of that benchmark. At that point, we would prefer to lose _nvbench, _benchmark suffixes because of long filenames if we had it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@karthikeyann I can see the argument to remove _benchmark. However, that wouldn't align with the current state of the benchmarks. I'd like to keep this PR focused on the change to "use nvbench" and that change can be done at a later point.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just make sure when someone goes through and removes _benchmark from every benchmark source file, they also update the guide to not use the suffix. (If it were me, I would change the example in the guide now...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@harrism I'll merge this PR and do that now. 👍


In the interest of improving compile time, whenever possible, test source files should be `.cpp`
files because `nvcc` is slower than `gcc` in compiling host code. Note that `thrust::device_vector`
Expand Down