Add implicit stream benchmarking support #76
Merged: alliepiper merged 18 commits into NVIDIA:main from PointKernel:add-implicit-stream-support on Feb 11, 2022.
The diff below shows changes from 14 of the 18 commits.

Commits
15f2e92  Add owning and non-owning semantics to nvbench::cuda_stream (PointKernel)
8aea3e4  Add a cuda stream member to nvbench::state (PointKernel)
c510a0e  Update launch to hold a const ref of nvbench::cuda_stream (PointKernel)
14eab07  Update measure_* classes to construct launch from the state cuda stream (PointKernel)
86708ec  Fix a stream destroy bug (PointKernel)
439ffec  Minor correction (PointKernel)
470beda  Add nvbench::state stream tests (PointKernel)
76cbbcc  Update benchmarks.md (PointKernel)
33a896f  Update copyright year (PointKernel)
a2a12c6  Update docs/benchmarks.md (PointKernel)
e7c29c1  Update docs (PointKernel)
e05bf00  Use unique_ptr + custom deleter to simplify destroy logic (PointKernel)
6159d9c  Minor correction in unit test (PointKernel)
fde2e40  Add stream benchmark example (PointKernel)
da2ec38  Exclude some bits from clang-format. (alliepiper)
8ae5898  Add docs for launch and cuda_stream. (alliepiper)
3b41387  Add `nvbench::make_cuda_stream_view(cudaStream_t)`. (alliepiper)
039d455  Move documentation on streams to new subsection. (alliepiper)
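
Taken together, these commits let a benchmark hand NVBench an externally managed stream instead of the stream NVBench would otherwise create. The following is a rough sketch of the resulting usage, based on the commit messages and the example diff below; the benchmark name is invented here and exact signatures may differ from the final API:

#include <nvbench/nvbench.cuh>

void implicit_stream_bench(nvbench::state &state)
{
  // A stream created elsewhere (here, the legacy default stream). NVBench
  // must not destroy it, so it is wrapped in a non-owning cuda_stream.
  cudaStream_t existing_stream = 0;

  // Equivalent to the two-argument cuda_stream constructor used in the
  // example diff below; commit 3b41387 adds this convenience helper.
  state.set_cuda_stream(nvbench::make_cuda_stream_view(existing_stream));

  state.exec([](nvbench::launch &) {
    // Stream-ordered work that runs on (or synchronizes with)
    // `existing_stream` goes here.
  });
}
NVBENCH_BENCH(implicit_stream_bench);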
New file added by this PR (the stream benchmark example from commit fde2e40), 62 lines:
/*
 * Copyright 2022 NVIDIA Corporation
 *
 * Licensed under the Apache License, Version 2.0 with the LLVM exception
 * (the "License"); you may not use this file except in compliance with
 * the License.
 *
 * You may obtain a copy of the License at
 *
 *     http://llvm.org/foundation/relicensing/LICENSE.txt
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

#include <nvbench/nvbench.cuh>

// Grab some testing kernels from NVBench:
#include <nvbench/test_kernels.cuh>

// Thrust vectors simplify memory management:
#include <thrust/device_vector.h>

// A function to benchmark that does not expose an explicit stream argument.
void copy(int32_t *input, int32_t *output, std::size_t const num_values)
{
  nvbench::copy_kernel<<<256, 256>>>(input, output, num_values);
}

// `stream_bench` copies a 64 MiB buffer of int32_t on a CUDA stream specified
// by the user.
//
// By default, NVBench creates and provides an explicit stream via
// `launch::get_stream()` to pass to every stream-ordered operation. Sometimes
// it is inconvenient or impossible to specify an explicit CUDA stream to every
// stream-ordered operation. In this case, users may specify a target stream via
// `state::set_cuda_stream`. It is assumed that all work of interest executes on
// or synchronizes with this stream.
void stream_bench(nvbench::state &state)
{
  // Allocate input data:
  const std::size_t num_values = 64 * 1024 * 1024 / sizeof(nvbench::int32_t);
  thrust::device_vector<nvbench::int32_t> input(num_values);
  thrust::device_vector<nvbench::int32_t> output(num_values);

  // Set the CUDA default stream as the target stream. Note that the default
  // stream is non-owning.
  cudaStream_t default_stream = 0;
  state.set_cuda_stream(
    nvbench::cuda_stream{default_stream, false /*owning = false*/});

  state.exec([&input, &output, num_values](nvbench::launch &) {
    copy(thrust::raw_pointer_cast(input.data()),
         thrust::raw_pointer_cast(output.data()),
         num_values);
  });
}

NVBENCH_BENCH(stream_bench);
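
For comparison, a benchmark that can pass NVBench's stream explicitly would normally skip `set_cuda_stream` and launch directly on `launch::get_stream()`. A sketch of that pattern follows, mirroring the other NVBench examples; the benchmark name is invented here and the same includes as the file above are assumed:

// Assumes the includes from the example above.
void explicit_stream_bench(nvbench::state &state)
{
  const std::size_t num_values = 64 * 1024 * 1024 / sizeof(nvbench::int32_t);
  thrust::device_vector<nvbench::int32_t> input(num_values);
  thrust::device_vector<nvbench::int32_t> output(num_values);

  state.exec([&input, &output, num_values](nvbench::launch &launch) {
    // The stream NVBench created for this measurement is passed straight to
    // the kernel launch:
    nvbench::copy_kernel<<<256, 256, 0, launch.get_stream()>>>(
      thrust::raw_pointer_cast(input.data()),
      thrust::raw_pointer_cast(output.data()),
      num_values);
  });
}
NVBENCH_BENCH(explicit_stream_bench);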
Changes to the CUPTI measurement constructor (measure_cupti_base), shown as a unified diff:

@@ -1,5 +1,5 @@
 /*
- * Copyright 2021 NVIDIA Corporation
+ * Copyright 2021-2022 NVIDIA Corporation
  *
  * Licensed under the Apache License, Version 2.0 with the LLVM exception
  * (the "License"); you may not use this file except in compliance with

@@ -169,7 +169,12 @@ std::vector<std::string> add_metrics(nvbench::state &state)
 } // namespace

 measure_cupti_base::measure_cupti_base(state &exec_state)
-try : m_state{exec_state}, m_cupti(*m_state.get_device(), add_metrics(m_state))
+try : m_state
+{
+  exec_state
+}
+, m_launch{m_state.get_cuda_stream()},
+  m_cupti{*m_state.get_device(), add_metrics(m_state)}
 {}
 catch (const std::exception &ex)
 {

Review comment on the constructor formatting above:
Heh. Understandably, clang-format is not a fan of initializer-scope try statements. I'll clean this up a bit in my follow-up patch.
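
For background on the construct clang-format struggles with: a constructor function-try-block wraps both the member initializer list and the constructor body, so exceptions thrown while constructing members (here, m_cupti) reach the handler. A minimal standalone illustration, not NVBench code:

#include <stdexcept>
#include <string>

struct widget
{
  // The `try` covers the member initializers as well as the body, so an
  // exception thrown by compute_bound() lands in the catch block below.
  explicit widget(const std::string &name)
  try : m_name{name}, m_bound{compute_bound(name)}
  {
    // constructor body
  }
  catch (const std::exception &)
  {
    // A constructor's function-try-block cannot swallow the exception; it is
    // implicitly rethrown when the handler finishes, so the caller still
    // sees the failure. Logging or cleanup could go here.
  }

private:
  static int compute_bound(const std::string &name)
  {
    if (name.empty()) { throw std::invalid_argument("empty name"); }
    return static_cast<int>(name.size());
  }

  std::string m_name;
  int m_bound;
};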
Review comment on the benchmarks documentation:
This isn't quite true: for the isolated/cold measurements, each iteration is recorded, but for the batch/hot measurements, several iterations are lumped together in a single timer.
I'd also move this down into its own section; this section is meant to give an extremely brief overview of a minimal benchmark specification and introduce key concepts. Using an explicit stream is an advanced use case that should have its own section.
I'll push a commit to this branch that restructures this a bit, since I'm pretty picky about these docs 😅
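
To make the cold/hot distinction above concrete, here is a rough sketch of the two timing strategies using plain CUDA events. This is an illustration of the idea only, not NVBench's measurement code, which also handles warmup, convergence criteria, and more:

#include <cuda_runtime.h>

// Illustration only. "Cold" timing records one measurement per launch;
// "batch" timing wraps many launches in a single timed region and divides
// by the launch count.
template <typename KernelLauncher>
float time_cold_iteration(KernelLauncher launch_kernel, cudaStream_t stream)
{
  cudaEvent_t start, stop;
  cudaEventCreate(&start);
  cudaEventCreate(&stop);

  cudaEventRecord(start, stream);
  launch_kernel(stream); // a single, isolated launch is timed
  cudaEventRecord(stop, stream);
  cudaEventSynchronize(stop);

  float ms = 0.f;
  cudaEventElapsedTime(&ms, start, stop);
  cudaEventDestroy(start);
  cudaEventDestroy(stop);
  return ms;
}

template <typename KernelLauncher>
float time_batch_average(KernelLauncher launch_kernel,
                         cudaStream_t stream,
                         int num_launches)
{
  cudaEvent_t start, stop;
  cudaEventCreate(&start);
  cudaEventCreate(&stop);

  cudaEventRecord(start, stream);
  for (int i = 0; i < num_launches; ++i)
  {
    launch_kernel(stream); // many launches share one timed region
  }
  cudaEventRecord(stop, stream);
  cudaEventSynchronize(stop);

  float total_ms = 0.f;
  cudaEventElapsedTime(&total_ms, start, stop);
  cudaEventDestroy(start);
  cudaEventDestroy(stop);
  return total_ms / static_cast<float>(num_launches);
}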