Add implicit stream benchmarking support #76
Conversation
We can simplify this quite a bit with rule-of-0 using a unique_ptr
with custom deleter.
Thanks for the PR! I'll take a look at this once I get the Thrust/CUB 1.16 release candidate prepped. Should be later this week if all goes well.
@PointKernel Could you add an example for this feature in …
Also update to use `nvbench::make_cuda_stream_view`.
I pushed a couple of commits that update the documentation and add an `nvbench::make_cuda_stream_view` function. I think that's a clean solution to the non-owning API issue.
This looks good to me -- thanks for working on this @PointKernel and @jrhemstad!
`docs/benchmarks.md` (Outdated)
> NVBench records the elapsed time of work on a CUDA stream for each iteration of a benchmark. By default, NVBench creates and provides an explicit stream via `launch::get_stream()` to pass to every stream-ordered operation.
This isn't quite true -- for the isolated/cold measurements, each iteration is recorded separately, but for the batch/hot measurements, several iterations are lumped together under a single timer.
I'd also move this down into its own section -- this section is meant to give an extremely brief overview of a minimal benchmark specification and introduce key concepts. Using an explicit stream is an advanced use case that should have its own section.
I'll push a commit to this branch that restructures this a bit, since I'm pretty picky about these docs 😅
`nvbench/detail/measure_cupti.cu` (Outdated)
    try : m_state{exec_state}
        , m_launch{m_state.get_cuda_stream()}
        , m_cupti{*m_state.get_device(), add_metrics(m_state)}
Heh. Understandably, clang-format is not a fan of initializer-scope try statements. I'll clean this up a bit in my follow-up patch.
Closes #13
This PR adds support for benchmarking functions that do not expose stream parameters.