Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add implicit stream benchmarking support #76

Merged
merged 18 commits into from
Feb 11, 2022

Conversation

PointKernel
Copy link
Member

Closes #13

This PR adds support to benchmark functions that do not expose stream parameters.

Copy link
Collaborator

@jrhemstad jrhemstad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can simplify this quite a bit with rule-of-0 using a unique_ptr with custom deleter.

docs/benchmarks.md Outdated Show resolved Hide resolved
nvbench/cuda_stream.cuh Outdated Show resolved Hide resolved
nvbench/cuda_stream.cuh Outdated Show resolved Hide resolved
nvbench/cuda_stream.cuh Outdated Show resolved Hide resolved
nvbench/cuda_stream.cuh Outdated Show resolved Hide resolved
nvbench/cuda_stream.cuh Outdated Show resolved Hide resolved
nvbench/cuda_stream.cuh Outdated Show resolved Hide resolved
docs/benchmarks.md Outdated Show resolved Hide resolved
nvbench/cuda_stream.cuh Outdated Show resolved Hide resolved
@alliepiper
Copy link
Collaborator

Thanks for the PR! I'll take a look at this once I get the Thrust/CUB 1.16 release candidate prepped. Should be later this week if all goes well.

@alliepiper alliepiper self-requested a review February 7, 2022 16:52
@alliepiper alliepiper self-assigned this Feb 7, 2022
@alliepiper alliepiper added type: enhancement New feature or request. helps: rapids Helps or needed by RAPIDS. P1: should have Necessary, but not critical. labels Feb 7, 2022
@alliepiper
Copy link
Collaborator

@PointKernel Could you add an example for this feature in nvbench/examples?

Copy link
Collaborator

@alliepiper alliepiper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed a couple of commits that update the documentation and add a nvbench::make_cuda_stream_view method. I think that's a clean solution to the non-owning API issue.

This looks good to me -- thanks for working on this @PointKernel and @jrhemstad!

Comment on lines 14 to 16
NVBench records the elapsed time of work on a CUDA stream for each iteration of a benchmark.
By default, NVBench creates and provides an explicit stream via `launch::get_stream()`
to pass to every stream-ordered operation.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't quite true -- for the isolated/cold measurements, each iteration is recorded, but for the batch/hot measurements, several iterations are lumped together in a single timer.

I'd also move this down into it's own section -- this section is meant to give an extremely brief overview of a minimal benchmark specification and introduce key concepts. Using an explicit stream is an advanced usecase that should have it's own section.

I'll push a commit to this branch that restructures this a bit, since I'm pretty picky about these docs 😅

Comment on lines 172 to 177
try : m_state
{
exec_state
}
, m_launch{m_state.get_cuda_stream()},
m_cupti{*m_state.get_device(), add_metrics(m_state)}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh. Understandably, clang-format is not a fan of initializer-scope try statements. I'll clean this up a bit in my follow up patch.

@alliepiper alliepiper merged commit 38cecd5 into NVIDIA:main Feb 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
helps: rapids Helps or needed by RAPIDS. P1: should have Necessary, but not critical. type: enhancement New feature or request.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support benchmarking kernels that cannot take an explicit stream
3 participants