Support benchmarking kernels that cannot take an explicit stream #13
Comments
For prioritization, do you have a concrete need for this or is it a "nice to have"? |
I'd call it a P1.5 that we can round up to P2 :) |
We can add some optional API on I knew it was only a matter of time before I ran into the |
I'd like to solve this issue, so I'm trying to understand the solution here. My initial thought is to overload
void my_benchmark(nvbench::state& state) {
  state.exec([]() {
    invoke_gpu_kernel(...); // a host API that invokes GPU kernels but takes no stream argument
    my_kernel<<<num_blocks, 256>>>(); // or a kernel launched on the default stream
  });
}
NVBENCH_BENCH(my_benchmark);
Here we assume that the default stream is known by
Does the above general idea sound right? If so, how can we retrieve the default stream information if it's not explicitly specified (so it can be used by timers etc.)? |
The
Rather than assume that This would look like:
This way, we can still support My plan for implementing this was:
For (1), this would be a new constructor:
For (2):
(3) will require removing the launcher's default constructor and replacing it with (4) will require modifying the Does that make sense? |
Since we're going with distinct types for owning/non-owning streams in libcu++ (and we'll likely eventually switch to those in nvbench), would it make more sense to also use distinct types in nvbench for now? |
I thought about it, but I'm not seeing any benefit that would justify the added complexity here. Using a single type simplifies the implementation considerably in this case. Once the libcu++ implementation is ready I'd consider switching if it makes sense and there's a good motivation to do so. |
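To make the "single type" option from the exchange above concrete, here is a minimal sketch of one wrapper that can either own a stream or act as a non-owning view of one. It is not nvbench's implementation: `mock_stream_t`, `mock_stream_create`, `mock_stream_destroy`, `stream_wrapper`, and `make_stream_view` are all hypothetical names, and the mock functions stand in for the CUDA stream calls so the sketch builds without the CUDA toolkit.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for cudaStream_t / cudaStreamCreate / cudaStreamDestroy,
// used so this sketch compiles without CUDA.
using mock_stream_t = int;
static int g_destroyed = 0; // counts destroy calls so ownership is observable

inline mock_stream_t mock_stream_create()
{
  static int next = 1;
  return next++;
}
inline void mock_stream_destroy(mock_stream_t) { ++g_destroyed; }

// One wrapper type for both cases: the deleter destroys the underlying
// stream only when the wrapper owns it.
class stream_wrapper
{
public:
  // Owning: creates a new stream and destroys it when the wrapper dies.
  stream_wrapper()
      : m_stream(new mock_stream_t(mock_stream_create()), deleter{true})
  {}

  // Wraps an existing stream; `owning == false` gives a view that never
  // destroys it (suitable for a stream the caller manages).
  stream_wrapper(mock_stream_t s, bool owning)
      : m_stream(new mock_stream_t(s), deleter{owning})
  {}

  mock_stream_t get() const { return *m_stream; }
  bool owning() const { return m_stream.get_deleter().owning; }

private:
  struct deleter
  {
    bool owning;
    void operator()(mock_stream_t *s) const
    {
      if (owning)
      {
        mock_stream_destroy(*s);
      }
      delete s; // the heap slot holding the handle is always freed
    }
  };
  std::unique_ptr<mock_stream_t, deleter> m_stream;
};

// Convenience helper for the non-owning case.
inline stream_wrapper make_stream_view(mock_stream_t s)
{
  return stream_wrapper(s, false);
}
```

With this shape, a benchmark whose kernels run on a stream it cannot change could hand a view of that stream to the framework, while the default path keeps creating and owning its own stream. The cost of the single-type design is that ownership becomes a runtime flag rather than something visible in the type, which is the trade-off raised in the comment above.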
From the example, nvbench expects all kernels to be executed on the stream provided by launch.get_stream(). This can be problematic when attempting to benchmark functions that contain kernel calls, but do not expose stream parameters (for one reason or another) on which those kernels should run. It would be nice to still be able to benchmark such functions.