add multi stream allocations benchmark. #841
Thanks for creating this. Looks good. Just some suggestions on how to make it easier to use for profiling and easier to extend to other MRs in the future.
benchmarks/multi_stream_allocations/multi_stream_allocations_bench.cu
Love it.
…eam_allocations_bench
std::cout << "Error: invalid memory_resource name: " << name << std::endl;
}

void run_profile(std::string resource_name, int kernel_count, int stream_count, bool prewarm)
Is this needed? Can't you do the same thing via GBench command line args?
Not sure what you mean by "this". gbench runs multiple times no matter what. We want a way to run only once for profiling.
> We want a way to run only once for profiling.

Right, that's what I meant. I thought there was a `num_iterations` gbench option.
Related NVIDIA/nvbench#10
I haven't found an option in gbench to limit the number of iterations. I looked briefly prior to implementing it this way, but gbench documentation is... not great?
gbenchmark does not have a way to control the number of iterations. Here's an explanation of why from one of the authors: https://stackoverflow.com/a/61888885/749748
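For reference, the run-once path discussed in this thread could be sketched roughly as below. This is only an illustration, not the code in this PR: the `--profile` flag, `select_mode`, `workload`, and `run_workload` are all hypothetical names, and `workload` merely stands in for a call such as `run_profile(...)`. The benchmark branch simulates a harness that chooses its own iteration count.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical run mode chosen from the command line.
enum class run_mode { benchmark, profile };

// Returns run_mode::profile if the (hypothetical) "--profile" flag is present;
// otherwise the arguments would be handed to the benchmark library.
inline run_mode select_mode(std::vector<std::string> const& args)
{
  for (auto const& arg : args) {
    if (arg == "--profile") { return run_mode::profile; }
  }
  return run_mode::benchmark;
}

// Stand-in for the real workload; it only counts how often it was invoked.
struct workload {
  int runs{0};
  void operator()() { ++runs; }
};

// Profile mode runs the workload exactly once. Benchmark mode models a harness
// that decides the iteration count itself (harness_iterations is made up here).
inline int run_workload(run_mode mode, workload& w, int harness_iterations)
{
  assert(harness_iterations > 0);
  int const n = (mode == run_mode::profile) ? 1 : harness_iterations;
  for (int i = 0; i < n; ++i) { w(); }
  return w.runs;
}
```

With a split like this, a profiler sees a single pass over the workload, while the normal benchmark path is left untouched.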
@cwharris failing cmake style
@gpucibot merge
This PR introduces a new benchmark to measure the effect of the pool allocator in situations which require concurrent kernel execution. The benchmark reveals that the pool allocator prevents concurrent kernel execution in multiple non-default streams unless the pool allocator already has enough memory reserved for each stream prior to any allocation attempts.
legend:
BM_MultiStreamAllocations/<threads in pool>/<kernels to launch>/<prewarm>
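To make the legend concrete, here is a small sketch that splits a result label of this shape into its fields. The struct and function names are hypothetical, and the sample label `BM_MultiStreamAllocations/4/8/1` in the note below is made up for illustration, not taken from an actual run.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Fields of one result label, in legend order (names are illustrative).
struct bench_case {
  std::string name;
  int pool_size;     // first argument in the legend
  int kernel_count;  // kernels to launch
  bool prewarm;      // non-zero means the pool was prewarmed
};

// Splits "name/a/b/c" on '/' and converts the three arguments.
inline bench_case parse_case(std::string const& label)
{
  std::vector<std::string> parts;
  std::stringstream ss(label);
  std::string part;
  while (std::getline(ss, part, '/')) { parts.push_back(part); }
  assert(parts.size() == 4);  // benchmark name plus three arguments
  return bench_case{parts[0], std::stoi(parts[1]), std::stoi(parts[2]), std::stoi(parts[3]) != 0};
}
```

For example, `parse_case("BM_MultiStreamAllocations/4/8/1")` would yield a pool size of 4, 8 kernels, and prewarm enabled.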
The benchmark works by running a compute-bound kernel in a very small launch configuration (1 block, 1 thread) multiple times against a stream pool. As the size of the stream pool increases, one should expect the overall performance to improve. This is only the case when the memory pool has been prewarmed for each stream; otherwise, sync-and-steal behavior comes into play, forcing any already-launched work to complete prior to allocating the next buffer, and therefore preventing any kernel overlap.
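The expected effect can be illustrated with a deliberately simplified timing model. This is a toy sketch under stated assumptions, not a measurement and not the benchmark code: every kernel is assumed to take the same time, kernels in different streams are assumed to overlap perfectly when the pool is prewarmed, and without prewarming every launch is assumed to wait for all earlier work. Launch overhead, allocation cost, and hardware limits on concurrency are ignored, and `modeled_time` is a hypothetical name.

```cpp
#include <cassert>

// Toy model of total run time in arbitrary time units.
//  - prewarm == true : kernels are spread round-robin over the streams and run
//    in "waves" of stream_count overlapping kernels.
//  - prewarm == false: each allocation forces already-launched work to finish
//    first, so no two kernels overlap and the run is fully serialized.
inline long modeled_time(int kernel_count, int stream_count, long kernel_time, bool prewarm)
{
  assert(kernel_count >= 0 && stream_count > 0 && kernel_time >= 0);
  if (!prewarm) { return kernel_count * kernel_time; }
  long const waves = (kernel_count + stream_count - 1) / stream_count;  // ceiling division
  return waves * kernel_time;
}
```

Under this model, 8 kernels of 10 units each on 4 streams take 20 units when prewarmed and 80 units otherwise, which is the same as running them on a single stream.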
undesirable synchronization:
desirable kernel overlap: