NVIDIA Concurrency Mechanism #292

Open
SoroushHeidari opened this issue Mar 28, 2024 · 2 comments

Comments

@SoroushHeidari

Thank you for your contribution. I wanted to ask whether you have any method for simulating concurrent execution on the GPU. NVIDIA provides three concurrency mechanisms to support concurrent applications: priority streams, time-slicing, and Multi-Process Service (MPS). Is there a way to emulate this concurrency behavior in the simulator? If not, do you have any suggestions for how to approach the problem?

@JRPan
Collaborator

JRPan commented Mar 29, 2024

  • priority streams: We support multiple streams. You need to enable this by adding -gpgpu_concurrent_kernel_sm 1 to your config file. Your kernels must be small enough, though: if a single kernel is big enough to fill all SMs, only that kernel will run. You can change this behavior by modifying select_kernel.
  • time-slicing: We don't support this. You would have to model context switching, which writes all register values back to memory.
  • MPS: We don't support this, but it would be easy to add. Change the select_kernel function so that each kernel is issued to only a subset of SMs.
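
To make the MPS suggestion above more concrete, here is a minimal, standalone sketch of the idea of restricting each kernel to a subset of SMs. It is a toy model under stated assumptions, not GPGPU-Sim code: the `Kernel` struct, `partition_of_sm`, `select_kernel_for_sm`, and the equal-sized contiguous partitioning are all hypothetical names and choices made for illustration. The real select_kernel has a different signature and state.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy model; none of these names come from GPGPU-Sim.
struct Kernel {
    int id;            // kernel identifier
    int partition;     // SM partition assigned to the process that launched it
    int pending_ctas;  // thread blocks that have not been issued yet
};

// Map an SM index to a partition by splitting the SMs into equal contiguous
// ranges. Leftover SMs (when num_sms is not divisible) join the last partition.
int partition_of_sm(int sm_id, int num_sms, int num_partitions) {
    assert(num_partitions > 0 && num_sms >= num_partitions);
    assert(sm_id >= 0 && sm_id < num_sms);
    int sms_per_partition = num_sms / num_partitions;
    int p = sm_id / sms_per_partition;
    return p < num_partitions ? p : num_partitions - 1;
}

// Return the index of the first running kernel that still has work and is
// allowed on this SM, or -1 if this SM has nothing it may run.
int select_kernel_for_sm(const std::vector<Kernel>& running, int sm_id,
                         int num_sms, int num_partitions) {
    int part = partition_of_sm(sm_id, num_sms, num_partitions);
    for (std::size_t i = 0; i < running.size(); ++i) {
        if (running[i].pending_ctas > 0 && running[i].partition == part) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

With 8 SMs and 2 partitions, SMs 0 to 3 would only ever pick kernels from partition 0 and SMs 4 to 7 only kernels from partition 1, even if one partition sits idle. Whether idle SMs should be lent to another partition is a policy choice this sketch deliberately leaves out.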

You will probably also want to check out https://github.com/accel-sim/accel-sim-framework/tree/dev-stream-stats. By default, all stats are aggregated, which does not make sense when kernels run concurrently. This branch changes that so stats are collected per stream. It needs to be paired with this branch of gpgpu-sim: https://github.com/accel-sim/gpgpu-sim_distribution/tree/stream-stats
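
As a rough illustration of why per-stream collection matters, the sketch below keeps counters keyed by stream ID and can still produce the aggregated view on demand. It is an assumption-laden toy, not the data structures used in either branch: `StreamStats`, `PerStreamStats`, and the two counters are hypothetical and chosen only for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical counters; not the structures used in the stream-stats branches.
struct StreamStats {
    std::uint64_t instructions = 0;
    std::uint64_t l1_misses = 0;
};

class PerStreamStats {
public:
    // Add counts to the bucket that belongs to one stream.
    void record(int stream_id, std::uint64_t insts, std::uint64_t misses) {
        assert(stream_id >= 0);
        StreamStats& s = by_stream_[stream_id];
        s.instructions += insts;
        s.l1_misses += misses;
    }

    // Counters for a single stream; all zeros if the stream was never seen.
    StreamStats stream(int stream_id) const {
        auto it = by_stream_.find(stream_id);
        return it == by_stream_.end() ? StreamStats{} : it->second;
    }

    // The aggregated view, which mixes all concurrent streams together.
    StreamStats total() const {
        StreamStats t;
        for (const auto& kv : by_stream_) {
            t.instructions += kv.second.instructions;
            t.l1_misses += kv.second.l1_misses;
        }
        return t;
    }

private:
    std::map<int, StreamStats> by_stream_;
};
```

If one stream records 110 instructions with 6 misses and another records 50 instructions with 20 misses, the aggregate of 160 instructions and 26 misses describes neither workload, while the per-stream buckets keep the two behaviors apart.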

@SoroushHeidari
Author

Thank you for getting back to me so quickly! I have a question about simulating DNN inference and training with Accel-Sim. I would like to exclude the first few iterations, commonly referred to as "warm-up" iterations, from the final stats report. Is there a way to do this?
