Replies: 3 comments
-
I doubt that this package will give you this much control. You could try the onnxruntime API, which is lower level.
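For reference, here is a minimal sketch of what that looks like with the plain onnxruntime Python API. As far as I know there is no knob for a utilization percentage; the closest documented control is the CUDA EP's memory arena limit. The model path is a placeholder:

```python
import onnxruntime as ort

providers = [
    (
        "CUDAExecutionProvider",
        {
            "device_id": 0,
            # Cap ORT's GPU memory arena (in bytes); this limits memory, not utilization
            "gpu_mem_limit": 2 * 1024 ** 3,
            # Grow the arena only by what is requested instead of doubling it
            "arena_extend_strategy": "kSameAsRequested",
        },
    ),
    "CPUExecutionProvider",  # fallback for ops the CUDA EP cannot handle
]

session = ort.InferenceSession("model.onnx", providers=providers)  # placeholder path
```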
-
You can use ONNX Runtime's execution provider options with ONNX Runtime GenAI. You can add them in the provider_options section of the model's genai_config.json. The CPU EP's options are set through ONNX Runtime's session options; for other EPs such as CUDA or DirectML, the available options are documented with ONNX Runtime's execution providers.
You can also set the provider options at runtime. Sketches of both approaches follow.
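First, a sketch of patching provider options into a model's genai_config.json from Python, assuming the usual layout where they live under model.decoder.session_options; the model directory and the 2 GiB limit are placeholders:

```python
import json
from pathlib import Path

config_path = Path("my_slm_model") / "genai_config.json"  # placeholder model dir
config = json.loads(config_path.read_text())

# Provider options sit under model.decoder.session_options in genai_config.json;
# option values are written as strings
session_options = config["model"]["decoder"]["session_options"]
session_options["provider_options"] = [
    {"cuda": {"gpu_mem_limit": str(2 * 1024 ** 3)}}
]

config_path.write_text(json.dumps(config, indent=4))
```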
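And a sketch of setting the same option at runtime instead, assuming a recent onnxruntime-genai release that exposes og.Config:

```python
import onnxruntime_genai as og

config = og.Config("my_slm_model")  # directory containing genai_config.json
config.clear_providers()            # drop whatever the config file declares
config.append_provider("cuda")      # re-add CUDA with our own options
config.set_provider_option("cuda", "gpu_mem_limit", str(2 * 1024 ** 3))

model = og.Model(config)            # the model is built with the edited options
```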
-
What I mean when I say the onnxruntime API gives you more control is that you can work directly with the inputs and weights and decide when they go to the GPU, stay in RAM, and so on.
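For instance, with ONNX Runtime's I/O binding you can place tensors on a specific device yourself instead of letting the runtime copy them on every call. The model path, input/output names, and shape below are placeholders:

```python
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("model.onnx", providers=["CUDAExecutionProvider"])

# Put the input on the GPU explicitly rather than letting ORT copy it per run
x = np.zeros((1, 128), dtype=np.int64)                  # placeholder shape/dtype
x_gpu = ort.OrtValue.ortvalue_from_numpy(x, "cuda", 0)  # device_type, device_id

binding = session.io_binding()
binding.bind_ortvalue_input("input_ids", x_gpu)  # placeholder input name
binding.bind_output("logits", "cuda", 0)         # keep the output on the GPU

session.run_with_iobinding(binding)
logits = binding.get_outputs()[0].numpy()  # copy back to host only when needed
```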
-
Env
Question
Is there a way to limit the amount of GPU resources used by ONNX Runtime when running SLM (small language model) models in onnxruntime-genai?
For example, I'd like to know how to restrict the model to at most 20% GPU utilization.
I'm looking for such a setting to minimize the impact on other programs running alongside it.