ensemble model question and model priority #1194
Comments
Note that setting the CUDA stream priority doesn't really do that much. In 20.03 we added some additional prioritization options to the dynamic batch scheduler... they only prioritize within a model, not across models, but may be interesting to you: https://github.com/NVIDIA/tensorrt-inference-server/blob/master/docs/model_configuration.rst#dynamic-batcher
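For anyone landing here later, a minimal sketch of those dynamic batcher options as they appear in a model's config.pbtxt (field names are from the linked model_configuration doc; the level count and queue size are arbitrary example values):

```
# config.pbtxt (sketch) -- prioritization *within* one model's dynamic batcher
dynamic_batching {
  preferred_batch_size: [ 4, 8 ]

  # Two priority levels; requests that don't specify one default to level 2.
  # Level 1 is served ahead of level 2.
  priority_levels: 2
  default_priority_level: 2

  # Optional per-level queue policy, e.g. bound the lower-priority queue.
  priority_queue_policy {
    key: 2
    value: { max_queue_size: 100 }
  }
}
```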
In an ensemble, how does the upstream model's output tensor get transferred to the downstream model's input tensor when the two models are on different GPUs?
A peer-to-peer copy should be performed if the GPUs support it. Otherwise the tensor will have to stage through CPU memory. Are you seeing different behavior?
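To make that concrete, here is a hedged sketch of such a cross-GPU ensemble (model names, tensor names, and shapes are all hypothetical): each composing model is pinned to its own GPU via instance_group, and the ensemble config only wires their tensors together, so the model_a -> model_b handoff is exactly the cross-device transfer discussed above:

```
# model_a/config.pbtxt -- pinned to GPU 0
instance_group [ { count: 1, kind: KIND_GPU, gpus: [ 0 ] } ]

# model_b/config.pbtxt -- pinned to GPU 1
instance_group [ { count: 1, kind: KIND_GPU, gpus: [ 1 ] } ]

# my_ensemble/config.pbtxt -- pipeline model_a -> model_b; the "features"
# tensor produced on GPU 0 is consumed on GPU 1 (P2P copy or CPU staging).
platform: "ensemble"
input [ { name: "RAW", data_type: TYPE_FP32, dims: [ 3, 224, 224 ] } ]
output [ { name: "SCORES", data_type: TYPE_FP32, dims: [ 1000 ] } ]
ensemble_scheduling {
  step [
    {
      model_name: "model_a"
      model_version: -1
      input_map { key: "INPUT" value: "RAW" }
      output_map { key: "OUTPUT" value: "features" }
    },
    {
      model_name: "model_b"
      model_version: -1
      input_map { key: "INPUT" value: "features" }
      output_map { key: "OUTPUT" value: "SCORES" }
    }
  ]
}
```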
Sorry for reviving this old issue, but is there currently any way to prioritize computation (not just streams) across models? Say, emptying some models' queues should take priority over serving requests from other models' queues?
Rate limiter is being worked on: #1507 (comment)
1. Ensemble model question:
An ensemble model represents a pipeline of one or more models. Can those models be distributed across different GPUs?
2. Model priority:
TRTIS can set a model priority, and that priority maps to the CUDA stream priority. Why does model priority only work for TensorRT models?
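For reference, the setting in question is the optimization-policy priority in config.pbtxt; a minimal sketch for a TensorRT model (PRIORITY_MAX is the enum from the model config proto; whether a backend honors it is backend-specific, and at the time of this issue only the TensorRT backend mapped it onto CUDA stream priority):

```
# config.pbtxt (sketch) for a TensorRT model
platform: "tensorrt_plan"

optimization {
  # Maps onto the priority of the CUDA streams this model executes on;
  # backends other than TensorRT ignored this setting at the time.
  priority: PRIORITY_MAX
}
```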