
ensemble model question and model priority #1194

Closed

mengkai94 opened this issue Mar 17, 2020 · 5 comments

@mengkai94

1. Ensemble model question:
An ensemble model represents a pipeline of one or more models. Can the models in an ensemble be distributed across different GPUs?

2. Model priority:
TRTIS can set a model priority, and model priority maps to CUDA stream priority. Why does model priority only work for TensorRT models?

@deadeyegoodwin
Contributor

  1. Yes, you can place a model on specific GPU(s) using the instance_group model configuration options: https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_configuration.html#instance-groups
    This controls where the model runs, whether or not it is part of an ensemble (see the config sketch after this list).

  2. We can only set the stream priority when we are able to create the CUDA stream ourselves. As far as we know, the other model frameworks manage their CUDA streams themselves. Please let us know if you have some insight into how to do this for a particular framework.
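
For reference, here is a minimal config.pbtxt sketch of such a placement, assuming a machine with two GPUs and one model instance pinned to each (the counts and GPU indices are only illustrative):

```
# instance_group controls where instances of this model run.
# Place one instance on GPU 0 and one instance on GPU 1.
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 0 ]
  },
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 1 ]
  }
]
```

Each model in an ensemble has its own configuration, so different steps of the pipeline can be pinned to different GPUs this way.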

Note that setting the CUDA stream priority doesn't really do that much. In 20.03 we added some additional prioritization options to the dynamic batch scheduler; they only prioritize within a model, not across models, but may be of interest to you (a sketch follows): https://github.com/NVIDIA/tensorrt-inference-server/blob/master/docs/model_configuration.rst#dynamic-batcher
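
As a rough illustration of those 20.03 options, a dynamic_batching section along these lines defines two priority levels within a single model's queue (the exact values are only an example; individual requests then carry a priority when they are submitted):

```
dynamic_batching {
  preferred_batch_size: [ 4, 8 ]
  max_queue_delay_microseconds: 100

  # Two priority levels for this model's queue; requests that don't
  # specify a priority fall into level 2 (the lower priority).
  priority_levels: 2
  default_priority_level: 2
}
```

Again, this only orders requests within one model's queue; it does not prioritize one model over another.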

@mengkai94
Author

Within an ensemble, how is the preceding model's output tensor transferred to the following model's input tensor when the models are on different GPUs?
Is the output tensor copied to system memory and then used as the input tensor?

@deadeyegoodwin
Contributor

A peer-to-peer copy should be performed if the GPUs support it. Otherwise the tensor will have to be staged through CPU memory. Are you seeing different behavior?

@mys007

mys007 commented Dec 10, 2020

Sorry for resurrecting this old issue, but is there currently any way to prioritize computation (not just streams) across models? Say, for example, that emptying some models' queues should be prioritized over handling requests from other models' queues.

@deadeyegoodwin
Contributor

A rate limiter is being worked on: #1507 (comment)
