
Which GPU will the model instances be placed on if gpus: [0] is not set? #1609

Closed
loveppdog opened this issue Jun 4, 2020 · 3 comments

Comments

@loveppdog

Our pipeline runs many different models, or multiple instances of the same model, concurrently.
When trtserver starts up, I use the default model control mode (poll), and our model repository contains many models.
If we use only one GPU, inference sometimes runs out of GPU memory (OOM).
If we use two GPUs as shown below and set a different gpus: [] for each model, it works fine.

#!/bin/bash
export CUDA_VISIBLE_DEVICES=6,7
trtserver --model-store=/data/ --grpc-infer-thread-count=16 --grpc-stream-infer-thread-count=16

instance_group [
  {
    count: 8
    kind: KIND_GPU
    gpus: [ 1 ]
  }
]
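
For illustration, a minimal sketch of how per-model GPU pinning might look. The model names model_a and model_b and the instance counts are hypothetical placeholders; note that with CUDA_VISIBLE_DEVICES=6,7 the visible devices are renumbered, so gpus: [ 0 ] and gpus: [ 1 ] map to physical GPUs 6 and 7.

# /data/model_a/config.pbtxt  (hypothetical model, pinned to the first visible GPU, physical GPU 6)
instance_group [
  {
    count: 4
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]

# /data/model_b/config.pbtxt  (hypothetical model, pinned to the second visible GPU, physical GPU 7)
instance_group [
  {
    count: 4
    kind: KIND_GPU
    gpus: [ 1 ]
  }
]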

My questions are:
If I don't set gpus: [], which GPU will the model instances be placed on?
If there is not enough GPU memory, which models will be unloaded?
Does trtserver load all models at startup (even if our model repository is large)?

@GuanLuo
Contributor

GuanLuo commented Jun 4, 2020

  • If gpus is not set in the instance group, the instances will be deployed on each available GPU. In your example, there would be 8 instances on GPU 0 and 8 instances on GPU 1.

  • Models will not be unloaded automatically when GPU memory runs out; you may want to use the "explicit" model control mode to load / unload models on demand (see the sketch after this list).

  • Triton will load all models in the model repository in model control modes "poll" and "none"; "explicit" mode will only load the models specified via "--load-model".
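
A rough sketch of the "explicit" mode mentioned in the second bullet, assuming the flags named in this thread. The model names are hypothetical placeholders, and the HTTP repository-API endpoints shown are the ones from newer Triton releases, so they may not apply to the trtserver 1.x binary used here.

#!/bin/bash
# Start the server in explicit model control mode; only the listed models are loaded.
trtserver --model-store=/data/ \
  --model-control-mode=explicit \
  --load-model=model_a --load-model=model_b

# Later, load / unload models on demand (repository API of newer Triton releases):
curl -X POST localhost:8000/v2/repository/models/model_c/load
curl -X POST localhost:8000/v2/repository/models/model_a/unload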

@loveppdog
Author


1. You said Triton will load all models in the model repository in model control modes "poll" and "none".
Question:
If the model repository has many models and there is not enough GPU memory to load them all, what will happen at server startup or during inference?
2. The trt server starts up successfully, but I sometimes see OOM during inference.
How can I avoid this issue? Can you give me more guidance on how to design the inference client?

@deadeyegoodwin
Contributor

If all models cannot be loaded, the inference server will exit, unless you use the --exit-on-error=false flag; in that case the server will continue running, but the models that failed to load will not be available.

See #1507, #1440, #1499
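
For completeness, a minimal launch sketch that combines the flag described above with the options already used earlier in this thread; the server stays up even if some models fail to load (for example due to GPU OOM at startup):

#!/bin/bash
# Keep serving the models that did load; failed models are simply unavailable.
trtserver --model-store=/data/ \
  --exit-on-error=false \
  --grpc-infer-thread-count=16 --grpc-stream-infer-thread-count=16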
