
Which GPU will the model instances be placed on if gpus: [0] is not set? #1609

Closed
loveppdog opened this issue Jun 4, 2020 · 3 comments

Comments

@loveppdog

Our pipeline runs many different models, or multiple instances of the same model, concurrently.
When trtserver starts up, I use the default model control mode (poll), and our model repository contains many models.
If we use only one GPU, inference sometimes runs out of GPU memory (OOM).
If we use two GPUs as shown below and set a different gpus: [] for each model, it works fine.

#!/bin/bash
export CUDA_VISIBLE_DEVICES=6,7
trtserver --model-store=/data/ --grpc-infer-thread-count=16 --grpc-stream-infer-thread-count=16

instance_group [
  {
    count: 8
    kind: KIND_GPU
    gpus: [ 1 ]
  }
]
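
For illustration, a minimal sketch of how per-model GPU pinning might look. The model names model_a and model_b and the instance counts are hypothetical placeholders; note that with CUDA_VISIBLE_DEVICES=6,7 the visible devices are renumbered, so gpus: [ 0 ] and gpus: [ 1 ] map to physical GPUs 6 and 7.

# /data/model_a/config.pbtxt  (hypothetical model, pinned to the first visible GPU, physical GPU 6)
instance_group [
  {
    count: 4
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]

# /data/model_b/config.pbtxt  (hypothetical model, pinned to the second visible GPU, physical GPU 7)
instance_group [
  {
    count: 4
    kind: KIND_GPU
    gpus: [ 1 ]
  }
]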

My questions are:
If I don't set gpus: [], which GPU will the model instances be placed on?
If there is not enough GPU memory, which models will be unloaded?
Does trtserver load all models at startup (even if our model repository is large)?

@GuanLuo
Contributor

GuanLuo commented Jun 4, 2020

  • If gpus is not set in the instance group, the instances will be deployed on each available GPU. In your example, there would be 8 instances on GPU 0 and 8 instances on GPU 1.

  • Models will not be unloaded automatically when GPU memory runs out; you may want to use the "explicit" model control mode to load / unload models on demand (see the sketch after this list).

  • Triton will load all models in the model repository in model control modes "poll" and "none"; "explicit" mode will only load the models specified via "--load-model".
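
A rough sketch of the "explicit" mode mentioned in the second bullet, assuming the flags named in this thread. The model names are hypothetical placeholders, and the HTTP repository-API endpoints shown are the ones from newer Triton releases, so they may not apply to the trtserver 1.x binary used here.

#!/bin/bash
# Start the server in explicit model control mode; only the listed models are loaded.
trtserver --model-store=/data/ \
  --model-control-mode=explicit \
  --load-model=model_a --load-model=model_b

# Later, load / unload models on demand (repository API of newer Triton releases):
curl -X POST localhost:8000/v2/repository/models/model_c/load
curl -X POST localhost:8000/v2/repository/models/model_a/unload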

@loveppdog
Author


1. You said Triton will load all models in the model repository in model control modes "poll" and "none".
Question:
If the model repository has many models and there is not enough GPU memory to load them all, what will happen at server startup or during inference?
2. The trt server starts up successfully, but I sometimes see OOM during inference.
How can I avoid this issue? Can you give me more guidance on how to design the inference client?

@deadeyegoodwin
Contributor

If all models cannot be loaded, the inference server will exit, unless you use the --exit-on-error=false flag; in that case the server will continue running, but the models that failed to load will not be available.

See #1507, #1440, #1499
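
For completeness, a minimal launch sketch that combines the flag described above with the options already used earlier in this thread; the server stays up even if some models fail to load (for example due to GPU OOM at startup):

#!/bin/bash
# Keep serving the models that did load; failed models are simply unavailable.
trtserver --model-store=/data/ \
  --exit-on-error=false \
  --grpc-infer-thread-count=16 --grpc-stream-infer-thread-count=16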
