I am testing model warmup in Triton 20.03 with my "tensorflow_savedmodel" models to solve OOM issues.
I ran some tests like this (models load successfully in all tests):
Warm up model A with instance_group count 1. After trt start-up, GPU memory usage shows 3200 MB. While one inference is running, GPU memory is still 3200 MB.
Warm up model A with instance_group count 2 (see the config sketch below). After trt start-up, GPU memory shows only 3209 MB. (I think it should be 6000 MB+, why not? If two inferences run simultaneously, does GPU memory grow to 6000 MB+?)
Warm up models A, B, C, ... F, each with a different instance_group count. After trt start-up, GPU memory shows only 9000 MB. (During inference, GPU memory grows to 11000 MB and the server goes OOM. I have warmed up, so why does GPU memory still change?)
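For test 2, the only change is the instance count; a minimal sketch of the instance_group section (the rest of the config is the same as the one posted below):

instance_group [
  {
    count: 2       # two execution instances of model A on the GPU
    kind: KIND_GPU
  }
]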
When you say "trt" do you mean "triton"? trt is TensorRT, which is a different thing than Triton, so it is confusing.
Warming up models will not necessarily fix OOM issues. Please see #1507 for a discussion of how frameworks allocate some memory at load time and then additional memory at inference time. The TensorFlow framework dynamically allocates memory at inference time, so memory usage can keep growing after warmup.
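If you want to see when that extra allocation happens, one option (just a monitoring sketch, not something Triton provides; it assumes the pynvml Python bindings for NVML and GPU index 0) is to poll GPU memory while you load, warm up, and then run inference:

import time
import pynvml

# Minimal monitoring sketch: poll GPU memory so you can compare usage right
# after model load/warmup with usage while inference requests are in flight.
pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # GPU 0; adjust if Triton runs on another GPU

try:
    while True:
        info = pynvml.nvmlDeviceGetMemoryInfo(handle)
        print("GPU memory used: %.0f MB" % (info.used / 1024.0**2))
        time.sleep(1)  # sample once per second while you send requests
except KeyboardInterrupt:
    pynvml.nvmlShutdown()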
Note: our models' input/output shapes are fixed. Here is the config we use:
instance_group [
  {
    count: 1
    kind: KIND_GPU
  }
]
model_warmup {
  batch_size: 1
  inputs {
    key: "input_1"
    value {
      data_type: TYPE_FP32
      dims: [ 1, 128, 128, 128 ]
      random_data: true
    }
  }
}