I am testing model warmup in Triton 20.03 with my "tensorflow_savedmodel" models to solve OOM issues.
I ran some tests like this (models load successfully in all tests):
Warm up model A with instance_group count 1. After trt start-up, GPU memory usage shows 3200 MB. While one inference is running, GPU memory is still 3200 MB.
Warm up model A with instance_group count 2 (see the config sketch below). After trt start-up, GPU memory shows only 3209 MB. (I think it should be 6000 MB+, why not? If two inferences run simultaneously, does GPU memory grow to 6000 MB+?)
Warm up models A, B, C, ... F, each with a different instance_group count. After trt start-up, GPU memory shows only 9000 MB. (During inference, GPU memory grows to 11000 MB and the server goes OOM. I have warmed up, so why does GPU memory still change?)
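For test 2, the only change is the instance count; a minimal sketch of the instance_group section (the rest of the config is the same as the one posted below):

instance_group [
  {
    count: 2       # two execution instances of model A on the GPU
    kind: KIND_GPU
  }
]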
When you say "trt" do you mean "triton"? trt is TensorRT, which is a different thing than Triton, so it is confusing.
Warming up models will not necessarily fix OOM issues. Please see #1507 for a discussion of how frameworks allocate some memory at load time and then additional memory at inference time. The TensorFlow framework dynamically allocates memory at inference time, so memory usage can keep growing after warmup.
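If you want to see when that extra allocation happens, one option (just a monitoring sketch, not something Triton provides; it assumes the pynvml Python bindings for NVML and GPU index 0) is to poll GPU memory while you load, warm up, and then run inference:

import time
import pynvml

# Minimal monitoring sketch: poll GPU memory so you can compare usage right
# after model load/warmup with usage while inference requests are in flight.
pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # GPU 0; adjust if Triton runs on another GPU

try:
    while True:
        info = pynvml.nvmlDeviceGetMemoryInfo(handle)
        print("GPU memory used: %.0f MB" % (info.used / 1024.0**2))
        time.sleep(1)  # sample once per second while you send requests
except KeyboardInterrupt:
    pynvml.nvmlShutdown()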
Note: our models' input/output shapes are fixed. Here is the config we use:
instance_group [
  {
    count: 1
    kind: KIND_GPU
  }
]
model_warmup {
  batch_size: 1
  inputs {
    key: "input_1"
    value {
      data_type: TYPE_FP32
      dims: [ 1, 128, 128, 128 ]
      random_data: true
    }
  }
}