Is your feature request related to a problem? Please describe.
Currently, Triton does not publish GPU utilization and GPU memory metrics at model-level granularity.
Understandably, this may be difficult to gauge since multiple models can be loaded on a single GPU, and, due to the nature of inference, memory allocation may change dynamically.
However, I'm creating this issue to ask whether any long-term solution is possible. Perhaps a running average of a given model's GPU utilization could be maintained and reported as its average utilization?
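To make the suggestion concrete, here is a minimal sketch of the kind of running average meant here (an exponential moving average). This is an illustration only, not an existing Triton feature; the `sample_utilization` values are made up, standing in for whatever per-model signal Triton could expose:

```python
# Sketch only: Triton does not currently expose a per-model utilization signal.
# The samples fed in below are hypothetical.

class RunningUtilization:
    """Exponential moving average of per-model GPU utilization samples."""

    def __init__(self, alpha: float = 0.1):
        self.alpha = alpha    # smoothing factor: higher = more weight on recent samples
        self.average = None   # no samples seen yet

    def update(self, sample: float) -> float:
        """Fold a new utilization sample (0.0-1.0) into the running average."""
        if self.average is None:
            self.average = sample
        else:
            self.average = self.alpha * sample + (1 - self.alpha) * self.average
        return self.average


# Example usage with made-up samples for a single model:
ema = RunningUtilization(alpha=0.2)
for sample in (0.35, 0.50, 0.42, 0.61):
    print(f"avg utilization: {ema.update(sample):.2f}")
```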
What blockers currently exist to tackling this?
Thank you.
@GuanLuo added per-model GPU memory usage in this PR, which should be available from 23.06 onwards for TensorRT and ONNX Runtime models. This provides estimated memory usage at load time.
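For reference, a hedged sketch of how such a per-model metric could be read from Triton's Prometheus metrics endpoint (exposed on port 8002 by default). The metric name prefix filtered for below is a placeholder assumption; confirm the actual metric name in the Triton metrics documentation for your release:

```python
# Sketch: scrape Triton's Prometheus metrics endpoint and pull out
# per-model GPU memory entries. The metric prefix below is an assumption;
# check the Triton metrics docs for the exact name in your release.
import urllib.request

METRICS_URL = "http://localhost:8002/metrics"   # Triton's default metrics port
METRIC_PREFIX = "nv_model_"                      # assumed prefix for per-model metrics


def per_model_memory_lines():
    with urllib.request.urlopen(METRICS_URL) as resp:
        text = resp.read().decode("utf-8")
    for line in text.splitlines():
        # Skip Prometheus comment lines; keep per-model memory metric samples.
        if not line.startswith("#") and line.startswith(METRIC_PREFIX) and "memory" in line:
            yield line


if __name__ == "__main__":
    for line in per_model_memory_lines():
        print(line)
```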
I don't think GPU utilization would be possible, given it is not additive (i.e. a model using 20% GPU utilization in isolation and another using 50% in isolation will not necessarily add up to 70% when running at the same time). I suspect there would be similar issues with trying to get runtime GPU usage while multiple models are potentially running, plus the overhead of querying this information repeatedly. Guan could probably provide more context, given he implemented the per-model GPU memory metrics.