Detect.py supports running against a Triton container #9228
Conversation
Triton Inference Server is an open-source inference serving software that streamlines AI inferencing (https://github.com/triton-inference-server/server). The user can now provide a "--triton-url" argument to detect.py to use a local or remote Triton server for inference, e.g., http://localhost:8000 will use HTTP over port 8000 and grpc://localhost:8001 will use gRPC over port 8001. Note that it is not necessary to specify a weights file when using Triton. A Triton container can be created by first exporting the YOLOv5 model to a Triton-supported runtime; ONNX, TorchScript, and TensorRT are supported by both Triton and the export.py script. The exported model can then be containerized via the OctoML CLI. See https://github.com/octoml/octo-cli#getting-started for a guide.
@glenn-jocher, @AyushExel - here is a PR against the yolov5 repo.
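For readers unfamiliar with the server side, here is a minimal sketch of what a raw Triton HTTP inference call looks like using the tritonclient package against a running server. The model name, tensor names, and shapes are assumptions based on a default 640x640 YOLOv5s ONNX export; this is an illustration, not the PR's TritonRemoteModel implementation.

```python
# Minimal Triton HTTP inference sketch (assumed model/tensor names and shapes; adapt to your deployment).
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")  # host:port, no scheme

image = np.zeros((1, 3, 640, 640), dtype=np.float32)             # dummy preprocessed input batch
inputs = [httpclient.InferInput("images", [1, 3, 640, 640], "FP32")]
inputs[0].set_data_from_numpy(image)
outputs = [httpclient.InferRequestedOutput("output0")]

result = client.infer(model_name="yolov5s", inputs=inputs, outputs=outputs)
pred = result.as_numpy("output0")                                 # e.g. (1, 25200, 85) raw predictions
print(pred.shape)
```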
👋 Hello @gaziqbal, thank you for submitting a YOLOv5 🚀 PR! To allow your work to be integrated as seamlessly as possible, we advise you to:
- ✅ Verify your PR is up-to-date with the ultralytics/yolov5 master branch. If your PR is behind you can update your code by clicking the 'Update branch' button or by running `git pull` and `git merge master` locally.
- ✅ Verify all YOLOv5 Continuous Integration (CI) checks are passing.
- ✅ Reduce changes to the absolute minimum required for your bug fix or feature addition. "It is not daily increase but daily decrease, hack away the unessential. The closer to the source, the less wastage there is." — Bruce Lee
Please let me know if you need anything more here.
@gaziqbal thanks, we should be reviewing this soon, no changes required ATM.
@gaziqbal thanks for your patience. I think I'm going to try to refactor this so that Triton backends aren't treated differently. New users tend to introduce more code than a feature requires by treating it as special compared to existing features, but with 12 different inference types all using a single --weights argument I'd rather not introduce additional command-line and function arguments for one more. Just as --source and --weights are multi-purpose, I think we can extend them to Triton inference as well. I'll see what I can do here today.
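To illustrate the idea of a multi-purpose --weights value, here is a minimal sketch of dispatching on the URL scheme; the helper name, the accepted schemes, and the print statements are assumptions, not the actual refactor.

```python
# Hypothetical dispatch sketch: decide whether `weights` is a local file or a Triton URL.
from pathlib import Path
from urllib.parse import urlparse

def is_triton_url(weights: str) -> bool:
    """Return True if the weights string looks like a Triton server URL (assumed http/grpc schemes)."""
    return urlparse(str(weights)).scheme in ("http", "grpc")

def load_backend(weights: str):
    if is_triton_url(weights):
        print(f"Using Triton server at {weights}")  # a remote-inference wrapper would be built here
    else:
        suffix = Path(weights).suffix               # e.g. '.pt', '.onnx', '.engine'
        print(f"Loading local {suffix} weights from {weights}")

load_backend("yolov5s.pt")             # local PyTorch weights
load_backend("grpc://localhost:8001")  # remote Triton inference
```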
@gaziqbal pinging you to see if you could re-test after my updates (I hope I didn't break anything)!
@glenn-jocher - the Triton server detection broke because it was using the Path.name property for matching, which strips out any http:// or grpc:// prefix. I also needed to change the Triton server class to query the model name, because the weights parameter is now being used for the URL. Can you please take a look again? I have verified both http and grpc on my end.
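For context, a small illustration of the behaviour described above; this is a generic demonstration of pathlib vs. urllib parsing, not the PR's actual matching code.

```python
# pathlib treats the URL as a filesystem path and collapses the '//' after the scheme,
# so the scheme ends up in a parent component and never survives into .name.
from pathlib import Path
from urllib.parse import urlparse

url = "grpc://localhost:8001"
print(Path(url).name)        # 'localhost:8001' -> the 'grpc://' prefix is gone
print(urlparse(url).scheme)  # 'grpc'           -> scheme is preserved for matching
print(urlparse(url).netloc)  # 'localhost:8001' -> host:port for the Triton client
```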
@gaziqbal understood. Is there a public server URL I could temporarily use for debugging? I see an error from Vanessa that I'm working on now.
@gaziqbal I took a look, everything looks good to merge over here. Do your updates fix Vanessa's issue?
@gaziqbal PR is merged. Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐
@gaziqbal @glenn-jocher I tried this, but when Triton is serving a series of models, according to the code it defaults to the first model rather than the one named "yolov5". I think a model_name parameter should be added to TritonRemoteModel.
Good point. That's fairly straightforward to do for TritonRemoteModel. Are you invoking it via detect.py? If so, we'll need a way to relay that.
I'm thinking there are two ways: one is to add a new model_name parameter, but that's a bit redundant; the other is to append the model name to "weights", like "grpc://localhost:8001/yolov5", and have TritonRemoteModel handle it.
My concern with the latter is that it would be a contrived URI schema and not match canonical Triton URIs, which may be confusing. That said, the approach is worth exploring more.
Stupid question here. Could we use the URL question-mark structure for passing variables, i.e. something like this to allow more arguments into the Triton server? grpc://localhost:8001/?model=yolov5s.pt&conf=0.25&imgsz=640
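To make the two proposals above concrete, here is a minimal sketch of how both the path form (grpc://localhost:8001/yolov5) and the query-string form (grpc://localhost:8001/?model=yolov5s&conf=0.25) could be parsed. This is an illustration of the idea only; the function and parameter names are assumptions, not code from the PR.

```python
# Hypothetical parser for a Triton weights URL; parameter names are assumptions.
from urllib.parse import urlparse, parse_qs

def parse_triton_weights(weights: str):
    u = urlparse(weights)
    params = {k: v[0] for k, v in parse_qs(u.query).items()}
    model = u.path.strip("/") or params.pop("model", None)  # path wins, else ?model=...
    return {
        "scheme": u.scheme,  # 'http' or 'grpc'
        "server": u.netloc,  # 'localhost:8001'
        "model": model,      # None -> fall back to the server's first/only model
        "extra": params,     # e.g. {'conf': '0.25', 'imgsz': '640'}
    }

print(parse_triton_weights("grpc://localhost:8001/yolov5"))
print(parse_triton_weights("grpc://localhost:8001/?model=yolov5s&conf=0.25&imgsz=640"))
```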
Hi! Where can I find any info on how exactly Triton should be configured to work with this solution? I have used Triton with a custom client, but when I tried to use my Triton backend with detect.py I ran into an issue. Here is my config:
@ArgoHA, I am having the same problem here. Were you able to solve it?
@ArgoHA, I solved it using this configuration:
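For general reference, below is a minimal sketch of what a Triton config.pbtxt for a YOLOv5 ONNX export might look like. The model name, the tensor names (images, output0), and the dimensions are assumptions that must match your own export, and this is not the configuration the commenters above used.

```
name: "yolov5s"
platform: "onnxruntime_onnx"
max_batch_size: 0
input [
  {
    name: "images"          # input tensor name produced by export.py (assumed)
    data_type: TYPE_FP32
    dims: [ 1, 3, 640, 640 ]
  }
]
output [
  {
    name: "output0"         # output tensor name; older exports may use "output"
    data_type: TYPE_FP32
    dims: [ 1, 25200, 85 ]  # predictions for a 640x640 COCO export (assumed)
  }
]
```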
Example end-to-end workflow: export the YOLOv5s model to ONNX, containerize it with the OctoML CLI (see https://github.com/octoml/octo-cli#getting-started), then run detect.py against the resulting Triton server.
python export.py --include onnx  # export yolov5s.pt to ONNX as yolov5s.onnx
mkdir octoml && cd octoml && mv ../yolov5s.onnx .  # create an octoml folder and move the ONNX model into it
octoml init && octoml package && octoml deploy
python ../detect.py --triton-url http://localhost:8000
🛠️ PR Summary
Made with ❤️ by Ultralytics Actions
🌟 Summary
Enhancements to PyTorch model handling, better integration with the NVIDIA Triton Inference Server, and improvements to tensor device handling.
📊 Key Changes
🎯 Purpose & Impact
The PR potentially impacts developers looking for streamlined deployment and broadened inference capabilities, as well as users who want to access advanced model-serving features easily.