Detect.py supports running against a Triton container #9228
Conversation
Triton Inference Server is an open-source inference serving software that streamlines AI inferencing (https://github.com/triton-inference-server/server). The user can now provide a "--triton-url" argument to detect.py to use a local or remote Triton server for inference, e.g., http://localhost:8000 will use HTTP over port 8000 and grpc://localhost:8001 will use gRPC over port 8001. Note that it is not necessary to specify a weights file when using Triton. A Triton container can be created by first exporting the YOLOv5 model to a Triton-supported runtime; ONNX, TorchScript, and TensorRT are supported by both Triton and the export.py script. The exported model can then be containerized via the OctoML CLI. See https://github.com/octoml/octo-cli#getting-started for a guide.
@glenn-jocher, @AyushExel - here is a PR against the yolov5 repo.
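For readers unfamiliar with the server side, here is a minimal sketch of what a raw Triton HTTP inference call looks like using the tritonclient package against a running server. The model name, tensor names, and shapes are assumptions based on a default 640x640 YOLOv5s ONNX export; this is an illustration, not the PR's TritonRemoteModel implementation.

```python
# Minimal Triton HTTP inference sketch (assumed model/tensor names and shapes; adapt to your deployment).
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")  # host:port, no scheme

image = np.zeros((1, 3, 640, 640), dtype=np.float32)             # dummy preprocessed input batch
inputs = [httpclient.InferInput("images", [1, 3, 640, 640], "FP32")]
inputs[0].set_data_from_numpy(image)
outputs = [httpclient.InferRequestedOutput("output0")]

result = client.infer(model_name="yolov5s", inputs=inputs, outputs=outputs)
pred = result.as_numpy("output0")                                 # e.g. (1, 25200, 85) raw predictions
print(pred.shape)
```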
👋 Hello @gaziqbal, thank you for submitting a YOLOv5 🚀 PR! To allow your work to be integrated as seamlessly as possible, we advise you to:
- ✅ Verify your PR is up-to-date with the ultralytics/yolov5 master branch. If your PR is behind you can update your code by clicking the 'Update branch' button or by running `git pull` and `git merge master` locally.
- ✅ Verify all YOLOv5 Continuous Integration (CI) checks are passing.
- ✅ Reduce changes to the absolute minimum required for your bug fix or feature addition. "It is not daily increase but daily decrease, hack away the unessential. The closer to the source, the less wastage there is." — Bruce Lee
Please let me know if you need anything more here.
@gaziqbal thanks, we should be reviewing this soon, no changes required ATM.
@gaziqbal thanks for your patience. I think I'm going to try to refactor this so that Triton backends aren't treated differently. New users tend to introduce more code than a feature requires by treating it as special compared to existing features, but with 12 different inference types all using a single --weights argument I'd rather not introduce additional command-line and function arguments for one more. Just as --source and --weights are multi-purpose, I think we can extend them to Triton inference as well. I'll see what I can do here today.
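To illustrate the idea of a multi-purpose --weights value, here is a minimal sketch of dispatching on the URL scheme; the helper name, the accepted schemes, and the print statements are assumptions, not the actual refactor.

```python
# Hypothetical dispatch sketch: decide whether `weights` is a local file or a Triton URL.
from pathlib import Path
from urllib.parse import urlparse

def is_triton_url(weights: str) -> bool:
    """Return True if the weights string looks like a Triton server URL (assumed http/grpc schemes)."""
    return urlparse(str(weights)).scheme in ("http", "grpc")

def load_backend(weights: str):
    if is_triton_url(weights):
        print(f"Using Triton server at {weights}")  # a remote-inference wrapper would be built here
    else:
        suffix = Path(weights).suffix               # e.g. '.pt', '.onnx', '.engine'
        print(f"Loading local {suffix} weights from {weights}")

load_backend("yolov5s.pt")             # local PyTorch weights
load_backend("grpc://localhost:8001")  # remote Triton inference
```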
@gaziqbal pinging you to see if you could re-test after my updates (I hope I didn't break anything)!
@glenn-jocher - the Triton server detection broke because it was using the Path.name property for matching, which strips out any http:// or grpc:// prefix. I also needed to change the Triton server class to query the model name, because the weights parameter is now being used for the URL. Can you please take a look again? I have verified both http and grpc on my end.
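For context, a small illustration of the behaviour described above; this is a generic demonstration of pathlib vs. urllib parsing, not the PR's actual matching code.

```python
# pathlib treats the URL as a filesystem path and collapses the '//' after the scheme,
# so the scheme ends up in a parent component and never survives into .name.
from pathlib import Path
from urllib.parse import urlparse

url = "grpc://localhost:8001"
print(Path(url).name)        # 'localhost:8001' -> the 'grpc://' prefix is gone
print(urlparse(url).scheme)  # 'grpc'           -> scheme is preserved for matching
print(urlparse(url).netloc)  # 'localhost:8001' -> host:port for the Triton client
```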
@gaziqbal understood. Is there a public server URL I could temporarily use for debugging? I see an error from Vanessa that I'm working on now.
@gaziqbal I took a look, everything looks good to merge over here. Do your updates fix Vanessa's issue?
@gaziqbal PR is merged. Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐
@gaziqbal @glenn-jocher I tried this, but when Triton is serving a series of models, according to the code it defaults to the first model rather than the one named "yolov5". I think a model_name parameter should be added to TritonRemoteModel.
Good point. That's fairly straightforward to do for TritonRemoteModel. Are you invoking it via detect.py? If so, we'll need a way to relay that.
I'm thinking there are two ways: one is to add a new model_name parameter, but that's a bit redundant; the other is to append the model name to "weights", like "grpc://localhost:8001/yolov5", and have TritonRemoteModel handle it.
My concern with the latter is that it would be a contrived URI schema and not match canonical Triton URIs, which may be confusing. That said, the approach is worth exploring more.
Stupid question here. Could we use the URL question-mark structure for passing variables, i.e. something like this to allow more arguments into the Triton server? grpc://localhost:8001/?model=yolov5s.pt&conf=0.25&imgsz=640
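To make the two proposals above concrete, here is a minimal sketch of how both the path form (grpc://localhost:8001/yolov5) and the query-string form (grpc://localhost:8001/?model=yolov5s&conf=0.25) could be parsed. This is an illustration of the idea only; the function and parameter names are assumptions, not code from the PR.

```python
# Hypothetical parser for a Triton weights URL; parameter names are assumptions.
from urllib.parse import urlparse, parse_qs

def parse_triton_weights(weights: str):
    u = urlparse(weights)
    params = {k: v[0] for k, v in parse_qs(u.query).items()}
    model = u.path.strip("/") or params.pop("model", None)  # path wins, else ?model=...
    return {
        "scheme": u.scheme,  # 'http' or 'grpc'
        "server": u.netloc,  # 'localhost:8001'
        "model": model,      # None -> fall back to the server's first/only model
        "extra": params,     # e.g. {'conf': '0.25', 'imgsz': '640'}
    }

print(parse_triton_weights("grpc://localhost:8001/yolov5"))
print(parse_triton_weights("grpc://localhost:8001/?model=yolov5s&conf=0.25&imgsz=640"))
```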
Hi! Where can I find any info on how exactly Triton should be configured to work with this solution? I have used Triton with a custom client, but when I tried to use my Triton backend with detect.py I ran into an issue. Here is my config:
@ArgoHA, I am having the same problem here. Were you able to solve it?
@ArgoHA, I solved it using this configuration:
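For general reference, below is a minimal sketch of what a Triton config.pbtxt for a YOLOv5 ONNX export might look like. The model name, the tensor names (images, output0), and the dimensions are assumptions that must match your own export, and this is not the configuration the commenters above used.

```
name: "yolov5s"
platform: "onnxruntime_onnx"
max_batch_size: 0
input [
  {
    name: "images"          # input tensor name produced by export.py (assumed)
    data_type: TYPE_FP32
    dims: [ 1, 3, 640, 640 ]
  }
]
output [
  {
    name: "output0"         # output tensor name; older exports may use "output"
    data_type: TYPE_FP32
    dims: [ 1, 25200, 85 ]  # predictions for a 640x640 COCO export (assumed)
  }
]
```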
Example end-to-end workflow: export the YOLOv5s model to ONNX, containerize it with the OctoML CLI (see https://github.com/octoml/octo-cli#getting-started), then run detect.py against the resulting Triton server.
python export.py --include onnx  # export yolov5s.pt to ONNX as yolov5s.onnx
mkdir octoml && cd octoml && mv ../yolov5s.onnx .  # create an octoml folder and move the ONNX model into it
octoml init && octoml package && octoml deploy
python ../detect.py --triton-url http://localhost:8000
🛠️ PR Summary
Made with ❤️ by Ultralytics Actions
🌟 Summary
Enhancements to PyTorch model handling, better integration with the NVIDIA Triton Inference Server, and improvements to tensor device handling.
📊 Key Changes
🎯 Purpose & Impact
The PR potentially impacts developers looking for streamlined deployment and broadened inference capabilities, as well as users who want to access advanced model-serving features easily.