Triton terminated with Signal (6) #4566
Comments
Hi @erichtho, Thanks for reporting this issue.
CC @GuanLuo @tanmayv25 if you've seen any TRT or similar backend issues like this before.
Sorry, I can't share my code; it's part of a big project. I'm trying to simplify it, but I can't reproduce the error with the simplified code (still trying). Also, the HTTP client gets a broken pipe, so I can't tell whether that contributes to the bug.
The back trace suggests that the error originates within TensorRT. I don't think the issue is client specific.
I assume the issue only occurs with sufficient request concurrency? What is your instance count? Can you share your model configuration file?
Yes, it's related to request concurrency, and it seems to appear more often when there are lots of requests with close to the maximum shape.
There are two other models in the model repository; the total instance count is 3. By the way, we also tried serving the ONNX model in Triton instead of the TensorRT engine, and it works normally.
The TensorRT team seems to have a fix that resolves this issue. We are working with them to make the fix available to Triton users.
When using the Triton gRPC client for inference, Triton sometimes exits unexpectedly.
For example, when using a client call like the following:
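The original client snippet was not included in the report, so this is only a minimal sketch of a Triton gRPC inference call; the model name, tensor names, data type, and server URL are assumptions.

```python
# Minimal sketch only -- "my_trt_model", "INPUT0"/"OUTPUT0", the dtype, and
# the URL are placeholders, not the original code.
import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")

# Request close to the engine's maximum shape (maxShapes=1x80x12000).
data = np.random.rand(1, 80, 11000).astype(np.float32)

infer_input = grpcclient.InferInput("INPUT0", list(data.shape), "FP32")
infer_input.set_data_from_numpy(data)

result = client.infer(model_name="my_trt_model", inputs=[infer_input])
print(result.as_numpy("OUTPUT0").shape)
```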
and tritonserver outputs:
terminate called after throwing an instance of 'nvinfer1::InternalError'
what(): Assertion mUsedAllocators.find(alloc) != mUsedAllocators.end() && "Myelin free callback called with invalid MyelinAllocator" failed.
Signal (6) received.
0# 0x00005602FC4F21B9 in tritonserver
1# 0x00007FC98736C0C0 in /usr/lib/x86_64-linux-gnu/libc.so.6
2# gsignal in /usr/lib/x86_64-linux-gnu/libc.so.6
3# abort in /usr/lib/x86_64-linux-gnu/libc.so.6
4# 0x00007FC987725911 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
5# 0x00007FC98773138C in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
6# 0x00007FC987730369 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
7# __gxx_personality_v0 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
8# 0x00007FC98752BBEF in /usr/lib/x86_64-linux-gnu/libgcc_s.so.1
9# _Unwind_RaiseException in /usr/lib/x86_64-linux-gnu/libgcc_s.so.1
10# __cxa_throw in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
11# nvinfer1::Lobber<nvinfer1::InternalError>::operator()(char const*, char const*, int, int, nvinfer1::ErrorCode, char const*) in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
12# 0x00007FC9020EECBC in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
13# 0x00007FC902A7220F in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
14# 0x00007FC902A2862D in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
15# 0x00007FC902A7F653 in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
16# 0x00007FC9020EE715 in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
17# 0x00007FC901C8BAD0 in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
18# 0x00007FC9020F41F4 in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
19# 0x00007FC902913FD8 in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
20# 0x00007FC90291478C in /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
21# 0x00007FC97A57C6D7 in /opt/tritonserver/backends/tensorrt/libtriton_tensorrt.so
22# 0x00007FC97A5855FE in /opt/tritonserver/backends/tensorrt/libtriton_tensorrt.so
23# TRITONBACKEND_ModelInstanceExecute in /opt/tritonserver/backends/tensorrt/libtriton_tensorrt.so
24# 0x00007FC987C1D73A in /opt/tritonserver/bin/../lib/libtritonserver.so
25# 0x00007FC987C1E0F7 in /opt/tritonserver/bin/../lib/libtritonserver.so
26# 0x00007FC987CDB411 in /opt/tritonserver/bin/../lib/libtritonserver.so
27# 0x00007FC987C175C7 in /opt/tritonserver/bin/../lib/libtritonserver.so
28# 0x00007FC98775DDE4 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
29# 0x00007FC98896D609 in /usr/lib/x86_64-linux-gnu/libpthread.so.0
30# clone in /usr/lib/x86_64-linux-gnu/libc.so.6
tritonserver version: 22.05-py3 (docker image)
backend: TensorRT
OS: Ubuntu 20.04
How To Reproduce
We use trtexec to convert an ONNX model into a TensorRT engine (with maxShapes=1x80x12000) and then put it into the Triton model repository.
When we send dozens of requests with a shape close to the maximum, e.g. 1x80x11000 (or around 8000 in the last dimension), while other models are also receiving requests at the same time (each gRPC client in a different process, not via multiprocessing, but multiple .py scripts running), Triton exits by chance.
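As a rough, hedged approximation of that reproduction (the real setup runs several independent .py client scripts in parallel), a single script could fire the near-maximum-shape requests concurrently like this; the model name, tensor names, and request/worker counts are assumptions:

```python
# Hedged reproduction sketch: a thread pool is used here only to approximate
# the concurrency of several independent client scripts. "my_trt_model",
# "INPUT0", and the worker/request counts are placeholders.
from concurrent.futures import ThreadPoolExecutor
import numpy as np
import tritonclient.grpc as grpcclient

def send_request(i):
    # One client/connection per request to mimic separate processes.
    client = grpcclient.InferenceServerClient(url="localhost:8001")
    data = np.random.rand(1, 80, 11000).astype(np.float32)  # near maxShapes
    inp = grpcclient.InferInput("INPUT0", list(data.shape), "FP32")
    inp.set_data_from_numpy(data)
    return client.infer(model_name="my_trt_model", inputs=[inp])

with ThreadPoolExecutor(max_workers=16) as pool:
    list(pool.map(send_request, range(48)))  # dozens of near-max-shape requests
```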