Assertion `bound >= 0` failed with TensorRT 8.6.1 when running build_serialized_network on GPU NVIDIA Tesla V100 #3639
Comments
What does that error mean? How can I debug this?
Does it work with onnxruntime? You can check it quickly with polygraphy.
Of course, it works with onnxruntime and polygraphy. Polygraphy output:
Could you please provide a repro? Thanks!
It would be great if you could try TRT 9.2/9.3 first.
Is there a Python wheel with TRT 9.2/9.3, or do I need …
The Python wheel should be shipped with the tar package.
I couldn't find a wheel in the tar package of the current repo, but I found some in these archives: https://developer.nvidia.com/nvidia-tensorrt-8x-download , though the version there is also 8.6.1. I uploaded the ONNX model to reproduce: https://drive.google.com/file/d/1nlXTliLV9M7_Z1xiQnUXYP_p8UqbEUBk/view?usp=sharing
And use this code:

```python
# %%
import tensorrt as trt
import onnx

logger = trt.Logger(trt.Logger.VERBOSE)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

# %%
success = parser.parse_from_file('generator.onnx')
for idx in range(parser.num_errors):
    err = parser.get_error(idx)
    print(err)
if not success:
    exit(0)

# %%
config = builder.create_builder_config()
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1024 * 1024 * 1024)
config.flags |= 1 << int(trt.BuilderFlag.DEBUG)
config.clear_flag(trt.BuilderFlag.TF32)

MIN_TIME_AXIS = 1
MAX_TIME_AXIS = 400
MIN_TIME_AXIS_BERT = 1
MAX_TIME_AXIS_BERT = 50

# test input
TEST_TIME_AXIS = 400
TEST_TIME_AXIS_BERT = 50

TEXT_EMB_SIZE = 192
N_Q_FEATURES = 5
BERT_EMB_DIM = 768

dynamic_shape_config = [
    {"input": "text_emb", "min": (1, MIN_TIME_AXIS, TEXT_EMB_SIZE), "opt": (1, MAX_TIME_AXIS, TEXT_EMB_SIZE), "max": (1, MAX_TIME_AXIS, TEXT_EMB_SIZE)},
    {"input": "q_labels", "min": (1, MIN_TIME_AXIS, N_Q_FEATURES), "opt": (1, MAX_TIME_AXIS, N_Q_FEATURES), "max": (1, MAX_TIME_AXIS, N_Q_FEATURES)},
    {"input": "bert_emb", "min": (1, MIN_TIME_AXIS_BERT, BERT_EMB_DIM), "opt": (1, MAX_TIME_AXIS_BERT, BERT_EMB_DIM), "max": (1, MAX_TIME_AXIS_BERT, BERT_EMB_DIM)},
    {"input": 'speaker_ids', "min": (1,), "opt": (1,), "max": (1,)},
    {"input": 'noise_scale', "min": (1,), "opt": (1,), "max": (1,)},
    {"input": 'noise_scale_w', "min": (1,), "opt": (1,), "max": (1,)},
    {"input": 'length_scale', "min": (1, MIN_TIME_AXIS,), "opt": (1, MAX_TIME_AXIS,), "max": (1, MAX_TIME_AXIS,)},
]

profile = builder.create_optimization_profile()
for s in dynamic_shape_config:
    profile.set_shape(**s)
config.add_optimization_profile(profile)

# config.builder_optimization_level = 0
ser_engine = builder.build_serialized_network(network, config)
with open('generator.trt', 'wb') as f:
    f.write(ser_engine)
```
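As an aside, TensorRT requires each optimization-profile dimension to satisfy min <= opt <= max, with all three shapes having the same rank. A small hypothetical helper (not part of TensorRT or this thread) can catch typos in a `dynamic_shape_config` entry before the builder does:

```python
def check_profile_entry(entry):
    """Sanity-check one optimization-profile entry.

    TensorRT requires min <= opt <= max elementwise, and all three shapes
    must have the same rank. `entry` uses the same keys as the
    dynamic_shape_config dictionaries above.
    """
    mn, op, mx = entry["min"], entry["opt"], entry["max"]
    if not (len(mn) == len(op) == len(mx)):
        raise ValueError(f"{entry['input']}: rank mismatch {mn} / {op} / {mx}")
    for d, (a, b, c) in enumerate(zip(mn, op, mx)):
        if not (a <= b <= c):
            raise ValueError(
                f"{entry['input']} dim {d}: need min <= opt <= max, got {a}, {b}, {c}")
    return True
```

For example, `check_profile_entry({"input": "text_emb", "min": (1, 1, 192), "opt": (1, 400, 192), "max": (1, 400, 192)})` returns True, while an entry whose opt exceeds its max raises a ValueError naming the offending dimension.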
I found that the error is due to this line: https://github.com/jaywalnut310/vits/blob/main/models.py#L517
I also tried …
Test with TRT 9.2:
Looks like we hit a known limitation. What is the real input shape?
I ran with this command:
I saw something about RandomNormalLike somewhere, but as I remember the solution was just to update TensorRT.
Any updates?
Filed internal bug 4535894 for this.
Just an aside: I noticed the network is using what TensorRT calls "zero as placeholder", which indicates the original ONNX file is not setting the attribute "allowzero=1" for Reshape. When "allowzero=1" is not present, ONNX treats a 0 in a reshape dimension not as a dimension, but as a placeholder for the corresponding input dimension. With dynamic shapes this is almost never what the author intended, and it tends to break networks. Attached is a zip file with a Python script that I sometimes use to repair networks where the author did not intend 0 to be a placeholder.
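To illustrate the semantics described above, here is a toy model of ONNX Reshape shape resolution (an illustration of the spec, not TensorRT or ONNX code): with allowzero=0 (the default), a 0 in the target shape copies the corresponding input dimension, and a single -1 is inferred from the element count.

```python
from math import prod

def resolve_reshape(input_shape, target, allowzero=0):
    """Toy model of ONNX Reshape shape resolution.

    With allowzero=0 (the default), a 0 in `target` is a placeholder for
    the corresponding input dimension; with allowzero=1 it is a literal
    zero-sized dimension. A single -1 is inferred so the element count
    matches the input.
    """
    out = [input_shape[i] if d == 0 and not allowzero else d
           for i, d in enumerate(target)]
    if -1 in out:
        i = out.index(-1)
        rest = prod(d for j, d in enumerate(out) if j != i)
        out[i] = prod(input_shape) // rest
    return out
```

For example, `resolve_reshape((2, 3, 4), (0, -1))` gives `[2, 12]`: the leading 2 is copied from the input, which with a dynamic batch dimension is rarely what the author meant.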
It doesn't help. I got the same error.
Maybe this will help:
There is an error in TensorRT that affects attempts to use RandomNormalLike with dynamic shapes. The following hack might work: route the output of RandomNormalLike through an identity convolution sandwiched between two reshapes. So at the TensorRT level, the replacement is IShuffleLayer → IConvolutionLayer → IShuffleLayer, where the first IShuffleLayer does a 3D-to-4D reshape and the second IShuffleLayer does a 4D-to-3D reshape. E.g., the first shuffle can reshape from (b, 2, t) to (b, 1, 2, t). Of course, what I've described is at the TensorRT level. You're probably more interested in an ONNX-level description. At the ONNX level, the hack looks like replacing the output of RandomNormalLike with an Unsqueeze → Conv (1x1 identity kernel) → Squeeze chain.
The RandomNormalLike was from here: https://github.com/jaywalnut310/vits/blob/main/models.py#L90. I replaced it with:

```python
z = torch.randn(x.size(0), 2, x.size(2)).to(device=x.device, dtype=x.dtype)  # (b, 2, t)
z = z.unsqueeze(1)                       # (b, 1, 2, t)
z = F.conv2d(z, z.new_ones(1, 1, 1, 1))  # identity
z = z[:, 0]                              # (b, 2, t)
z = z * noise_scale
```

And it seems to work. Will such a solution be added inside TensorRT? I'm testing now; if other errors occur, I'll let you know.
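As a sanity check on the shape bookkeeping of that workaround, the unsqueeze → 1x1 all-ones conv → squeeze round trip is an elementwise identity on a (b, 2, t) tensor. A plain-Python sketch (nested lists standing in for tensors, not the PyTorch code itself):

```python
def unsqueeze1(z):
    # (b, 2, t) -> (b, 1, 2, t): insert a singleton channel axis
    return [[sample] for sample in z]

def conv2d_1x1_ones(z4):
    # conv with one 1x1x1x1 all-ones kernel over a single input channel:
    # each output value is the single input value times 1, i.e. identity
    return [[[[v * 1 for v in row] for row in chan] for chan in sample]
            for sample in z4]

def squeeze1(z4):
    # (b, 1, 2, t) -> (b, 2, t): drop the singleton channel axis
    return [sample[0] for sample in z4]

z = [[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]]   # shape (1, 2, 3)
assert squeeze1(conv2d_1x1_ones(unsqueeze1(z))) == z
```

The values pass through unchanged, so the only effect of the rewrite is to replace the data-dependent RandomNormalLike pattern with ops TensorRT handles under dynamic shapes.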
Closing since there is a WAR. Thanks all!
I ran into the same issue today. @zerollzeng, thanks a lot for the WAR. Any instructions on how to add that …
Any advice? Thanks a lot. @zerollzeng
OK guys, after some debugging, I found that the error was caused by the following line of code:
I am still blocked by this issue and wonder if anyone can help me out. I do need batch inference to achieve better performance. Regarding the WAR mentioned above, I can see the Cast node is already there (see below), and I got the same error when converting ONNX to TensorRT. Any suggestion would be highly appreciated.
@zerollzeng Possible to reopen this ticket so that we can take a closer look at it? Thanks.
I added a Cast node as suggested in the WAR, as shown below, and then encountered the same error in the Clip node. Appreciate your help, @zerollzeng.
I resolved the …
Description
I am trying to convert a small modification of the VITS model (https://github.com/jaywalnut310/vits), but I get an error when running builder.build_serialized_network:

Environment
TensorRT Version: 8.6.1
NVIDIA GPU: Nvidia Tesla v100
NVIDIA Driver Version: 450.216.04
CUDA Version: 11.6
CUDNN Version: 8.9
Operating System: Ubuntu 22.04.3 inside Docker Container
Python Version (if applicable): 3.11
PyTorch Version (if applicable): 1.13.1
Steps To Reproduce
Have you tried the latest release?: yes
Can this model run on other frameworks? For example, run the ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt): yes