tensorrt test failed #1454
Comments
Hi @twmht, the log file you provided did not include the CUDA error information.
Oops, I may have uploaded the wrong file. I have updated the post; please download the file again.
Got it. @grimoire, please have a look.
I am not 100% sure, but I guess it is caused by a memory conflict.
I don't fully understand the memory pool, but why would there be a memory conflict if each library has its own memory pool?
Update:
What causes the conflict? In my test log there are many failed tests, not only test_batched_nms.
Just comment out test_batched_nms and all the other tests pass.
Interesting. Why does PyTorch conflict with thrust::sort_by_key?
Both PyTorch and thrust use cub in their source code, and cub keeps static state internally. After compiling and loading both libraries, they end up with two different references to that state:
Thank you, great explanation! But when running the mmcv nms op (https://github.com/open-mmlab/mmcv/blob/master/mmcv/ops/nms.py#L26) alone, without running the tensorrt module, the test is fine. Why does tensorrt introduce the problem? If the cub version in mmcv were different from the cub version in pytorch, it should cause the CUDA error with mmcv + pytorch1.8 even without running the tensorrt module. A related issue is pytorch/pytorch#54245; maybe I can also try pytorch1.8.1 to see if that solves it.
TensorRT nms and MMCV nms are different. In the MMCV implementation, we use PyTorch to do the sort or topk, which does not bring in another copy of cub. The error is caused by the way the compiler processes static variables. Here is an example (from a blog). I am using torch1.8.1 and the error still exists; not sure if there is a fix in 1.10.0.
When trying to run the tests in test_tensorrt.py, there are many CUDA errors.
My environment is CUDA 11 + PyTorch 1.8.0 + TensorRT 7.1.3.4.
I found that when forwarding the PyTorch module in CPU mode, the errors go away.
Removing the TensorRT part also makes the errors go away.
So the issue is that when forwarding the PyTorch module in GPU mode and running the TensorRT module together, CUDA errors may be thrown.
Here is the test log:
log.txt