don't improve the performance of models on GTX 1080ti #14

Open
PythonImageDeveloper opened this issue May 12, 2019 · 6 comments

Comments

@PythonImageDeveloper

Hi,
I optimized my trained models (1 class), ssdlite_mobilenetv2 and ssd_resnet50, with TensorRT, but the performance didn't improve significantly: inference time only went from 0.12 sec to 0.11 sec on a GTX 1080 Ti. Why?
I installed TensorFlow 1.12.0, CUDA 9, and TensorRT 4.0.1.6 on Ubuntu 16.04.
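
For comparing timings such as the 0.12 sec and 0.11 sec above, it can help to exclude the first few calls, which often include one-time initialization. The following is a minimal sketch of such a measurement, not the code used in this thread; `measure_latency`, `speedup`, and the stand-in workload are hypothetical names introduced for illustration.

```python
import time


def measure_latency(run_once, warmup=10, iters=50):
    """Return the mean seconds per call of run_once, excluding warm-up calls."""
    for _ in range(warmup):
        run_once()  # warm-up calls are not timed
    start = time.perf_counter()
    for _ in range(iters):
        run_once()
    return (time.perf_counter() - start) / iters


def speedup(before_s, after_s):
    """Ratio of the old latency to the new latency."""
    return before_s / after_s


if __name__ == "__main__":
    # Stand-in workload (assumption); replace with the real inference call,
    # e.g. a sess.run(...) on the loaded graph.
    latency = measure_latency(lambda: sum(range(10000)))
    print(f"{latency * 1e3:.3f} ms per call, {1.0 / latency:.1f} FPS")
    print(f"0.12 s -> 0.11 s is a {speedup(0.12, 0.11):.2f}x speedup")
```

Measured this way, the two numbers from the post correspond to a speedup of roughly 9%.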

@ardianumam
Owner

> Hi,
> I optimized my trained models (1 class), ssdlite_mobilenetv2 and ssd_resnet50, with TensorRT, but the performance didn't improve significantly: inference time only went from 0.12 sec to 0.11 sec on a GTX 1080 Ti. Why?
> I installed TensorFlow 1.12.0, CUDA 9, and TensorRT 4.0.1.6 on Ubuntu 16.04.

I also tried TRT optimization several days ago on SSD MobileNetV1 with 1 class. I got 45 FPS on Jetson TX2 both before and after TRT optimization. My tentative conclusion is: (i) TRT might be less effective for networks like MobileNet, maybe because their separable convolutions already perform very little computation, so there is less room for optimization. (ii) When I use more classes (e.g., 80 classes in COCO), there is a bigger difference after TRT optimization (TRT seems to optimize the conv. operations for the output prediction, whose cost is proportional to the number of classes).
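
To make the two points above concrete, here is a rough multiply-accumulate (MAC) count for a standard convolution, a depthwise separable convolution, and an SSD-style class-prediction convolution. This is only an illustrative sketch: the layer shape (19 x 19 feature map, 512 channels), the 6 anchors per cell, and the function names are assumptions, not values taken from the models discussed in this thread.

```python
def standard_conv_macs(h, w, c_in, c_out, k=3):
    """MACs for a k x k standard convolution (stride 1, 'same' padding)."""
    return h * w * c_in * c_out * k * k


def separable_conv_macs(h, w, c_in, c_out, k=3):
    """MACs for a depthwise k x k convolution followed by a 1 x 1 pointwise one."""
    depthwise = h * w * c_in * k * k
    pointwise = h * w * c_in * c_out
    return depthwise + pointwise


def class_head_macs(h, w, c_in, num_classes, anchors_per_cell=6, k=3):
    """MACs for a class-prediction conv: one score per anchor per class,
    plus one background class (assumed head layout)."""
    c_out = anchors_per_cell * (num_classes + 1)
    return standard_conv_macs(h, w, c_in, c_out, k)


if __name__ == "__main__":
    # Illustrative layer shape (assumption): 19 x 19 map, 512 -> 512 channels.
    std = standard_conv_macs(19, 19, 512, 512)
    sep = separable_conv_macs(19, 19, 512, 512)
    print(f"standard:  {std:,} MACs")
    print(f"separable: {sep:,} MACs ({std / sep:.1f}x fewer)")
    for n in (1, 80):
        macs = class_head_macs(19, 19, 512, n)
        print(f"class head, {n:>2} classes: {macs:,} MACs")
```

Under these assumptions the separable layer needs several times fewer MACs than the standard one, and the class head grows linearly with the number of classes, which is consistent with the observation that a 1-class model leaves less work to optimize.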

@PythonImageDeveloper
Author

PythonImageDeveloper commented May 15, 2019

Hi @ardianumam
Thanks for your reply. I have some questions; please answer them if possible. Thanks.
I installed these package versions on the Jetson TX2: TensorRT 5.0.2, TensorFlow 1.13.1, CUDA 10.
Q1- I tested ssdlite_mobilenetv2 with 1 class. This model achieves 22 FPS on the Jetson TX2. Does this result make sense, or is there room for improvement?
Q2- This model took about 10 min to load the frozen graph. Why?
Q3- The maximum free GPU memory is 5 GB out of 8 GB. Why?

Please tell me which package versions you have installed.

Q4- How do you reach 45 FPS? Do you run the same code from your GitHub repo to create the TRT model and execute it?

@ardianumam
Owner

Hi,

Q1: What is your input dimension? And have you already set the TX2 to max performance? For 1 class, if you use a 300x300 input dim (I use this dim), I think 22 FPS is too slow on the TX2. Where did you get the ssdlite_mobilenetv2 model?

Q2: Only loading a pre-stored TRT_pb model is fast. Make sure you are only loading the model, not building/optimizing it.

Q3: TX2 uses 8GB shared memory, not only for GPU but also for the system memory, i.e., RAM.

Q4: I use this code and this model.

Btw, where did you get TensorFlow 1.13.1 for the TX2? I tried to google it but haven't found it yet.
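
The distinction in Q2 between loading a stored model and building one can be sketched as a simple build-once cache. This is a generic illustration in plain Python, not the repository's code: `load_or_build`, `slow_build`, and the file name are hypothetical, and the slow optimization step is simulated with a short sleep.

```python
import os
import tempfile
import time


def load_or_build(cache_path, build_fn):
    """Return (model_bytes, was_built).

    build_fn is a hypothetical zero-argument callable that runs the slow
    optimization and returns the serialized model as bytes. It is called
    only when no cached file exists at cache_path.
    """
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return f.read(), False  # fast path: just read the stored model
    data = build_fn()  # slow path: build/optimize once
    with open(cache_path, "wb") as f:
        f.write(data)
    return data, True


if __name__ == "__main__":
    def slow_build():
        time.sleep(0.2)  # stands in for the optimization step (assumption)
        return b"serialized-model"

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "model_trt.pb")  # hypothetical file name
        for attempt in (1, 2):
            start = time.perf_counter()
            _, built = load_or_build(path, slow_build)
            elapsed = time.perf_counter() - start
            print(f"run {attempt}: built={built}, {elapsed:.3f} s")
```

If a script takes minutes on every start, checking whether it goes through the build path each time, rather than the load path, is one thing worth ruling out.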

@PythonImageDeveloper
Author

PythonImageDeveloper commented May 16, 2019

Hi,
Thank you for the good answers.
Q1- My input dims are both 300 and 600, but I get 22 FPS with the 300 input size.
You can get the ssdlite_mobilenetv2 model from the TensorFlow model zoo.
Q2- I am only loading the TRT_pb model.
Please tell me which package versions you have installed. Thanks.

Are you sure you get 45 FPS with ssd_mobilenetv1? On this NVIDIA page, they achieve about 20 FPS (50 ms) with the ssd_mobilenetv1 model. What is the difference in code or package versions between that page and your setup?

@ardianumam
Owner

Q2: All the package and library versions I use are already listed in the README.md. Please check.

This NVIDIA GitHub model is for COCO (80 classes). I also got 20 FPS for that model. My 45 FPS model is for 1 class. This repo even reports 50 FPS for 5 classes using SSD MobileNet, and this article reports > 30 FPS for SSD-GoogleNet with 20 classes (VOC dataset).

@PythonImageDeveloper
Author

PythonImageDeveloper commented May 16, 2019

Hi,
Thanks. I installed JetPack 4.2. Could that version be the cause? It may not be compatible with the other packages. I think my performance is very slow because of this version.
