
First call to sess.run() at inference time is slow #9

Closed
thomasweng15 opened this issue Oct 18, 2021 · 5 comments


@thomasweng15

Hi, have you encountered an issue where the first call to sess.run() in contact_grasp_estimator.py is slow? I am running the inference example in the readme, and when I time sess.run() the first call takes much longer than subsequent calls (times in seconds):

Run inference 1162.3998165130615
Preprocess pc for inference 0.0007269382476806641
Run inference 0.2754530906677246
Preprocess pc for inference 0.0006759166717529297

I found a thread on what seems to be a similar issue, but the simple resolutions suggested there have not worked, and I have not yet tried compiling TensorFlow from source. I am running on an RTX 3090 with CUDA 11.1 and tensorflow-gpu==2.2. Have you encountered this issue before? Thanks for your help.
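For what it's worth, a slow first call is common in TensorFlow because the first run pays one-time costs (CUDA context creation, kernel compilation/autotuning). A common workaround is to issue one untimed warm-up call with dummy input before the timed loop. Below is a minimal, generic sketch of that pattern; FakeSession is a hypothetical stand-in, not this repo's contact_grasp_estimator API:

```python
import time

class FakeSession:
    """Stand-in for a TF session whose first run() is slow
    (simulates one-time CUDA kernel compilation)."""
    def __init__(self):
        self._warm = False

    def run(self, x):
        if not self._warm:
            time.sleep(0.05)  # simulate the one-time setup cost
            self._warm = True
        return x * 2

sess = FakeSession()

# Warm-up call: absorbs the setup cost before any timed inference.
t0 = time.perf_counter()
first_result = sess.run(1)
first = time.perf_counter() - t0

# Subsequent calls reflect steady-state latency.
t0 = time.perf_counter()
second_result = sess.run(1)
second = time.perf_counter() - t0

print(first > second)  # True: only the first call pays the setup cost
```

This pattern hides the latency from downstream code but does not explain a 1162-second first call, which points at a deeper environment problem.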

@thomasweng15
Author

The quality of the grasps is also much worse than expected:

[image: poor grasp predictions]

I have tried recompiling the pointnet tf ops using this script https://github.com/NVlabs/contact_graspnet/blob/main/compile_pointnet_tfops.sh, but the problem persists. I repeated the same setup on a second, brand-new machine, also with an RTX 3090 but with CUDA 11.2, and encountered the same problem and the same poor performance.

@arsalan-mousavian
Collaborator

Regarding inference speed: on the desktops I have tried, the first inference may take 2-3 seconds, but not 1162 seconds. I am not sure why it takes so much longer on your machine.

Regarding the grasp quality: something is terribly wrong here. I assume you have already checked git status and nothing in the repo has changed. I have tested this code with CUDA 11.1 on multiple machines with no problems. Can you try CUDA 11.1 with tensorflow-gpu 2.2.0? In other projects with custom CUDA ops (in PyTorch), I have seen discrepancies between CUDA versions (I know it's surprising, but I have seen it).

@arsalan-mousavian
Collaborator

@thomasweng15 let me know if setting up CUDA 11.1 fixes the issue for you.

@thomasweng15
Author

I switched to CUDA 11.1 and ran with tensorflow-gpu 2.2, but had the same issue. I then upgraded to tensorflow-gpu 2.5, reasoning that the RTX 3080 and 3090 GPUs were too new for earlier tensorflow-gpu versions, and knowing that my labmate used 2.5 successfully. I had to recompile the pointnet tf_ops and install cudnn 8.1 and cudatoolkit 11.0 from conda-forge. The problem is now fixed: the first inference runs in 2 seconds, and the predictions look much better:
[image: improved grasp predictions]

So the takeaway is that users with newer RTX 30xx GPUs should upgrade to tensorflow-gpu==2.5.
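The working environment described above might be captured in a conda file along these lines. This is an untested sketch reconstructed from the versions mentioned in this thread; the environment name, channel layout, and the Python pin are assumptions, not an exported file:

```yaml
# Hypothetical environment.yml (assumed names/pins, not tested)
name: contact_graspnet_tf25
channels:
  - conda-forge
dependencies:
  - python=3.8        # assumption: a Python version compatible with TF 2.5
  - cudatoolkit=11.0  # from conda-forge, per this thread
  - cudnn=8.1         # from conda-forge, per this thread
  - pip
  - pip:
      - tensorflow-gpu==2.5
```

Note that the pointnet tf_ops still need to be recompiled against this environment after it is created.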

@xlim1996

@thomasweng15
Hi, do you have a .yml file for the new environment (tensorflow 2.5, cudatoolkit 11.0, and cudnn 8.1)?
