
TensorRT Cross-Compilation h100 PCIe and h100 SXM5 slow-down #3801

Closed
david-PHR opened this issue Apr 15, 2024 · 5 comments
Assignees
Labels
triaged Issue has been triaged by maintainers

Comments

@david-PHR

Description

I am cross-compiling some models on an H100 PCIe GPU for execution on H100 SXM5 GPUs. However, I've observed a slowdown of approximately 3% at runtime compared to non-cross-compilation mode. I am using TensorRT 8.6. This is somewhat surprising, as I expected the hardware to be the same except for the I/O interface.

I would like to know if this performance decrease is expected and whether there is potential for better results with TensorRT 10. Additionally, do you have any tips for improving performance in cross-compilation mode?

Environment

TensorRT Version:
TensorRT 8.6
NVIDIA GPU:
H100 PCIe and H100 SXM5

@zerollzeng
Collaborator

@nvpohanh @oxana-nvidia is this expected?

@zerollzeng zerollzeng self-assigned this Apr 18, 2024
@zerollzeng zerollzeng added the triaged Issue has been triaged by maintainers label Apr 18, 2024
@oxana-nvidia
Collaborator

As general guidance, a performance difference of up to 10% is expected in cross-compilation mode, due to the overhead required to support cross-compilation. So 3% is within the expected range.

For TensorRT 10, I suggest you try it and see how it works. It is hard to make predictions without details. We are working on performance improvements in general, but whether they help in your particular case I don't know; it depends on your use case.

@nvpohanh
Collaborator

I would say that's expected... H100-PCIe and H100-SXM have very different numbers of CUDA cores, so the best tactic for H100-PCIe may not work well for H100-SXM.
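For context, the workflow being discussed can be sketched with trtexec (the flag names below exist in TensorRT 8.6; the model and engine file paths are hypothetical). Tactic timing runs on whichever GPU performs the build, which is why an engine built on the PCIe card may pick tactics that are slightly suboptimal for the SXM5 card:

```shell
# Hedged sketch of the cross-compilation workflow; model.onnx and
# model.plan are placeholder paths, not from the original report.

# On the H100 PCIe build machine: build and serialize the engine.
# Tactic selection is timed against *this* GPU's SM count and clocks.
trtexec --onnx=model.onnx --saveEngine=model.plan --fp16

# On the H100 SXM5 runtime machine: load and benchmark the prebuilt engine.
trtexec --loadEngine=model.plan --fp16
```

Building the engine directly on an SXM5 machine (when one is available) lets tactic timing run against the deployment GPU, which avoids the cross-compilation penalty entirely.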

@lix19937

"I am cross-compiling some models on an H100 PCIe GPU for execution on H100 SXM5 GPUs."

@david-PHR Does this mean the model (e.g., a .plan file) was generated on an H100 PCIe GPU, and the plan (engine) was then executed on H100 SXM5 GPUs?

@ttyio
Collaborator

ttyio commented Jul 2, 2024

Closing since there has been no activity for more than 3 weeks; please reopen if you still have questions. Thanks all!

@ttyio ttyio closed this as completed Jul 2, 2024