I am cross-compiling some models on an H100 PCIe GPU for execution on H100 SXM5 GPUs. However, I've observed a slowdown of approximately 3% at runtime compared to non-cross-compilation mode. I am using TensorRT 8.6. This is somewhat surprising, as I expected the hardware to be the same except for the I/O interface.
I would like to know if this performance decrease is expected and whether there is potential for better results with TensorRT 10. Additionally, do you have any tips for improving performance in cross-compilation mode?
As general guidance, a performance difference of up to 10% is expected in cross-compilation mode due to the overhead required to support it, so a 3% slowdown is within the expected range.
For TensorRT 10, I suggest trying it and measuring. It is hard to make predictions without more details. We are working on performance improvements in general, but whether they help in a particular case depends on your use case.
I would say that's expected. The H100 PCIe and H100 SXM have different numbers of CUDA cores (as well as different clocks and memory bandwidth), so the best tactic selected when building on an H100 PCIe may not be optimal on an H100 SXM.
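Since the gap comes from tactics being timed on the build GPU rather than the target GPU, one approach sometimes suggested is to populate a timing cache on the target device and reuse it on the build machine. The sketch below uses `trtexec` (TensorRT 8.6); `model.onnx`, `sxm5.cache`, and `model.plan` are placeholder names, and the corresponding Python API (`IBuilderConfig.set_timing_cache` with `ignore_mismatch=True`) exposes the same mechanism. Whether the cache transfers cleanly between the two H100 variants should be verified on your TensorRT version.

```shell
# Sketch, assuming trtexec from TensorRT 8.6; file names are placeholders.

# 1) On the target H100 SXM5, build once to record tactic timings:
trtexec --onnx=model.onnx --timingCacheFile=sxm5.cache

# 2) On the H100 PCIe build machine, reuse that cache so tactic
#    selection reflects timings measured on the target GPU:
trtexec --onnx=model.onnx --timingCacheFile=sxm5.cache --saveEngine=model.plan
```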
Environment
TensorRT Version:
TensorRT 8.6
NVIDIA GPU:
H100 PCIe and H100 SXM5