I am cross-compiling some models on an H100 PCIe GPU for execution on H100 SXM5 GPUs. However, I've observed a slowdown of approximately 3% at runtime compared to non-cross-compilation mode. I am using TensorRT 8.6. This is somewhat surprising, as I expected the hardware to be the same except for the I/O interface.
I would like to know if this performance decrease is expected and whether there is potential for better results with TensorRT 10. Additionally, do you have any tips for improving performance in cross-compilation mode?
As general guidance, a performance difference of up to 10% is expected in cross-compilation mode due to the overhead required to support it, so a 3% slowdown is within the expected range.
For TensorRT 10, I suggest trying it and measuring. It is hard to make predictions without more details. We are working on performance improvements in general, but whether they help in a particular case depends on your use case.
I would say that's expected. The H100 PCIe and H100 SXM have different numbers of CUDA cores (as well as different clocks and memory bandwidth), so the best tactic selected when building on an H100 PCIe may not be optimal on an H100 SXM.
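Since the gap comes from tactics being timed on the build GPU rather than the target GPU, one approach sometimes suggested is to populate a timing cache on the target device and reuse it on the build machine. The sketch below uses `trtexec` (TensorRT 8.6); `model.onnx`, `sxm5.cache`, and `model.plan` are placeholder names, and the corresponding Python API (`IBuilderConfig.set_timing_cache` with `ignore_mismatch=True`) exposes the same mechanism. Whether the cache transfers cleanly between the two H100 variants should be verified on your TensorRT version.

```shell
# Sketch, assuming trtexec from TensorRT 8.6; file names are placeholders.

# 1) On the target H100 SXM5, build once to record tactic timings:
trtexec --onnx=model.onnx --timingCacheFile=sxm5.cache

# 2) On the H100 PCIe build machine, reuse that cache so tactic
#    selection reflects timings measured on the target GPU:
trtexec --onnx=model.onnx --timingCacheFile=sxm5.cache --saveEngine=model.plan
```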
Environment
TensorRT Version:
TensorRT 8.6
NVIDIA GPU:
H100 PCIe and H100 SXM5