TensorRT OSS v8.4.1 GA
TensorRT OSS release corresponding to the TensorRT 8.4.1.5 GA release.
- Updates since TensorRT 8.2.1 GA release.
- Please refer to the TensorRT 8.4.1 GA release notes for more information.
Key Features and Updates:
- Samples enhancements
  - Added Detectron2 Mask R-CNN R50-FPN python sample.
  - Added a quickstart guide for the NVIDIA Triton deployment workflow.
  - Added an ONNX export script for sampleOnnxMnistCoordConvAC.
  - Removed sampleNMT.
  - Removed usage of deprecated TensorRT APIs in samples.
- EfficientDet sample
  - Added support for EfficientDet Lite and AdvProp models.
  - Added dynamic batch support.
  - Added mixed precision engine builder.
- HuggingFace transformer demo
  - Added BART model.
  - Performance speedup of GPT-2 greedy search using a GPU implementation.
  - Fixed GPT-2 ONNX export failure caused by the 2 GB file size limitation.
  - Extended Megatron LayerNorm plugins to support larger hidden sizes.
  - Added performance benchmarking mode.
  - Enabled TF32 format by default.
- demoBERT enhancements
  - Added `--duration` flag to the perf benchmarking script.
  - Fixed import of the `nvinfer_plugins` library in demoBERT on Windows.
- Torch-QAT toolkit
  - Removed the `quant_bert.py` module; it is now upstreamed to HuggingFace QDQBERT.
  - Use axis 0 as the default for deconvolution.
  - #1939 - Fixed path in the `classification_flow` example.
- Plugin enhancements
  - Added disentangled attention plugin, `DisentangledAttention_TRT`, to support the DeBERTa model.
  - Added multiscale deformable attention plugin, `MultiscaleDeformableAttnPlugin_TRT`, to support the DDETR model.
  - Added new plugins: `decodeBbox3DPlugin`, `pillarScatterPlugin`, and `voxelGeneratorPlugin`.
  - Refactored the EfficientNMS plugin to support TF-TRT and implicit batch mode.
  - Added `fp16` support for `pillarScatterPlugin`.
- Build containers
  - Updated default CUDA version to 11.6.2.
  - CentOS Linux 8 reached End-of-Life on Dec 31, 2021; the corresponding container has been removed from TensorRT-OSS.
  - Install `devtoolset-8` for updated g++ versions in the CentOS 7 container.
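As a reference for the container changes above, building and launching an OSS development container might look like the following sketch. The Dockerfile name and image tag are illustrative placeholders; check the repository's `docker/` directory for the exact file names available in this release.

```shell
# Illustrative sketch: build a TensorRT OSS dev container (names are placeholders).
./docker/build.sh --file docker/ubuntu-20.04.Dockerfile --tag tensorrt-ubuntu20.04-cuda11.6

# Launch the container with GPU access for building the OSS components.
./docker/launch.sh --tag tensorrt-ubuntu20.04-cuda11.6 --gpus all
```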
- Tooling enhancements
  - Added TensorFlow Quantization Toolkit v0.1.0 for Quantization-Aware Training of TensorFlow 2.x Keras models.
  - Added TensorRT Engine Explorer v0.1.2 for inspecting TensorRT engine plans and associated inference profiling data.
  - Updated Polygraphy to v0.38.0.
  - Updated onnx-graphsurgeon to v0.3.19.
- trtexec enhancements
  - Added `--layerPrecisions` and `--layerOutputTypes` flags for specifying layer-wise precision and output type constraints.
  - Added `--memPoolSize` flag to specify the size of the workspace as well as the DLA memory pools via a unified interface. Correspondingly, the `--workspace` flag has been deprecated.
  - Removed the "End-To-End Host Latency" metric; use the "Host Latency" metric instead. For more information, refer to the Benchmarking Network section in the TensorRT Developer Guide.
  - Use `enqueueV2()` instead of `enqueue()` when the engine has explicit batch dimensions.
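To illustrate how the new per-layer flags and the unified memory-pool flag combine, a mixed-precision build might be invoked as sketched below. The model file and layer name are placeholders, and the pool sizes shown are arbitrary; consult `trtexec --help` for the exact flag syntax in your build.

```shell
# Illustrative sketch: build an FP16 engine while pinning one layer to FP32.
# "model.onnx" and the layer name "fc_out" are hypothetical placeholders.
trtexec --onnx=model.onnx \
        --fp16 \
        --layerPrecisions=fc_out:fp32 \
        --layerOutputTypes=fc_out:fp32 \
        --memPoolSize=workspace:1024   # workspace pool size (replaces the deprecated --workspace flag)
```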