TensorRT OSS v8.4.1 GA
TensorRT OSS release corresponding to the TensorRT 8.4.1.5 GA release.
- Updates since TensorRT 8.2.1 GA release.
- Please refer to the TensorRT 8.4.1 GA release notes for more information.
Key Features and Updates:
- Samples enhancements
  - Added Detectron2 Mask R-CNN R50-FPN python sample.
  - Added a quickstart guide for the NVIDIA Triton deployment workflow.
  - Added an ONNX export script for sampleOnnxMnistCoordConvAC.
  - Removed sampleNMT.
  - Removed usage of deprecated TensorRT APIs in samples.
- EfficientDet sample
  - Added support for EfficientDet Lite and AdvProp models.
  - Added dynamic batch support.
  - Added mixed precision engine builder.
- HuggingFace transformer demo
  - Added BART model.
  - Performance speedup of GPT-2 greedy search using a GPU implementation.
  - Fixed GPT-2 ONNX export failure caused by the 2 GB file size limitation.
  - Extended Megatron LayerNorm plugins to support larger hidden sizes.
  - Added performance benchmarking mode.
  - Enabled TF32 format by default.
- demoBERT enhancements
  - Added `--duration` flag to the perf benchmarking script.
  - Fixed import of the `nvinfer_plugins` library in demoBERT on Windows.
- Torch-QAT toolkit
  - Removed the `quant_bert.py` module; it is now upstreamed to HuggingFace QDQBERT.
  - Use axis 0 as the default for deconvolution.
  - #1939 - Fixed path in the `classification_flow` example.
- Plugin enhancements
  - Added disentangled attention plugin, `DisentangledAttention_TRT`, to support the DeBERTa model.
  - Added multiscale deformable attention plugin, `MultiscaleDeformableAttnPlugin_TRT`, to support the DDETR model.
  - Added new plugins: `decodeBbox3DPlugin`, `pillarScatterPlugin`, and `voxelGeneratorPlugin`.
  - Refactored the EfficientNMS plugin to support TF-TRT and implicit batch mode.
  - Added `fp16` support for `pillarScatterPlugin`.
- Build containers
  - Updated default CUDA version to 11.6.2.
  - CentOS Linux 8 reached End-of-Life on Dec 31, 2021; the corresponding container has been removed from TensorRT-OSS.
  - Install `devtoolset-8` for updated g++ versions in the CentOS 7 container.
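As a reference for the container changes above, building and launching an OSS development container might look like the following sketch. The Dockerfile name and image tag are illustrative placeholders; check the repository's `docker/` directory for the exact file names available in this release.

```shell
# Illustrative sketch: build a TensorRT OSS dev container (names are placeholders).
./docker/build.sh --file docker/ubuntu-20.04.Dockerfile --tag tensorrt-ubuntu20.04-cuda11.6

# Launch the container with GPU access for building the OSS components.
./docker/launch.sh --tag tensorrt-ubuntu20.04-cuda11.6 --gpus all
```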
- Tooling enhancements
  - Added TensorFlow Quantization Toolkit v0.1.0 for Quantization-Aware Training of TensorFlow 2.x Keras models.
  - Added TensorRT Engine Explorer v0.1.2 for inspecting TensorRT engine plans and associated inference profiling data.
  - Updated Polygraphy to v0.38.0.
  - Updated onnx-graphsurgeon to v0.3.19.
- trtexec enhancements
  - Added `--layerPrecisions` and `--layerOutputTypes` flags for specifying layer-wise precision and output type constraints.
  - Added `--memPoolSize` flag to specify the size of the workspace as well as the DLA memory pools via a unified interface. Correspondingly, the `--workspace` flag has been deprecated.
  - Removed the "End-To-End Host Latency" metric; use the "Host Latency" metric instead. For more information, refer to the Benchmarking Network section in the TensorRT Developer Guide.
  - Use `enqueueV2()` instead of `enqueue()` when the engine has explicit batch dimensions.
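To illustrate how the new per-layer flags and the unified memory-pool flag combine, a mixed-precision build might be invoked as sketched below. The model file and layer name are placeholders, and the pool sizes shown are arbitrary; consult `trtexec --help` for the exact flag syntax in your build.

```shell
# Illustrative sketch: build an FP16 engine while pinning one layer to FP32.
# "model.onnx" and the layer name "fc_out" are hypothetical placeholders.
trtexec --onnx=model.onnx \
        --fp16 \
        --layerPrecisions=fc_out:fp32 \
        --layerOutputTypes=fc_out:fp32 \
        --memPoolSize=workspace:1024   # workspace pool size (replaces the deprecated --workspace flag)
```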