TensorRT OSS v8.4.1 GA

@rajeevsrao rajeevsrao released this 14 Jun 21:25

TensorRT OSS release corresponding to the TensorRT 8.4.1.5 GA release.

Key Features and Updates:

  • Samples enhancements

  • EfficientDet sample

    • Added support for EfficientDet Lite and AdvProp models.
    • Added dynamic batch support.
    • Added mixed precision engine builder.
  • HuggingFace transformer demo

    • Added BART model.
    • Sped up GPT-2 greedy search by using a GPU implementation.
    • Fixed GPT-2 ONNX export failure caused by the 2 GB file size limitation.
    • Extended Megatron LayerNorm plugins to support larger hidden sizes.
    • Added performance benchmarking mode.
    • Enabled TF32 format by default.
  • demoBERT enhancements

    • Added --duration flag to the perf benchmarking script.
    • Fixed import of nvinfer_plugins library in demoBERT on Windows.
  • Torch-QAT toolkit

    • Removed the quant_bert.py module; it is now upstreamed to HuggingFace QDQBERT.
    • Changed the default for deconv to axis0.
    • #1939 - Fixed path in classification_flow example.
  • Plugin enhancements

  • Build containers

    • Updated the default CUDA version to 11.6.2.
    • CentOS Linux 8 reached End-of-Life on Dec 31, 2021. The corresponding container has been removed from TensorRT-OSS.
    • Installed devtoolset-8 for an updated g++ version in the CentOS7 container.
  • Tooling enhancements

  • trtexec enhancements

    • Added --layerPrecisions and --layerOutputTypes flags for specifying layer-wise precision and output type constraints.
    • Added --memPoolSize flag to specify the size of the workspace as well as the DLA memory pools via a unified interface. Correspondingly, the --workspace flag has been deprecated.
    • The "End-To-End Host Latency" metric has been removed. Use the "Host Latency" metric instead. For more information, refer to the Benchmarking Network section in the TensorRT Developer Guide.
    • trtexec now uses enqueueV2() instead of enqueue() when the engine has explicit batch dimensions.
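
As a rough illustration of the new trtexec flags listed above, the Python sketch below assembles a command line that uses --memPoolSize in place of the deprecated --workspace flag, together with --layerPrecisions and --layerOutputTypes. The helper name build_trtexec_args, the file name model.onnx, and the layer name layer_1 are hypothetical. The spec syntax (pool:sizeInMiB and layerName:precision, comma separated, with * as a wildcard) is assumed from the trtexec help text and should be checked against `trtexec --help` for your version.

```python
import shlex


def build_trtexec_args(onnx_path, workspace_mib, layer_precisions=None,
                       layer_output_types=None, dla_sram_mib=None):
    """Assemble a trtexec argument list (hypothetical helper, not part of TensorRT)."""
    # --memPoolSize takes comma-separated pool:size pairs (sizes assumed to be in MiB).
    pools = [f"workspace:{workspace_mib}"]
    if dla_sram_mib is not None:
        pools.append(f"dlaSRAM:{dla_sram_mib}")

    args = ["trtexec", f"--onnx={onnx_path}", f"--memPoolSize={','.join(pools)}"]

    # Per-layer precision constraints, e.g. {"*": "fp16", "layer_1": "fp32"}.
    if layer_precisions:
        spec = ",".join(f"{name}:{prec}" for name, prec in layer_precisions.items())
        args.append(f"--layerPrecisions={spec}")

    # Per-layer output type constraints, using the same name:type form.
    if layer_output_types:
        spec = ",".join(f"{name}:{typ}" for name, typ in layer_output_types.items())
        args.append(f"--layerOutputTypes={spec}")

    return args


# Example values are placeholders; the layer names must match your network.
cmd = build_trtexec_args(
    "model.onnx",
    workspace_mib=1024,
    layer_precisions={"*": "fp16", "layer_1": "fp32"},
    layer_output_types={"layer_1": "fp32"},
)

# Print a shell-safe version of the command instead of running it.
print(shlex.join(cmd))
```

The sketch only prints the command. Depending on the trtexec version, layer-wise constraints may also require enabling the relevant precision and a precision-constraint mode, so treat the printed line as a starting point rather than a complete invocation.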