TensorRT OSS 21.04 release #1185

Merged 14 commits on Apr 12, 2021

@rajeevsrao rajeevsrao commented Apr 12, 2021

Added

  • SM86 kernels for the BERT MHA plugin.
  • opset13 support for Softmax, LogSoftmax, Squeeze, and Unsqueeze.
  • Support for the EyeLike and GatherElements operators.

Changed

  • Updated TensorRT version to v7.2.3.4.
  • Updated to ONNX-TensorRT 21.03
  • ONNX-GraphSurgeon (v0.3.4) - updates fold_constants to correctly exit early.
  • Set default CUDA_INSTALL_DIR #798
  • Plugin bugfixes, qkv kernels for sm86
  • Fixed GroupNorm CMakeFile for cu sources #1083
  • Permit groupadd with non-unique GID in build containers #1091
  • Avoid reinterpret_cast #146
  • Clang-format plugins and samples
  • Avoid arithmetic on void pointer in multilevelProposeROIPlugin.cpp #1028
  • Update BERT plugin documentation.

Removed

  • Removed extra terminate call in InstanceNorm

rajeevsrao and others added 12 commits April 12, 2021 12:14
The PriorBox plugin serializes CPU metadata (array size) A and GPU data
(array elements) B' into the engine. B' is derived from the CPU array B
when the object is constructed. A deserialized object therefore holds
A and B', which differ from the original (A and B).

If a new object is created from a deserialized one via `PriorBox::clone()`,
which rebuilds the GPU-side array elements from the CPU-side data A and
B', the generated GPU data is incorrect (A and B''), resulting in
wrong inference results.

Because PriorBox is designed to track data in a specific format, we now
serialize only the CPU data A and B, i.e. the parameters used to
construct a PriorBox object, into the engine.
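
The idea of serializing only the construction parameters can be illustrated with a minimal, self-contained sketch. It is not the actual plugin code: `ToyPriorBox`, its members, and the square-root "derivation" are hypothetical stand-ins, and plain host memory takes the place of GPU buffers. The point is only that when `serialize()` writes the original inputs (A and B), both `deserialize()` and `clone()` go through the same constructor and reproduce the same derived data B'.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstring>
#include <utility>
#include <vector>

// Hypothetical stand-in for a plugin; this is NOT the TensorRT PriorBox API.
class ToyPriorBox
{
public:
    // mRatios plays the role of B (user-supplied parameters).
    // mDerived plays the role of B' (data derived once, at construction).
    explicit ToyPriorBox(std::vector<float> ratios)
        : mRatios(std::move(ratios))
    {
        for (float r : mRatios)
        {
            mDerived.push_back(std::sqrt(r)); // illustrative derivation only
        }
    }

    // Write only the constructor inputs: the element count (A), then B.
    std::vector<char> serialize() const
    {
        std::size_t const count = mRatios.size();
        std::vector<char> buffer(sizeof(count) + count * sizeof(float));
        std::memcpy(buffer.data(), &count, sizeof(count));
        if (count > 0)
        {
            std::memcpy(buffer.data() + sizeof(count), mRatios.data(), count * sizeof(float));
        }
        return buffer;
    }

    // Rebuild through the normal constructor, so B' is derived from B again.
    static ToyPriorBox deserialize(std::vector<char> const& buffer)
    {
        std::size_t count = 0;
        std::memcpy(&count, buffer.data(), sizeof(count));
        std::vector<float> ratios(count);
        if (count > 0)
        {
            std::memcpy(ratios.data(), buffer.data() + sizeof(count), count * sizeof(float));
        }
        return ToyPriorBox(std::move(ratios));
    }

    // clone() also starts from B, never from B', so nothing is derived twice.
    ToyPriorBox clone() const
    {
        return ToyPriorBox(mRatios);
    }

    std::vector<float> const& derived() const
    {
        return mDerived;
    }

private:
    std::vector<float> mRatios;  // B
    std::vector<float> mDerived; // B'
};
```

Had `serialize()` written `mDerived` instead, a deserialized object would hand B' to the constructor, and a subsequent `clone()` would apply the derivation a second time, which is the A and B'' situation described above.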

bad image processing with deserialized engine

1. Fixed the memory deallocation error in the plugin's PriorBox::clone() method,
which occurred even without serialization, by initializing the empty pointer to nullptr.

2. Initialized weights to empty structs.

3. Added mParam.aspectRatios to serialization and deserialization, since
mParam.aspectRatios differs from the aspectRatios device weights in count and values.

Signed-off-by: Rajeev Rao <[email protected]>
1. Add varlen MHA fp16 slen=384 kernel for sm_86
2. Refresh all sm_86 kernels, now built with NVCC -gencode=arch=compute_86,code=\"sm_86\"
3. Use unfused kernel for fixed-length s=384 fp16

Signed-off-by: Rajeev Rao <[email protected]>
@rajeevsrao rajeevsrao merged commit 4c99d07 into NVIDIA:master Apr 12, 2021
@rajeevsrao rajeevsrao deleted the dev/21.04-release branch August 23, 2023 22:23