Releases: oneapi-src/oneDNN
v0.21.2
This is a patch release containing the following changes to v0.21.1:
- Fixed performance regression in GEMM (9534621)
- Fixed int8 dilated convolution for some shapes with input height <= dilation along the height dimension (e68f151)
- Addressed static initialization order issue in bf16 converters (ae8efde)
- Fixed fast reference backward convolution dispatching for 3D-spatial case (5994d63)
v1.1
Performance optimizations
- Improved performance with TBB threading to be comparable with OpenMP threading.
- Improved int8 and fp32 GEMM performance on systems with Intel AVX-512 and Intel VNNI support.
- Improved softmax performance for NHWC and corresponding blocked layouts.
- Improved RNN cell performance and reduced the dependence of RNN performance on compiler vectorization capabilities.
- Improved reorder performance for some shapes.
New functionality
- Introduced support for layer normalization and binary elementwise primitives (CPU engine).
- Introduced swish (CPU and GPU engines) and gelu (GPU engine) activation support in the elementwise primitive.
- Introduced bfloat16 data type support in RNN cells (CPU engine).
- Introduced initial int8 and bfloat16 data type support in GPU functionality.
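The swish and gelu activations added to the elementwise primitive follow standard definitions; a minimal sketch of those formulas is below. Note this is illustrative math only, not the library's kernels, and some implementations use the tanh approximation of gelu rather than the exact erf form shown here.

```python
import math

def swish(x, alpha=1.0):
    # swish(x) = x * sigmoid(alpha * x); alpha = 1 recovers the common SiLU form.
    return x / (1.0 + math.exp(-alpha * x))

def gelu(x):
    # gelu(x) = 0.5 * x * (1 + erf(x / sqrt(2))), the exact (erf-based) definition.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

print(swish(1.0), gelu(1.0))
```

Both functions are smooth, pass through the origin, and behave like the identity for large positive inputs, which is why they serve as drop-in replacements for ReLU-style activations.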
Usability improvements
- TBB threading support is promoted to production quality.
- Introduced support for the memory format any for backpropagation of memory-bound primitives. This mechanism allows matching the gradient memory format with the source and destination memory formats from the forward pass.
- Changed default compiler flags to target the Intel SSE4.1 instruction set to make builds portable.
- (experimental) Introduced a caching mechanism that reduces primitive creation time for repeated primitive creation. The functionality is disabled by default and has to be enabled at compile time.
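Since the experimental primitive cache is off by default and a compile-time feature, enabling it means reconfiguring the build. A sketch of what that might look like is below; the option name DNNL_ENABLE_PRIMITIVE_CACHE is an assumption based on the library's build-option naming convention, so consult the build documentation for the exact flag in your version.

```shell
# Hypothetical build configuration enabling the experimental primitive cache.
# The exact CMake option name may differ between versions; check the build docs.
cmake .. -DDNNL_ENABLE_PRIMITIVE_CACHE=ON
cmake --build . --target install
```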
Validation improvements
- Extended benchdnn to cover all supported primitives.
- Introduced a robust validation method for RNN cells in benchdnn. The approach replaces activations with a linear function to make error accumulation more predictable and to decrease the number of false positives.
- Extended convolution test coverage.
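The idea behind the linear-activation validation method can be illustrated with a toy recurrence: with a linear activation, an input perturbation propagates through the cell with a closed-form gain, so the expected output error can be computed analytically; with a nonlinear activation like tanh, the error depends on the whole trajectory. This is a simplified sketch, not benchdnn's actual harness, and the scalar cell and weights below are made up for illustration.

```python
import math

# Toy scalar "RNN cell": h_t = act(w * h_{t-1} + x_t).
w = 0.9
xs = [0.1, -0.2, 0.3, 0.05]

def run(act, x0_err=0.0):
    h = 0.0
    for i, x in enumerate(xs):
        h = act(w * h + x + (x0_err if i == 0 else 0.0))
    return h

eps = 1e-3
linear = lambda v: v

# With a linear activation, a perturbation of the first input reaches the
# output scaled by exactly w**(T-1), so the error bound is analytic:
err_lin = run(linear, eps) - run(linear)
predicted = w ** (len(xs) - 1) * eps

# With tanh, each step additionally scales the error by tanh's local
# derivative along the trajectory, so it is much harder to bound tightly:
err_tanh = run(math.tanh, eps) - run(math.tanh)
```

Because the linear case has an exact expected error, any deviation beyond float rounding signals a real bug rather than benign nonlinear error accumulation, which is what reduces false positives.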
Thanks to the contributors
This release contains contributions from many Intel Performance Libraries developers as well as Ilia Taraban, Jacek Czaja @jczaja, William Tambellini @WilliamTambellini, Tomasz Kalina, Mateusz Guziak, Daniel Haidachuk, Konstantin Basargin @basargin, Aaron Johnson @aaronjohnson, and Jeremy Wong @jrmwng. We would also like to thank everyone who asked questions and reported issues.
v0.21.1
This is a patch release containing the following changes to Intel MKL-DNN v0.21:
v0.21
Performance optimizations
- Improved int8 and fp32 GEMM and inner product performance.
- Improved reorder performance for certain shapes.
- Improved RNN, LSTM, GRU and LBR-GRU training performance.
New functionality
- Added GELU activation support.
Thanks to the contributors
This release contains contributions from many Intel Performance Libraries developers. We would also like to thank everyone who asked questions and reported issues.
v1.1-rc
This is a release candidate for DNNL v1.1. Please provide feedback and report bugs via GitHub issues.
v0.20.5
This is a patch release containing the following changes to Intel MKL-DNN v0.20.4:
v0.20.4
v0.21-rc
This is a release candidate for Intel MKL-DNN v0.21. Please provide feedback and report bugs via GitHub issues.
v0.20.3
v1.0.2
This is a patch release containing the following changes to Intel MKL-DNN v1.0.1:
- Fixed issue with bfloat16 instructions detection in Xbyak (0f4ba11)
- Fixed buffer size in packed GEMM (9764940)
- Fixed offset calculation issue in weight update depthwise convolution in fp32 and bfloat16 kernels (6b9d412, 061499d)
- Added check that size of generated kernel doesn't exceed the maximum allowed bound in fp32 forward and backward kernels (67e8cd2)
- Various fixes in the RNN primitive:
  - Proper handling of packed GEMM in extended GEMM (4eb9f56)
  - Force no-copy GEMM only for Intel AVX+ systems (2fbc8ba)
  - Avoid unaligned pointer usage with VEX instructions in the GRU cell (a147c08)
- Fixed wrong dimension when creating GEMM primitive descriptor in reference RNN implementation for GPU (eb3c866)
- Fixed Tanh backward calculation in GPU RNN reference implementation (f6e4b97)
- Fixed packed GEMM dispatching for int8 (16b46c7)
- Addressed bugs in tests for RNNs (cf83e83, f7c2de2, 960f3f3)