
v1.1.0 - Baichuan models supported.

@Duyi-Wang Duyi-Wang released this 01 Dec 01:53
· 287 commits to main since this release
d5e576d

Models

  • Introduced support for Baichuan models and added a convert tool for them.

Performance Optimizations

  • Updated xDNN to version 1.2.1, improving BF16 performance with AMX instructions on 4th Gen Intel Xeon Scalable processors.
  • Improved BF16 inference performance by adding matMul bf16bf16bf16 primitives and optimizing the kernel selection strategy.
  • Improved performance of models with unbalanced split allocation.

Functionality

  • Introduced the prefix sharing feature.
  • Added sampling strategies for token search, supporting temperature, top-k, and top-p parameters.
  • Introduced a convert module in the xfastertransformer Python API.
  • Introduced grouped-query attention support for Llama2.
  • Auto-detect the oneCCL environment at runtime and fall back to single-rank mode if oneCCL is not present.
  • Auto-detect the oneCCL environment during compilation; if not detected, oneCCL is built from source.
  • Added a C++ exit function for multi-rank mode.
  • Removed the mklml third-party dependency.
  • Exported normalization and position embedding C++ APIs, including ALiBi embedding and rotary embedding.
  • Introduced the XFT_DEBUG_DIR environment variable to specify the debug file directory.
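The temperature, top-k, and top-p parameters combine in the standard way: logits are scaled by temperature and softmaxed, then the candidate set is truncated by top-k and by cumulative probability (top-p) before renormalization. A minimal pure-Python sketch of that pipeline (an illustration of the general technique, not xFasterTransformer's actual implementation):

```python
import math

def sample_distribution(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return the renormalized token distribution after temperature
    scaling, top-k truncation, and top-p (nucleus) truncation."""
    # Temperature scaling followed by a numerically stable softmax.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Token indices sorted by probability, highest first.
    order = sorted(range(len(probs)), key=lambda i: -probs[i])

    # top-k: keep only the k most likely tokens (0 disables the filter).
    if top_k > 0:
        order = order[:top_k]

    # top-p: keep the smallest prefix whose cumulative mass reaches p.
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break

    # Renormalize over the surviving candidates.
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}

dist = sample_distribution([2.0, 1.0, 0.5, 0.1],
                           temperature=0.7, top_k=3, top_p=0.9)
```

Sampling a token then amounts to drawing from `dist`; lower temperature and tighter top-k/top-p both concentrate the mass on the most likely tokens.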
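Grouped-query attention shrinks the KV cache by letting a group of query heads share one key/value head. A schematic sketch of the head mapping, assuming `n_q_heads` query heads and `n_kv_heads` KV heads (Llama2-70B uses 64 and 8):

```python
def kv_head_for_query_head(q_head: int, n_q_heads: int, n_kv_heads: int) -> int:
    """Map a query head index to the key/value head its group shares."""
    assert n_q_heads % n_kv_heads == 0, "query heads must divide evenly into groups"
    group_size = n_q_heads // n_kv_heads
    return q_head // group_size

# With 64 query heads and 8 KV heads, query heads 0..7 share KV head 0,
# heads 8..15 share KV head 1, and so on.
mapping = [kv_head_for_query_head(h, 64, 8) for h in range(64)]
```

The KV cache then stores only `n_kv_heads` key/value tensors per layer instead of `n_q_heads`, an 8x reduction in this configuration.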

Bug Fixes

  • Fixed a runtime issue in oneCCL shared memory mode.
  • Fixed a path concatenation issue in the convert tools.