
v1.1.0 - Baichuan models supported.

@Duyi-Wang Duyi-Wang released this 01 Dec 01:53
· 287 commits to main since this release
d5e576d

Models

  • Introduced support for Baichuan models and added a convert tool for them.

Performance Optimizations

  • Updated xDNN to version 1.2.1, improving BF16 performance with AMX instructions on 4th Gen Intel Xeon Scalable processors.
  • Improved BF16 inference performance by adding matMul bf16bf16bf16 primitives and optimizing the kernel selection strategy.
  • Improved performance of models with unbalanced split allocation.

Functionality

  • Introduced the prefix sharing feature.
  • Added sampling strategies for token search, supporting temperature, top-k, and top-p parameters.
  • Introduced a convert module in the xfastertransformer Python API.
  • Introduced grouped-query attention support for Llama2.
  • Auto-detect the oneCCL environment at runtime and fall back to single-rank mode if oneCCL is not present.
  • Auto-detect the oneCCL environment during compilation; if not detected, oneCCL is built from source.
  • Added a C++ exit function for multi-rank mode.
  • Removed the mklml third-party dependency.
  • Exported normalization and position embedding C++ APIs, including ALiBi embedding and rotary embedding.
  • Introduced the XFT_DEBUG_DIR environment variable to specify the debug file directory.
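The temperature, top-k, and top-p parameters combine in the standard way: logits are scaled by temperature and softmaxed, then the candidate set is truncated by top-k and by cumulative probability (top-p) before renormalization. A minimal pure-Python sketch of that pipeline (an illustration of the general technique, not xFasterTransformer's actual implementation):

```python
import math

def sample_distribution(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return the renormalized token distribution after temperature
    scaling, top-k truncation, and top-p (nucleus) truncation."""
    # Temperature scaling followed by a numerically stable softmax.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Token indices sorted by probability, highest first.
    order = sorted(range(len(probs)), key=lambda i: -probs[i])

    # top-k: keep only the k most likely tokens (0 disables the filter).
    if top_k > 0:
        order = order[:top_k]

    # top-p: keep the smallest prefix whose cumulative mass reaches p.
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break

    # Renormalize over the surviving candidates.
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}

dist = sample_distribution([2.0, 1.0, 0.5, 0.1],
                           temperature=0.7, top_k=3, top_p=0.9)
```

Sampling a token then amounts to drawing from `dist`; lower temperature and tighter top-k/top-p both concentrate the mass on the most likely tokens.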
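Grouped-query attention shrinks the KV cache by letting a group of query heads share one key/value head. A schematic sketch of the head mapping, assuming `n_q_heads` query heads and `n_kv_heads` KV heads (Llama2-70B uses 64 and 8):

```python
def kv_head_for_query_head(q_head: int, n_q_heads: int, n_kv_heads: int) -> int:
    """Map a query head index to the key/value head its group shares."""
    assert n_q_heads % n_kv_heads == 0, "query heads must divide evenly into groups"
    group_size = n_q_heads // n_kv_heads
    return q_head // group_size

# With 64 query heads and 8 KV heads, query heads 0..7 share KV head 0,
# heads 8..15 share KV head 1, and so on.
mapping = [kv_head_for_query_head(h, 64, 8) for h in range(64)]
```

The KV cache then stores only `n_kv_heads` key/value tensors per layer instead of `n_q_heads`, an 8x reduction in this configuration.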

Bug Fixes

  • Fixed a runtime issue in oneCCL shared memory mode.
  • Fixed a path concatenation issue in the convert tools.