v1.1.0 - Baichuan models supported.
Models
Introduced support for Baichuan models and added a convert tool for Baichuan checkpoints.
Performance Optimizations
Updated xDNN to version 1.2.1 to improve BF16 performance with AMX instructions on 4th Gen Intel Xeon Scalable processors.
Improved BF16 inference performance by adding matMul bf16bf16bf16 primitives and optimizing the kernel selection strategy.
Improved performance of models with unbalanced split allocation.
Functionality
Introduced the prefix sharing feature.
Added sampling strategies for token search, supporting the temperature, top-k, and top-p parameters.
Introduced a convert module in the xfastertransformer Python API.
Introduced grouped-query attention support for Llama2.
Auto-detect the oneCCL environment at runtime and fall back to single-rank mode if oneCCL is not present.
Auto-detect the oneCCL environment at compile time; if it is not detected, oneCCL is built from source.
Added a C++ exit function for multi-rank mode.
Removed the mklml third-party dependency.
Exported normalization and position embedding C++ APIs, including ALiBi embedding and rotary embedding.
Introduced the XFT_DEBUG_DIR environment variable to specify the debug file directory.
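The sampling strategy above combines three standard filters before drawing a token. The sketch below is not xFasterTransformer's implementation — it is a minimal, self-contained illustration of how temperature scaling, top-k, and top-p (nucleus) filtering typically interact; the function name and signature are hypothetical.

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=0, top_p=1.0, rng=None):
    """Pick a token id from raw logits using temperature / top-k / top-p filtering.

    A hypothetical sketch: top_k=0 and top_p=1.0 disable their filters.
    """
    rng = rng or random.Random()
    # Temperature scaling: values < 1 sharpen the distribution, > 1 flatten it.
    scaled = [l / max(temperature, 1e-6) for l in logits]
    # Numerically stable softmax over the scaled logits.
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank candidate token ids by probability, descending.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    # Top-k: keep only the k most likely tokens.
    if top_k > 0:
        order = order[:top_k]
    # Top-p (nucleus): keep the smallest prefix whose cumulative mass reaches top_p.
    if top_p < 1.0:
        kept, mass = [], 0.0
        for i in order:
            kept.append(i)
            mass += probs[i]
            if mass >= top_p:
                break
        order = kept
    # Renormalize over the surviving candidates and sample one of them.
    mass = sum(probs[i] for i in order)
    r = rng.random() * mass
    for i in order:
        r -= probs[i]
        if r <= 0:
            return i
    return order[-1]
```

With `top_k=1` the filter degenerates to greedy decoding, which is a convenient way to check the plumbing.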
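Grouped-query attention, mentioned above for Llama2, lets several query heads share one key/value head, shrinking the KV cache. This NumPy sketch shows the core idea only (no masking, no rotary embedding, not the library's kernel); the function name is hypothetical.

```python
import numpy as np

def grouped_query_attention(q, k, v, num_kv_heads):
    """Toy grouped-query attention.

    q: (num_q_heads, seq, d); k, v: (num_kv_heads, seq, d).
    Each contiguous group of num_q_heads // num_kv_heads query heads
    attends to the same key/value head.
    """
    num_q_heads, seq, d = q.shape
    group = num_q_heads // num_kv_heads
    # Broadcast each k/v head to every query head in its group.
    k = np.repeat(k, group, axis=0)
    v = np.repeat(v, group, axis=0)
    # Scaled dot-product attention per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

Setting `num_kv_heads` equal to the number of query heads recovers ordinary multi-head attention; `num_kv_heads=1` is multi-query attention.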
Bug fixes
Fixed a runtime issue in the oneCCL shared memory mode.
Fixed a path concatenation issue in the convert tools.