v1.2.0 - Qwen models and many more data types supported.

Duyi-Wang released this 21 Dec 08:12

2d1024f

Models

Add support for Qwen models, along with a convert tool for them.
ChatGLM3 model is verified and its API is supported.

Performance Optimizations

Update xDNN to version 1.4.2 to improve performance and support more data types.
Accelerate first-token generation with BF16-gemm Multi-Head Attention.

Functionality

Add support for more data types, including W8A8, INT4, and NF4. Hybrid combinations of these new data types are also supported.
Add an accuracy evaluation script to assess the impact of different precisions on the model's text generation performance.
Introduce the XFT_VERBOSE macro to help profile the performance of each gemm in the model. Set it to 1 to enable the information output; the default is 0.
Decouple oneCCL and MPI dependencies into a communication helper library. The oneCCL environment is no longer needed when running in single-rank mode.
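
As a rough illustration of the XFT_VERBOSE switch described above, the sketch below enables it for a child process. It assumes XFT_VERBOSE is read from the process environment at run time, which these notes do not state explicitly. The helper name verbose_env and the stand-in child command are hypothetical and not part of this release.

```python
import os
import subprocess
import sys


def verbose_env(enabled=True, base=None):
    """Return a copy of the environment with XFT_VERBOSE set to "1" or "0".

    Assumption: XFT_VERBOSE is picked up from the environment at run time.
    """
    env = dict(os.environ if base is None else base)
    env["XFT_VERBOSE"] = "1" if enabled else "0"
    return env


# Stand-in child process: it only echoes the value it inherited.
# Replace this command with your own inference script (hypothetical).
child = [
    sys.executable,
    "-c",
    "import os; print(os.environ.get('XFT_VERBOSE', '0'))",
]
result = subprocess.run(
    child, env=verbose_env(True), capture_output=True, text=True, check=True
)
seen_by_child = result.stdout.strip()
```

Passing the variable through a copied environment keeps the setting local to the profiled run, so other processes keep the default of 0.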
