Releases: modelscope/dash-infer

v1.3.0

27 Aug 03:33

Highlight

Full Changelog: v1.2.1...v1.3.0

v1.2.1

01 Jul 03:28
5ceddf9

What's Changed

  • Add llama.cpp benchmark steps
  • fix: fall back to MHA when AVX512F is not supported
  • Fix a security issue; helper: bug fixes and CPU platform check
  • Add release package workflow

v1.2.0

24 Jun 05:32
3a0417b

Expand context length to 32K & support flash attention on Intel AVX-512 platforms

  • Remove currently unsupported cache mode
  • examples: update Qwen prompt template, add print function to examples
  • Support glm-4-9b-chat
  • Change to size_t to avoid overflow when sequences are long
  • Update README now that 32K context length is supported
  • Add flash attention on Intel AVX-512 platforms

v1.1.0

29 May 08:32

Support Qwen2; change DashInfer model file extensions

  • Support Qwen2, add model_type Qwen_v20
  • Change DashInfer model file extensions (asgraph, asparam -> dimodel, ditensors)
  • Python example: remove the xxx_quantize.json config file, use a command-line argument instead

v1.0.4

14 May 05:50

First official release.