Releases: modelscope/dash-infer

v1.3.0

27 Aug 03:33

Highlight

Full Changelog: v1.2.1...v1.3.0

v1.2.1

01 Jul 03:28
5ceddf9

What's Changed

  • Add llama.cpp benchmark steps
  • fix: fall back to MHA when AVX512F is not supported
  • Fix a security issue; helper: bug fixes and CPU platform check
  • Add release package workflow

v1.2.0

24 Jun 05:32
3a0417b

Expand context length to 32K & support flash attention on Intel AVX-512 platforms

  • Remove currently unsupported cache mode
  • examples: update Qwen prompt template, add print function to examples
  • Support glm-4-9b-chat
  • Change to size_t to avoid overflow when sequences are long
  • Update README now that 32K context length is supported
  • Add flash attention on Intel AVX-512 platforms

v1.1.0

29 May 08:32

Support Qwen2; change DashInfer model file extensions

  • Support Qwen2, add model_type Qwen_v20
  • Change DashInfer model file extensions (asgraph, asparam -> dimodel, ditensors)
  • Python example: remove the xxx_quantize.json config file, use a command-line argument instead

v1.0.4

14 May 05:50

First official release.