0.1.17
Changes
- feat: enable flash attention by default @vansangpfiev (#82)
- feat: support use_mmap option as a model load parameter @vansangpfiev (#79)
- fix: remove avx2 check since it is already handled at the cortex-cpp layer @vansangpfiev (#78)
- chore: change Windows CI runners @vansangpfiev (#81)
- feat: enable caching by default @vansangpfiev (#77)
- Update llama.cpp submodule to latest release b3091 @jan-service-account (#76)
- feat: add cache_type parameter @vansangpfiev (#75)
- Update llama.cpp submodule to latest release b3088 @jan-service-account (#74)
- fix: use inference stop words by default, falling back to the stop words set at model load @vansangpfiev (#72)
- feat: add stop words when loading a model @vansangpfiev (#71)
- Update llama.cpp submodule to latest release b3078 @jan-service-account (#70)
- Update llama.cpp submodule to latest release b3070 @jan-service-account (#69)
- Update llama.cpp submodule to latest release b3051 @jan-service-account (#68)
- Update llama.cpp submodule to latest release b3040 @jan-service-account (#67)
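
Several of the entries above add or change model-load parameters: use_mmap (#79), cache_type (#75), load-time stop words (#71), and the new defaults for flash attention (#82) and caching (#77). The sketch below is a minimal illustration of passing them in a load request; the endpoint path, port, model path, and the flash_attn and caching_enabled field names are assumptions, while use_mmap, cache_type, and stop come from the entries themselves.

```python
# Minimal sketch of a model-load request exercising the new parameters.
# The endpoint, port, model path, and the flash_attn / caching_enabled
# names are assumptions; use_mmap, cache_type, and load-time stop words
# are named in the changelog entries above.
import requests

payload = {
    "llama_model_path": "/models/example.Q4_K_M.gguf",  # hypothetical path
    "use_mmap": True,        # #79: memory-map the model file
    "cache_type": "f16",     # #75: KV-cache data type
    "stop": ["</s>"],        # #71: stop words supplied at load time
    # Flash attention (#82) and caching (#77) are now on by default; the
    # field names below are assumptions, shown only to illustrate opting out.
    "flash_attn": False,
    "caching_enabled": False,
}

resp = requests.post("http://localhost:3928/loadmodel", json=payload)
print(resp.status_code, resp.json())
```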
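
The stop-word fix (#72) concerns precedence at inference time: stop words sent with a completion request are used when present, and the engine otherwise falls back to the stop words supplied when the model was loaded (#71). A hedged sketch, assuming an OpenAI-style chat endpoint; the path, port, and all field names other than "stop" are assumptions.

```python
# Sketch of the stop-word precedence from #72: request-level stop words
# win; without them, the load-time stop words (#71) apply.
# Endpoint path, port, and field names other than "stop" are assumptions.
import requests

BASE = "http://localhost:3928"  # placeholder host/port

# "stop" in the request overrides the stop words given at model load.
with_override = {
    "messages": [{"role": "user", "content": "List three fruits."}],
    "stop": ["\n\n"],
    "stream": False,
}

# No "stop" field: the engine falls back to the load-time stop words.
without_override = {
    "messages": [{"role": "user", "content": "List three fruits."}],
    "stream": False,
}

for body in (with_override, without_override):
    r = requests.post(f"{BASE}/v1/chat/completions", json=body)
    print(r.status_code, r.json())
```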