feat: enable flash attention by default (#82)
Co-authored-by: vansangpfiev <[email protected]>
vansangpfiev and sangjanai authored Jun 9, 2024
1 parent 6192c85 commit 8bf2fd8
Showing 2 changed files with 4 additions and 6 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -145,6 +145,6 @@ Table of parameters
 |`model_type` | String | Model type we want to use: llm or embedding, default value is llm|
 |`model_alias`| String | Used as model_id if specified in request, mandatory in loadmodel|
 |`model` | String | Used as model_id if specified in request, mandatory in chat/embedding request|
-|`flash_attn` | Boolean| To enable Flash Attention, default is false|
+|`flash_attn` | Boolean| To enable Flash Attention, default is true|
 |`cache_type` | String| KV cache type: f16, q8_0, q4_0, default is f16|
 |`use_mmap` | Boolean| To enable mmap, default is true|
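For illustration, the parameters documented in that table would appear in a loadmodel request body along these lines (a hypothetical sketch — only the field names come from the table; the values are made up):

```json
{
  "model_alias": "my-model",
  "model": "my-model",
  "model_type": "llm",
  "flash_attn": true,
  "cache_type": "f16",
  "use_mmap": true
}
```

After this commit, omitting `flash_attn` entirely behaves the same as setting it to `true`.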
6 changes: 2 additions & 4 deletions src/llama_engine.cc
@@ -346,14 +346,12 @@ bool LlamaEngine::LoadModelImpl(std::shared_ptr<Json::Value> json_body) {
   params.cache_type_v = params.cache_type_k;
   LOG_DEBUG << "cache_type: " << params.cache_type_k;
 
-  // Check for backward compatible
-  auto fa0 = json_body->get("flash-attn", false).asBool();
-  auto fa1 = json_body->get("flash_attn", false).asBool();
+  auto fa = json_body->get("flash_attn", true).asBool();
   auto force_enable_fa = params.cache_type_k != kTypeF16;
   if (force_enable_fa) {
     LOG_DEBUG << "Using KV cache quantization, force enable Flash Attention";
   }
-  params.flash_attn = fa0 || fa1 || force_enable_fa;
+  params.flash_attn = fa || force_enable_fa;
   if (params.flash_attn) {
     LOG_DEBUG << "Enabled Flash Attention";
   }
