Support Qwen Model by Alibaba #1110
I am thinking that it should be compatible with the new C++ backend, according to the description in the README linked below: https://github.com/QwenLM/qwen.cpp#qwencpp
It still does not work.
Now, Qwen has already been merged into llama.cpp, @mudler:
Tested with the qwen1_5-1_8b-chat-q5_k_m.gguf model.
The model config file for reference:

```yaml
# Model name.
# The model name is used to identify the model in the API calls.
name: gpt-3.5-turbo
# Default model parameters.
# These options can also be specified in the API calls
parameters:
  model: qwen1_5-1_8b-chat-q5_k_m.gguf
  temperature: 0.75
  top_k: 85
  top_p: 0.7
# Default context size
context_size: 8192
# Default number of threads
threads: 16
backend: llama-cpp
# Define chat roles
roles:
  user: "user:"
  assistant: "assistant:"
  system: "system:"
template:
  chat_message: &template |
    <|im_start|>{{if eq .RoleName "assistant"}}assistant{{else if eq .RoleName "system"}}system{{else if eq .RoleName "user"}}user{{end}}
    {{if .Content}}{{.Content}}{{end}}
    <|im_end|>
  chat: &template |
    {{.Input}}
    <|im_start|>assistant
  # Modify the prompt template here ^^^ as per your requirements
  completion: &template |
    {{.Input}}
# Enable F16 if backend supports it
f16: false
stopwords:
  - "<|im_end|>"
embeddings: false
# Mirostat configuration (llama.cpp only)
mirostat_eta: 0.8
mirostat_tau: 0.9
mirostat: 1
# GPU layers (only used when built with cublas)
gpu_layers: 25
# Enable memory lock
mmlock: false
# Define a prompt cache path (relative to the models)
prompt_cache_path: "prompt-cache"
# Cache all the prompts
prompt_cache_all: true
# Read only
prompt_cache_ro: false
# Enable mmap
mmap: true
# Enable low VRAM mode (GPU only)
low_vram: false
# Disable mulmatq (CUDA)
no_mulmatq: true
# Diffusers/transformers
cuda: true
```
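For reference, with the templates above a short exchange gets rendered into a ChatML-style prompt roughly like the following (a sketch with made-up messages; the real prompt depends on what the API call sends):

```
<|im_start|>system
You are a helpful assistant.
<|im_end|>
<|im_start|>user
Hello!
<|im_end|>
<|im_start|>assistant
```

The `<|im_end|>` entry under `stopwords` is what stops generation at the end of the assistant's turn.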
@thiner Thank you for sharing. I created your yaml under the LocalAI/models folder and ran it. Hours passed with no progress. Why?
I suggest minimizing your config file as a starting point, and setting the context length to a small value if you don't have a powerful GPU.
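For instance, a stripped-down config could look roughly like this (a sketch only, not a tested config: the model name `qwen-test` is illustrative, and it keeps just the model file, backend, templates, stopword, and a small context window, leaving everything else at its defaults):

```yaml
# Minimal starting point (sketch): only the essentials, defaults for the rest.
name: qwen-test
backend: llama-cpp
parameters:
  model: qwen1_5-1_8b-chat-q5_k_m.gguf
context_size: 512          # keep small while debugging
stopwords:
  - "<|im_end|>"
template:
  chat_message: |
    <|im_start|>{{if eq .RoleName "assistant"}}assistant{{else if eq .RoleName "system"}}system{{else if eq .RoleName "user"}}user{{end}}
    {{if .Content}}{{.Content}}{{end}}
    <|im_end|>
  chat: |
    {{.Input}}
    <|im_start|>assistant
```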
@thiner Thank you!
My environment:
The content of the qwen1_5-1_8b-chat-q5_k_m.yaml file is copied from your post. The /v1/models endpoint returns:
You didn't load the config file correctly, because the configured model name does not appear in that list.
I just found that there was a typo in my previous config file; I've fixed it:

```yaml
# Model name.
# The model name is used to identify the model in the API calls.
name: gpt-3.5-turbo
# Default model parameters.
# These options can also be specified in the API calls
parameters:
  model: qwen1_5-1_8b-chat-q5_k_m.gguf
  temperature: 0.75
  top_k: 85
  top_p: 0.7
# Default context size
context_size: 512
# Default number of threads
threads: 16
backend: llama-cpp
# Define chat roles
roles:
  user: "user:"
  assistant: "assistant:"
  system: "system:"
template:
  chat_message: &template |
    <|im_start|>{{if eq .RoleName "assistant"}}assistant{{else if eq .RoleName "system"}}system{{else if eq .RoleName "user"}}user{{end}}
    {{if .Content}}{{.Content}}{{end}}
    <|im_end|>
  chat: &template |
    {{.Input}}
    <|im_start|>assistant
  # Modify the prompt template here ^^^ as per your requirements
  completion: &template |
    {{.Input}}
# Enable F16 if backend supports it
f16: false
stopwords:
  - "<|im_end|>"
embeddings: false
# Mirostat configuration (llama.cpp only)
mirostat_eta: 0.8
mirostat_tau: 0.9
mirostat: 1
# GPU layers (only used when built with cublas)
gpu_layers: -1
# Define a prompt cache path (relative to the models)
prompt_cache_path: "prompt-cache"
# Cache all the prompts
prompt_cache_all: true
# Read only
prompt_cache_ro: false
# Diffusers/transformers
cuda: true
```
@thiner Thank you! I replaced the yaml file with the new one and ran `docker-compose restart`. However, the problem is still the same. My disk space is not full.
If there is no error popping up, just no response, check this out: https://localai.io/faq/#everything-is-slow-how-is-it-possible
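Beyond the FAQ, the settings in the configs above that most often lead to "no response for hours" on a machine without a usable GPU are the thread count, the context size, and the GPU offload options. A hedged, illustrative sketch of CPU-only values (not taken from the FAQ; adjust to your hardware):

```yaml
# Illustrative CPU-only overrides (sketch): values depend on your machine.
threads: 4          # roughly the number of physical cores
context_size: 512   # smaller contexts allocate less memory and respond sooner
f16: false
gpu_layers: 0       # do not offload layers when there is no usable GPU
cuda: false         # avoid the CUDA path entirely on CPU-only hosts
```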
This happens to me too…
Is your feature request related to a problem? Please describe.
No. I just want to ask whether LocalAI can support the Qwen model: https://github.com/QwenLM/Qwen
Describe the solution you'd like
Support the Qwen model as a backend. They already provide an OpenAI-style API (https://github.com/QwenLM/Qwen#api), so maybe that would be easy to integrate.
Describe alternatives you've considered
Additional context