vLLM inference speedup is not noticeable, how can I fix it? #7
history = None
for i in range(len(raw_prompts)):  # len(raw_prompts) = 100
    q = raw_prompts[i]
    response, history = vllm_model.chat(query=q, history=history)
    print(response)
    history = history[:10]

Before using vLLM, 100 prompts took about 52 s; after switching to vLLM it is still around 52 s, so there seems to be no speedup. Could anyone help take a look?

Comments

Hello, I'd like to ask: should the Qwen model be deployed first and then accelerated with vLLM?
Qwen is usually deployed directly with the vLLM framework.
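For reference, a minimal sketch of what deploying Qwen directly with vLLM can look like (assuming the vllm package is installed and the same local model path as the demo below; the questions and sampling settings are only illustrative). The key point is that the prompts are handed to vLLM as one batch so its scheduler can batch them, instead of being sent one at a time from a Python loop:

from vllm import LLM, SamplingParams

# Load Qwen-7B-Chat with vLLM itself (model path assumed, matching the demo below)
llm = LLM(model="/root/autodl-tmp/qwen/Qwen-7B-Chat", trust_remote_code=True)

# Illustrative sampling settings; stop on Qwen's ChatML end-of-turn marker
sampling_params = SamplingParams(temperature=0.7, top_p=0.8, max_tokens=512, stop=["<|im_end|>"])

# Qwen-7B-Chat expects ChatML-formatted prompts, so wrap each question accordingly
questions = ["高血压患者能吃党参吗?", "高血压患者平时饮食要注意什么?"]
prompts = [
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    f"<|im_start|>user\n{q}<|im_end|>\n<|im_start|>assistant\n"
    for q in questions
]

# One batched call: vLLM schedules and batches all requests internally
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.outputs[0].text)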
Let me give you a demo; this one does show a clear speedup:
from modelscope import AutoModelForCausalLM, AutoTokenizer
from modelscope import GenerationConfig
import time

# Available models include: "qwen/Qwen-7B-Chat", "qwen/Qwen-14B-Chat"
tokenizer = AutoTokenizer.from_pretrained("/root/autodl-tmp/qwen/Qwen-7B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("/root/autodl-tmp/qwen/Qwen-7B-Chat", device_map="auto", trust_remote_code=True, fp16=True).eval()
model.generation_config = GenerationConfig.from_pretrained("/root/autodl-tmp/qwen/Qwen-7B-Chat", trust_remote_code=True)  # generation length, top_p and other hyperparameters can be set here

# Pass the tokenizer when calling the chat method
time1 = time.time()
response, history = model.chat(query="高血压患者能吃党参吗?", history=None, tokenizer=tokenizer)
time2 = time.time()
print(f'{time2 - time1}s')
print(response)
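For a like-for-like comparison with the 52 s figure in the issue, one option (again a sketch, assuming the vllm package and the same model path; raw_prompts stands in for the 100 questions) is to time a single batched generate call over all prompts rather than looping chat() once per prompt, since sequential one-at-a-time requests get little benefit from vLLM's batching:

import time
from vllm import LLM, SamplingParams

llm = LLM(model="/root/autodl-tmp/qwen/Qwen-7B-Chat", trust_remote_code=True)
sampling_params = SamplingParams(temperature=0.7, top_p=0.8, max_tokens=512, stop=["<|im_end|>"])

# raw_prompts is a placeholder for the 100 questions from the issue
raw_prompts = ["高血压患者能吃党参吗?"] * 100
prompts = [
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    f"<|im_start|>user\n{q}<|im_end|>\n<|im_start|>assistant\n"
    for q in raw_prompts
]

time1 = time.time()
outputs = llm.generate(prompts, sampling_params)  # one batched call instead of a per-prompt loop
time2 = time.time()
print(f'{time2 - time1:.1f}s for {len(prompts)} prompts')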