I believe the report provides performance results for a single request, which is not conceptually the same as results under concurrency.
With concurrent requests, the maximum throughput (tokens per second) of vLLM should remain roughly stable regardless of the concurrency level. However, the per-request latency (e.g., time to first token) varies significantly depending on the vLLM configuration and the data distribution, as in the sketch below.
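A minimal concurrency benchmark against a vLLM OpenAI-compatible server can illustrate the difference. This sketch is not from the original report: the endpoint URL, model name, prompts, concurrency level, and the one-token-per-chunk counting shortcut are all assumptions.

```python
# Minimal concurrency benchmark sketch. Assumes a vLLM OpenAI-compatible
# server is already running locally, e.g.:
#   vllm serve Qwen/Qwen2.5-7B-Instruct --port 8000
import asyncio
import time

from openai import AsyncOpenAI

client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")


async def one_request(prompt: str) -> tuple[float, int]:
    """Send one streaming chat request; return (time to first token, completion tokens)."""
    start = time.perf_counter()
    ttft = None
    n_tokens = 0
    stream = await client.chat.completions.create(
        model="Qwen/Qwen2.5-7B-Instruct",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=256,
        stream=True,
    )
    async for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            if ttft is None:
                ttft = time.perf_counter() - start
            n_tokens += 1  # rough approximation: one token per streamed chunk
    return ttft or 0.0, n_tokens


async def main(concurrency: int = 16) -> None:
    prompts = [f"Explain topic #{i} in two sentences." for i in range(concurrency)]
    start = time.perf_counter()
    results = await asyncio.gather(*(one_request(p) for p in prompts))
    elapsed = time.perf_counter() - start
    total_tokens = sum(n for _, n in results)
    print(f"concurrency={concurrency}")
    print(f"mean TTFT: {sum(t for t, _ in results) / len(results):.3f} s")
    print(f"aggregate throughput: {total_tokens / elapsed:.1f} tokens/s")


if __name__ == "__main__":
    asyncio.run(main())
```

Running it at several concurrency levels (e.g., 1, 8, 16, 64) should show aggregate throughput staying fairly flat once the server is saturated, while mean TTFT grows with concurrency.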
Model Series
Qwen2.5
What are the models used?
Qwen2.5-7B-Instruct
What is the scenario where the problem happened?
Qwen2.5-7B-Instruct performance
Is this badcase known and can it be solved using available techniques?
Information about environment
NVIDIA A100 80GB
CUDA 12.1
vLLM 0.6.3
PyTorch 2.4.0
Flash Attention 2.6.3
Transformers 4.46.0
Description
Steps to reproduce
This happens to Qwen2.5-xB-Instruct-xxx and xxx.
The badcase can be reproduced with the following steps:
The following example input & output can be used:
Expected results
The results are expected to be ...
Attempts to fix
I have tried several ways to fix this, including:
Anything else helpful for investigation
I find that this problem also happens to ...