anaivebird changed the title from "qserve with tensorrt-llm is slower and awq int4 for llama2-7b" to "qserve group 128 with tensorrt-llm is slower and awq int4 for llama2-7b" on Nov 28, 2024
anaivebird changed the title from "qserve group 128 with tensorrt-llm is slower and awq int4 for llama2-7b" to "qserve is slower then awq int4 for llama2-7b on H100" on Nov 29, 2024
System Info
performance results
qserve result:

| Metric | Batch Size 64 | Batch Size 128 |
| --- | --- | --- |
| Successful Request | 359 | 208 |
| Request_Gen_Token_Len | 1024 | 1024 |
| Avg_Input_Token_Len | 1737.53 | 1802.95 |
| Avg_Gen_Token_Len | 1000.3 | 994.21 |
| Elapse_Time (s) | 226.188 | 135.085 |
| Time_to_First_Token_AVG (s) | 9.957 | 36.664 |
| Time_to_First_Token_P99 (s) | 30.965 | 62.527 |
| Time_per_Output_Token_AVG (s) | 0.029 | 0.028 |
| Time_per_Output_Token_P99 (s) | 0.03 | 0.045 |
| Latency_P90 (s) | 57.549 | 88.988 |
| Latency_P95 (s) | 58.187 | 90.888 |
| Latency_P99 (s) | 61.007 | 92.339 |
| Latency_AVG (s) | 34.043 | 33.051 |
| Token QPS (token/s) | 1587.65 | 1530.85 |
| Service QPS (req/s) | 1.59 | 1.54 |
awq result:

| Metric | Batch Size 64 | Batch Size 128 |
| --- | --- | --- |
| Successful Request | 369 | 177 |
| Request_Gen_Token_Len | 1024 | 1024 |
| Avg_Input_Token_Len | 1726.56 | 1804.7 |
| Avg_Gen_Token_Len | 952.3 | 931.08 |
| Elapse_Time (s) | 212.125 | 105.276 |
| Time_to_First_Token_AVG (s) | 8.244 | 30.793 |
| Time_to_First_Token_P99 (s) | 29.357 | 59.689 |
| Time_per_Output_Token_AVG (s) | 0.029 | 0.028 |
| Time_per_Output_Token_P99 (s) | 0.062 | 0.072 |
| Latency_P90 (s) | 53.352 | 72.126 |
| Latency_P95 (s) | 55.721 | 86.212 |
| Latency_P99 (s) | 58.419 | 88.854 |
| Latency_AVG (s) | 31.806 | 24.425 |
| Token QPS (token/s) | 1656.56 | 1565.43 |
| Service QPS (req/s) | 1.74 | 1.68 |
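
For anyone cross-checking the numbers, here is a rough Python sketch that recomputes the throughput figures from the raw counts above. It assumes that Token QPS is `Successful Request * Avg_Gen_Token_Len / Elapse_Time` and that Service QPS is `Successful Request / Elapse_Time`. Those formulas are inferred from the reported values, not taken from the benchmark tool itself, and the names `RUNS`, `token_qps`, `service_qps` and `awq_advantage` are made up for this illustration.

```python
# Figures copied from the results above.
# Each entry: (successful requests, avg generated tokens, elapsed seconds, reported Token QPS)
RUNS = {
    ("qserve", 64): (359, 1000.3, 226.188, 1587.65),
    ("qserve", 128): (208, 994.21, 135.085, 1530.85),
    ("awq", 64): (369, 952.3, 212.125, 1656.56),
    ("awq", 128): (177, 931.08, 105.276, 1565.43),
}


def token_qps(requests, avg_gen_tokens, elapsed_s):
    """Generated tokens per second (assumed definition, inferred from the data)."""
    return requests * avg_gen_tokens / elapsed_s


def service_qps(requests, elapsed_s):
    """Completed requests per second (assumed definition, inferred from the data)."""
    return requests / elapsed_s


def awq_advantage(batch_size):
    """Relative Token QPS advantage of AWQ over QServe, as a fraction."""
    qserve_reported = RUNS[("qserve", batch_size)][3]
    awq_reported = RUNS[("awq", batch_size)][3]
    return awq_reported / qserve_reported - 1.0


if __name__ == "__main__":
    for (engine, batch), (n, gen, elapsed, reported) in RUNS.items():
        print(
            f"{engine:6s} bs={batch:3d} "
            f"token_qps={token_qps(n, gen, elapsed):8.2f} (reported {reported}) "
            f"service_qps={service_qps(n, elapsed):.2f}"
        )
    for batch in (64, 128):
        print(f"bs={batch}: AWQ Token QPS advantage = {awq_advantage(batch):.1%}")
```

Under these assumptions the recomputed values agree with the reported ones, and the AWQ engine comes out a few percent ahead of QServe on Token QPS at both batch sizes. Note that the runs differ in request count and average generated length, so the comparison is only approximate.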
build commands: