
Significant performance drop between v0.103.0 and v0.104.0 #340

Closed
LesnyRumcajs opened this issue Feb 20, 2022 · 10 comments · Fixed by #376

Comments

@LesnyRumcajs

Hello!
First of all, thanks for this amazing tool. We are using it in https://github.com/LesnyRumcajs/grpc_bench to benchmark gRPC implementations across different programming languages and technologies.

I'm using ghz as a docker image and noticed a significant drop in performance between tags v0.103.0 and v0.104.0. I had to do a binary search over releases to pinpoint the exact versions. :)

tl;dr

Before v0.104.0, a single ghz container constrained to 1 CPU could generate (with a given payload / benchmark scenario) 6628 req/s against a Rust gRPC service (driving the Rust implementation to ~11% CPU). Now ghz generates roughly 5-6x less load.

long story

Locally I am using https://hub.docker.com/r/obvionaoe/ghz (the issue also applies to the image that was introduced to this repo recently).

v0.103.0

-----------------------------------------------------------------------------------------------------------------------------------------
| name                        |   req/s |   avg. latency |        90 % in |        95 % in |        99 % in | avg. cpu |   avg. memory |
-----------------------------------------------------------------------------------------------------------------------------------------
| rust_tonic_st               |    6628 |      107.35 ms |      198.05 ms |      203.40 ms |      396.92 ms |   11.25% |     10.95 MiB |
-----------------------------------------------------------------------------------------------------------------------------------------
Benchmark Execution Parameters:
4c97b2f Fri, 18 Feb 2022 15:15:40 +0100 GitHub update scala_fs2 version (#195)
- GRPC_BENCHMARK_DURATION=20s
- GRPC_BENCHMARK_WARMUP=5s
- GRPC_SERVER_CPUS=1
- GRPC_SERVER_RAM=512m
- GRPC_CLIENT_CONNECTIONS=50
- GRPC_CLIENT_CONCURRENCY=1000
- GRPC_CLIENT_QPS=0
- GRPC_CLIENT_CPUS=1
- GRPC_REQUEST_SCENARIO=complex_proto

v0.104.0

-----------------------------------------------------------------------------------------------------------------------------------------
| name                        |   req/s |   avg. latency |        90 % in |        95 % in |        99 % in | avg. cpu |   avg. memory |
-----------------------------------------------------------------------------------------------------------------------------------------
| rust_tonic_st               |    1178 |      290.81 ms |      599.03 ms |         1.90 s |         4.30 s |     4.6% |      8.86 MiB |
-----------------------------------------------------------------------------------------------------------------------------------------
Benchmark Execution Parameters:
4c97b2f Fri, 18 Feb 2022 15:15:40 +0100 GitHub update scala_fs2 version (#195)
- GRPC_BENCHMARK_DURATION=20s
- GRPC_BENCHMARK_WARMUP=5s
- GRPC_SERVER_CPUS=1
- GRPC_SERVER_RAM=512m
- GRPC_CLIENT_CONNECTIONS=50
- GRPC_CLIENT_CONCURRENCY=1000
- GRPC_CLIENT_QPS=0
- GRPC_CLIENT_CPUS=1
- GRPC_REQUEST_SCENARIO=complex_proto

You can look at global results (I believe using the latest docker image from this repo) here: https://github.com/LesnyRumcajs/grpc_bench/wiki/2022-02-18-bench-results

Same benchmark but using some old ghz: https://github.com/LesnyRumcajs/grpc_bench/wiki/2022-02-19-bench-results

For our benchmark this is a major issue: even on a decent machine we can't saturate a single-CPU Rust implementation, let alone multi-CPU scenarios.

It would be nice to find the root cause and perhaps set up some tests (even manual ones, run before each release) to check that performance stays more or less the same. Unfortunately I'm not versed in Go, so I can't contribute more than that. :(

@LesnyRumcajs LesnyRumcajs changed the title Significant performance dropdown between v0.103.0 and v0.104.0 Significant performance drop between v0.103.0 and v0.104.0 Feb 20, 2022
@bojand
Owner

bojand commented Feb 23, 2022

Hello, thanks for reporting this. That is unfortunate... I'll have to dig deeper, but at a quick glance:

The only real change in v0.104.0 was #317, which adds support for the sprig package's template functions. Drilling down, it appears to iterate over a sizable map of functions, potentially many times per run, which isn't ideal. So I suspect that is the issue.

There are a few options I'll have to experiment with to find the best solution. I'll try to have a fix soon when I get a bit more time.

@bojand
Owner

bojand commented Mar 6, 2022

Hello, can you please try v0.107.0-pre.0? Some quick, rudimentary microbenchmarks suggest it is definitely faster, though I'm skeptical it will match v0.103.0. It would be good to see what kind of difference it makes for your use case and context. There are some other options we can try...

@LesnyRumcajs
Author

Hello @bojand. This version seems to be missing from the registry; the docker image update job appears to have used v0.106.1 for the latest tag. Maybe re-running the job would help?

@bojand
Owner

bojand commented Mar 7, 2022

Hmm, sorry about this... GitHub Actions were not set up properly. There should be a 0.107.0-pre.3 tag now.

$ docker run ghcr.io/bojand/ghz:0.107.0-pre.3 --version
v0.107.0-pre.3

@LesnyRumcajs
Author

Thanks, now it works.
The results I get with this tag are as you expected: significantly faster, but still not at the level of v0.103.0.

-----------------------------------------------------------------------------------------------------------------------------------------
| name                        |   req/s |   avg. latency |        90 % in |        95 % in |        99 % in | avg. cpu |   avg. memory |
-----------------------------------------------------------------------------------------------------------------------------------------
| rust_tonic_st               |    1955 |      212.54 ms |      399.35 ms |         1.20 s |         2.30 s |    6.07% |      8.55 MiB |
-----------------------------------------------------------------------------------------------------------------------------------------

@bojand
Owner

bojand commented Mar 8, 2022

Do you use template functions and/or call data at all in the tests? From a quick look at the payload files it doesn't seem so...? One option may be to add --no-template-functions and possibly --no-call-data argument flags. --no-template-functions would create call data but not initialize any template functions; --no-call-data would skip call-data processing entirely. I believe this should work around the issue if applicable to your case, and in general it may be useful. --no-call-data in particular could potentially give even better performance than v0.103.0.

@LesnyRumcajs
Author

I don't think we're using any of those. The payload we use is this one, which is pretty static. We call it here. That said, it's good to know such templates exist; they would most likely make the benchmark more realistic. On the other hand, we would then need a beast of a benchmark machine.

@johnoloughlin

johnoloughlin commented Oct 19, 2022

I'm also seeing the performance drop between 0.103 and 0.109.

@bojand
Owner

bojand commented Nov 7, 2022

Hello, can you please try v0.111.0? It adds two new options: --disable-template-functions and --disable-template-data. If you do not need any template functions, use --disable-template-functions; if you do not need any call template data at all, specify --disable-template-data to skip all template actions. This should hopefully result in a performance improvement. Ideally we would determine the appropriate behavior automatically, but I thought it would be good to make these options explicit for now to see if they help this issue.
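For reference, an invocation with the new flags might look like the following (the proto file, call name, and endpoint are placeholders, not taken from this issue; the flag names are those introduced in v0.111.0):

```shell
# Static payload, so template processing can be skipped entirely:
ghz --insecure \
    --proto ./helloworld.proto \
    --call helloworld.Greeter.SayHello \
    -d '{"name":"world"}' \
    --connections 50 --concurrency 1000 --duration 20s \
    --disable-template-functions \
    --disable-template-data \
    localhost:50051
```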

@LesnyRumcajs
Author

@bojand Thanks, it helped.

Without the template-disabling flags:

-----------------------------------------------------------------------------------------------------------------------------------------
| name                        |   req/s |   avg. latency |        90 % in |        95 % in |        99 % in | avg. cpu |   avg. memory |
-----------------------------------------------------------------------------------------------------------------------------------------
| rust_tonic_st               |    4600 |       89.96 ms |      196.69 ms |      495.78 ms |      898.87 ms |    6.87% |     14.25 MiB |
-----------------------------------------------------------------------------------------------------------------------------------------
Benchmark Execution Parameters:
9f17858 Thu, 23 Feb 2023 16:10:49 +0100 GitHub Bump golang.org/x/text from 0.3.7 to 0.3.8 in /go_connect_bench (#331)
- GRPC_BENCHMARK_DURATION=20s
- GRPC_BENCHMARK_WARMUP=5s
- GRPC_SERVER_CPUS=1
- GRPC_SERVER_RAM=512m
- GRPC_CLIENT_CONNECTIONS=50
- GRPC_CLIENT_CONCURRENCY=1000
- GRPC_CLIENT_QPS=0
- GRPC_CLIENT_CPUS=1
- GRPC_REQUEST_SCENARIO=complex_proto
All done.

With the template-disabling flags:

-----------------------------------------------------------------------------------------------------------------------------------------
| name                        |   req/s |   avg. latency |        90 % in |        95 % in |        99 % in | avg. cpu |   avg. memory |
-----------------------------------------------------------------------------------------------------------------------------------------
| rust_tonic_st               |   17049 |       51.14 ms |       97.80 ms |      101.24 ms |      191.59 ms |   17.29% |      20.8 MiB |
-----------------------------------------------------------------------------------------------------------------------------------------
