
Significant performance drop between v0.103.0 and v0.104.0 #340

Closed
LesnyRumcajs opened this issue Feb 20, 2022 · 10 comments · Fixed by #376

Comments

@LesnyRumcajs

Hello!
First of all, thanks for this amazing tool. We are using it in https://github.com/LesnyRumcajs/grpc_bench to benchmark gRPC implementations across different programming languages and technologies.

I'm using ghz as a docker image and noticed a significant drop in performance between tags v0.103.0 and v0.104.0. I had to do a binary search over releases to pinpoint the exact versions. :)

tl;dr

Before v0.104.0, a single ghz container constrained to 1 CPU could generate (with a given payload / benchmark scenario) 6628 req/s against a Rust gRPC service (driving the Rust implementation to ~11% CPU). Now ghz generates roughly 5-6x less load.

long story

Locally I am using https://hub.docker.com/r/obvionaoe/ghz (the issue also applies to the image that was introduced to this repo recently).

v0.103.0

-----------------------------------------------------------------------------------------------------------------------------------------
| name                        |   req/s |   avg. latency |        90 % in |        95 % in |        99 % in | avg. cpu |   avg. memory |
-----------------------------------------------------------------------------------------------------------------------------------------
| rust_tonic_st               |    6628 |      107.35 ms |      198.05 ms |      203.40 ms |      396.92 ms |   11.25% |     10.95 MiB |
-----------------------------------------------------------------------------------------------------------------------------------------
Benchmark Execution Parameters:
4c97b2f Fri, 18 Feb 2022 15:15:40 +0100 GitHub update scala_fs2 version (#195)
- GRPC_BENCHMARK_DURATION=20s
- GRPC_BENCHMARK_WARMUP=5s
- GRPC_SERVER_CPUS=1
- GRPC_SERVER_RAM=512m
- GRPC_CLIENT_CONNECTIONS=50
- GRPC_CLIENT_CONCURRENCY=1000
- GRPC_CLIENT_QPS=0
- GRPC_CLIENT_CPUS=1
- GRPC_REQUEST_SCENARIO=complex_proto

v0.104.0

-----------------------------------------------------------------------------------------------------------------------------------------
| name                        |   req/s |   avg. latency |        90 % in |        95 % in |        99 % in | avg. cpu |   avg. memory |
-----------------------------------------------------------------------------------------------------------------------------------------
| rust_tonic_st               |    1178 |      290.81 ms |      599.03 ms |         1.90 s |         4.30 s |     4.6% |      8.86 MiB |
-----------------------------------------------------------------------------------------------------------------------------------------
Benchmark Execution Parameters:
4c97b2f Fri, 18 Feb 2022 15:15:40 +0100 GitHub update scala_fs2 version (#195)
- GRPC_BENCHMARK_DURATION=20s
- GRPC_BENCHMARK_WARMUP=5s
- GRPC_SERVER_CPUS=1
- GRPC_SERVER_RAM=512m
- GRPC_CLIENT_CONNECTIONS=50
- GRPC_CLIENT_CONCURRENCY=1000
- GRPC_CLIENT_QPS=0
- GRPC_CLIENT_CPUS=1
- GRPC_REQUEST_SCENARIO=complex_proto

You can look at global results (I believe using the latest docker image from this repo) here: https://github.com/LesnyRumcajs/grpc_bench/wiki/2022-02-18-bench-results

Same benchmark but using some old ghz: https://github.com/LesnyRumcajs/grpc_bench/wiki/2022-02-19-bench-results

For our benchmark this is a major issue: even on a decent machine we can't saturate a single-CPU Rust implementation, let alone multi-CPU scenarios.

It would be nice to find the root cause and perhaps set up some tests (even manual ones, run before each release) to check that performance stays more or less the same. Unfortunately I'm not versed in Go, so I can't contribute more than that. :(

@LesnyRumcajs LesnyRumcajs changed the title Significant performance dropdown between v0.103.0 and v0.104.0 Significant performance drop between v0.103.0 and v0.104.0 Feb 20, 2022
@bojand
Owner

bojand commented Feb 23, 2022

Hello, thanks for reporting this. That is unfortunate... I'll have to dig deeper, but at a quick glance:

The only real change in v0.104.0 was #317, which adds support for the sprig package's template functions. Drilling down, it appears to iterate over a sizable map of functions, potentially many times per run, which isn't ideal. So I suspect that is the issue.

There are a few options I'll have to experiment with to find the best solution. I'll try to have a fix soon when I get a bit more time.

@bojand
Owner

bojand commented Mar 6, 2022

Hello, can you please try v0.107.0-pre.0? Some quick, rudimentary microbenchmarks suggest it is definitely faster, though I'm skeptical it will match v0.103.0. It would be good to see what kind of difference it makes for your use case and context. There are some other options we can try...

@LesnyRumcajs
Author

Hello @bojand. This version seems to be missing from the registry; the docker image update job appears to have used v0.106.1 for the latest tag. Maybe re-running the job would help?

@bojand
Owner

bojand commented Mar 7, 2022

Hmm, sorry about this... GitHub Actions were not set up properly. There should be a 0.107.0-pre.3 tag now.

$ docker run ghcr.io/bojand/ghz:0.107.0-pre.3 --version
v0.107.0-pre.3

@LesnyRumcajs
Author

Thanks, now it works.
The results I get with this tag are as you expected: significantly faster, but still not at the level of v0.103.0.

-----------------------------------------------------------------------------------------------------------------------------------------
| name                        |   req/s |   avg. latency |        90 % in |        95 % in |        99 % in | avg. cpu |   avg. memory |
-----------------------------------------------------------------------------------------------------------------------------------------
| rust_tonic_st               |    1955 |      212.54 ms |      399.35 ms |         1.20 s |         2.30 s |    6.07% |      8.55 MiB |
-----------------------------------------------------------------------------------------------------------------------------------------

@bojand
Owner

bojand commented Mar 8, 2022

Do you use template functions and/or call data at all in the tests? From a quick look at the payload files it doesn't seem so...? One option may be to add --no-template-functions and possibly --no-call-data argument flags. --no-template-functions would create call data but not initialize any template functions; --no-call-data would skip call-data processing entirely. I believe this should work around the issue if applicable to your case, and in general it may be useful. --no-call-data in particular could potentially give even better performance than v0.103.0.

@LesnyRumcajs
Author

I don't think we're using any of those. The payload we use is this one, which is pretty static. We call it here. That said, it's good to know such templates exist; they would most likely make the benchmark more realistic. On the other hand, we would then need a beast of a benchmark machine.

@johnoloughlin

johnoloughlin commented Oct 19, 2022

I'm also seeing the performance drop between 0.103 and 0.109.

@bojand
Owner

bojand commented Nov 7, 2022

Hello, can you please try v0.111.0? It adds two new options: --disable-template-functions and --disable-template-data. If you do not need any template functions, use --disable-template-functions; if you do not need any call template data at all, specify --disable-template-data to skip all template actions. This should hopefully result in a performance improvement. Ideally we would determine the appropriate behavior automatically, but I thought it would be good to make these options explicit for now to see if they help this issue.
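For reference, an invocation with the new flags might look like the following (the proto file, call name, and endpoint are placeholders, not taken from this issue; the flag names are those introduced in v0.111.0):

```shell
# Static payload, so template processing can be skipped entirely:
ghz --insecure \
    --proto ./helloworld.proto \
    --call helloworld.Greeter.SayHello \
    -d '{"name":"world"}' \
    --connections 50 --concurrency 1000 --duration 20s \
    --disable-template-functions \
    --disable-template-data \
    localhost:50051
```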

@LesnyRumcajs
Author

@bojand Thanks, it helped.

Without the template-disabling flags:

-----------------------------------------------------------------------------------------------------------------------------------------
| name                        |   req/s |   avg. latency |        90 % in |        95 % in |        99 % in | avg. cpu |   avg. memory |
-----------------------------------------------------------------------------------------------------------------------------------------
| rust_tonic_st               |    4600 |       89.96 ms |      196.69 ms |      495.78 ms |      898.87 ms |    6.87% |     14.25 MiB |
-----------------------------------------------------------------------------------------------------------------------------------------
Benchmark Execution Parameters:
9f17858 Thu, 23 Feb 2023 16:10:49 +0100 GitHub Bump golang.org/x/text from 0.3.7 to 0.3.8 in /go_connect_bench (#331)
- GRPC_BENCHMARK_DURATION=20s
- GRPC_BENCHMARK_WARMUP=5s
- GRPC_SERVER_CPUS=1
- GRPC_SERVER_RAM=512m
- GRPC_CLIENT_CONNECTIONS=50
- GRPC_CLIENT_CONCURRENCY=1000
- GRPC_CLIENT_QPS=0
- GRPC_CLIENT_CPUS=1
- GRPC_REQUEST_SCENARIO=complex_proto
All done.

With the template-disabling flags:

-----------------------------------------------------------------------------------------------------------------------------------------
| name                        |   req/s |   avg. latency |        90 % in |        95 % in |        99 % in | avg. cpu |   avg. memory |
-----------------------------------------------------------------------------------------------------------------------------------------
| rust_tonic_st               |   17049 |       51.14 ms |       97.80 ms |      101.24 ms |      191.59 ms |   17.29% |      20.8 MiB |
-----------------------------------------------------------------------------------------------------------------------------------------
