Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(#8166): Update tune.bufsize and remove rsyslog from haproxy image #8170

Merged
merged 15 commits into from
May 10, 2023

Conversation

dianabarsan
Copy link
Member

@dianabarsan dianabarsan commented Apr 7, 2023

Description

According to documentation, when increasing tune.bufsize, maxconn should be decreased by the same factor, otherwise there is a risk of OOM.

values larger than default size will increase memory usage, possibly causing the system to run out of memory.

https://www.haproxy.com/documentation/hapee/latest/onepage/#3.2-tune.bufsize

In our config, we both increased maxconn and tune.bufsize by x 100. This caused the haproxy memory footprint to increase to up to 18GB. This memory is not released when connections are dropped and crashed instances that had a lower amount of available memory than this.

Additionally, this PR removes rsyslog from haproxy image. It was no longer needed, since docker logs are already enough for funnelling haproxy logs, and having the service running prevented the container from restarting when the main haproxy process ran out of memory.
The container automatically restarts now, so setting a default resource limit to the container becomes an option.

#8166

Code review checklist

  • Readable: Concise, well named, follows the style guide, documented if necessary.
  • Documented: Configuration and user documentation on cht-docs
  • Tested: Unit and/or e2e where appropriate
  • Internationalised: All user facing text
  • Backwards compatible: Works with existing data and configuration or includes a migration. Any breaking changes documented in the release notes.

Compose URLs

If Build CI hasn't passed, these may 404:

License

The software is provided under AGPL-3.0. Contributions to this project are accepted under the same license.

@dianabarsan dianabarsan marked this pull request as ready for review April 13, 2023 16:15
@dianabarsan
Copy link
Member Author

I've ran some scalability tests over this new config: https://docs.google.com/spreadsheets/d/1tWlECCmB9k7yXQ4cdS0wKHSQYDAsp0emFjRT-BiiCiY/edit?usp=sharing

I am getting some errors - where haproxy sends 502s to api.
After 4 scalability runs, haproxy uses 3GB of memory. This memory is not released when the process is idle:
image

@dianabarsan
Copy link
Member Author

I'd love some input on this.

@henokgetachew
Copy link
Contributor

henokgetachew commented Apr 14, 2023

This memory is not released when connections are dropped

That's default behavior according to the docs. Do you think we should experiment with the double edged sword of http-server-close?
i.e. This would make HAProxy close the connection after each response, which can help to prevent the accumulation of idle connections and reduce resource usage. However, it's important to note that this can also increase the overhead of establishing new connections for subsequent requests, which can impact performance. How would that affect replication?

@dianabarsan
Copy link
Member Author

I'm experimenting with buffer limit right now, I can try http-server-close as well.
Thanks a lot for the input @henokgetachew !

@dianabarsan
Copy link
Member Author

dianabarsan commented Apr 17, 2023

Ok, it turns out we already use option http-server-close

Below are the results of running repeated scalability tests over two variants of this PR:

  • for the first group, I completely removed the tune.bufsize
  • for the second group, I set the tune.bufsize value to 2x documented default value and added tune.buffers.limit as the same value as maxconn.
Nbr clients Nbr docs Mean (min) Mean (ms) Median (min) Median (ms) Errors Bufsize
               
300 1800 157.7068943 9462413.66 162.93505 9776103 16 default
300 1800 146.9544652 8817267.91 157.2243583 9433461.5 5 default
300 1800 165.3795982 9922775.89 165.3177083 9919062.5 200 default
300 1800 160.7876768 9647260.61 163.1117167 9786703 3 default
               
300 1800 160.7876768 9944662.7 163.1117167 9953405 25 2x default
300 1800 157.1082967 9426497.8 158.6040333 9516242 10 2x default
300 1800 165.7443783 8669043.58 165.8900833 8672980 1 2x default
300 1800 157.1082967 8130034.95 158.6040333 8910909.5 33 2x default

The "errors" are all identical:

2023-04-17 05:33:24 ERROR: Server error: FetchError: invalid json response body at 
http://haproxy:5984/medicss/_design/medic/_view/docs_by_replication_key reason: 
Unexpected token < in JSON at position 0

Memory usage for haproxy after 4 scalability test runs is 3.64GiB.

@dianabarsan
Copy link
Member Author

Looks like memory footprint plateaued at 4.292GiB with

tune.bufsize 32768
tune.buffers.limit 150000

@dianabarsan dianabarsan changed the title fix(#8166): Remove or update tune.bufsize config from haproxy fix(#8166): Update tune.bufsize and remove rsyslog from haproxy image Apr 26, 2023
@dianabarsan
Copy link
Member Author

I think I'm done with this. I've had this live on the scalability setup for a while (weeks?) and have been running scalability tests against it and it's been stable.

Feedback / reviews please!

Copy link
Member

@garethbowen garethbowen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know you've extensively tested this and don't want to rock the boat, but I can't understand why our app would require these numbers, particularly the tune.buffers.limit number. Have I missed something?

haproxy/entrypoint.sh Show resolved Hide resolved
haproxy/default_frontend.cfg Show resolved Hide resolved
Copy link
Member

@garethbowen garethbowen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved. I wish there was a way to work out what the numbers should be but if this works in testing lets run with it, and look at removing haproxy altogether later.

@dianabarsan dianabarsan merged commit 534f126 into master May 10, 2023
@dianabarsan dianabarsan deleted the 8166-new-haproxy branch May 10, 2023 04:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants