feat(serf) override serf probe interval and timeout w/ environment variables #102

Closed
wants to merge 2 commits into from

Conversation

@akomelj akomelj commented Dec 18, 2019

This pull request implements an optional mechanism for overriding the serf probe interval (default: 1 s) and probe timeout (default: 0.5 s) via the environment variables AGENT_PROBE_INTERVAL and AGENT_PROBE_TIMEOUT.

A little background: I'm running Docker Swarm workers on a network of Raspberry Pis scattered all around the site. Network connectivity to the Pis is reliable enough for a swarm with a longer dispatcher-heartbeat setting to survive regular network slowdowns, but Portainer Agent, in my case, relies too heavily on refuted suspect messages to keep the other agents connected. The default timeout for receiving acknowledgments from peers is simply too low for such a network (bad Wi-Fi signal reception + underpowered hardware). Moreover, sometimes even refutes get lost, and the fallback TCP ping mechanism does not wait for successful TCP retransmits before deciding the peer is dead. As a consequence, I'm experiencing all the symptoms described in portainer/portainer#2535.

I tracked the instability issues to the default (sane) LAN configuration of the serf probe in Agent, namely the probe interval and timeout. By running a custom build of Agent in which I replaced the Agent's default settings with serf's WAN presets, I got rid of the endpoint instability issues completely, and my agent logs are much cleaner now. I'm therefore submitting this pull request, in which the probe interval and timeout can be overridden by two environment variables; it may come in handy for people running swarm on poor networks.

@deviantony deviantony self-requested a review January 8, 2020 19:55
@deviantony
Member

Thanks for this contribution, @akomelj! I'll review it real soon.

@deviantony
Member

Closing in favor of #241

Super sorry for giving you no update on this topic, @akomelj. This is quite a good contribution and should have been merged earlier.

Sorry and thanks again!

@deviantony deviantony closed this Feb 15, 2022
portainer-bot bot pushed a commit that referenced this pull request Nov 12, 2024