[systemd] decrease start limit and interval #25474

Merged Nov 15, 2018 (1 commit)

@jbudz (Member) commented Nov 9, 2018

By default, systemd allows a service to be restarted 5 times within 10 seconds before it enters a failed state. Depending on how Kibana is configured, restarts may happen too slowly to hit this limit, so the service keeps trying to restart forever.

This changes our settings so that the service restarts at most 3 times within a 30-second period before entering a failed state.
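
As a rough illustration of the change described above, the relevant part of the unit file might look like the fragment below. The values come from the description (3 starts within 30 seconds); the directive names and their placement in [Unit] are taken from the log output quoted later in this thread, and the rest of the file is omitted.

```ini
# Sketch of the relevant part of kibana.service; all other settings omitted.
[Unit]
# Window over which start attempts are counted; a bare number is read as seconds.
StartLimitIntervalSec=30
# Allow at most 3 starts inside that window before the unit is marked failed.
StartLimitBurst=3
```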

Closes #10895

Testing

  1. Install the package: dpkg -i kibana.deb, yum install kibana.rpm, etc.
  2. Add an invalid setting to /etc/kibana/kibana.yml, e.g. kibana.pid: var/run/kibana.pid (permissions)
  3. The service restarts three times and then stops; this can be checked with systemctl status kibana.service

@jbudz added the review and Team:Operations labels Nov 9, 2018
@elasticmachine (Contributor)

Pinging @elastic/kibana-operations

@mistic (Member) left a comment

LGTM

@elasticmachine (Contributor)

💚 Build Succeeded

@jbudz jbudz merged commit 602f6f9 into elastic:master Nov 15, 2018

@bluikko commented Mar 15, 2019

These changes should be in the [Service] section and not in the [Unit] section:

systemd[1]: [/etc/systemd/system/kibana.service:3] Unknown lvalue 'StartLimitIntervalSec' in section 'Unit'
systemd[1]: [/etc/systemd/system/kibana.service:4] Unknown lvalue 'StartLimitBurst' in section 'Unit'

This is on EL7. The service does not start because of these errors; after moving the settings to the [Service] section it works normally.

@jbudz (Member Author) commented Mar 15, 2019

This is most likely related to the systemd version: these settings were moved from the [Service] section to the [Unit] section in systemd 229. Can you share your version? Run /usr/lib/systemd/systemd --version

@jbudz (Member Author) commented Mar 15, 2019

I opened #33326

@bluikko commented Mar 16, 2019

You are probably correct; the version is 219. Everyone on EL7 should have the same problem. In this version:

  • StartLimitBurst is in Service section.
  • StartLimitIntervalSec is named StartLimitInterval.
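
Following the two points above, an equivalent fragment for systemd 219 might look like the sketch below. The values are carried over from the pull request description as an assumption, and the behavior on EL7 has not been confirmed beyond what is reported in this thread.

```ini
# Sketch for older systemd (e.g. 219 on EL7), per the two points above.
[Service]
# Older name for the interval setting; a bare number is read as seconds.
StartLimitInterval=30
# Allow at most 3 starts inside that window before the unit is marked failed.
StartLimitBurst=3
```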

It is very unfortunate that such a critical POS has unstable interfaces on top of other problems.

Labels: review, Team:Operations, v6.6.0, v7.0.0