DAOS-10206 config: set up extra systemd options #8593
base: master
Add several options to the systemd unit file of the daos_server to:
- increase the scheduling priority over other Linux processes
- bind the control plane to core #0 to avoid noise on the engine
- reduce the OOM score so that other processes are killed before the daos_server
- dump the content of pmem to core files

Signed-off-by: Johann Lombardi <[email protected]>
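For readers less familiar with the relevant directives, the four goals above map roughly onto standard systemd.exec options as sketched below. The values are illustrative assumptions, not the ones set by this patch, and CoredumpFilter= is only available in fairly recent systemd releases (246 or later, as far as I know).

```ini
[Service]
# Raise scheduling priority relative to other processes
# (lower nice value = higher priority; -10 is an assumed value).
Nice=-10
# Pin the control plane to core 0; child processes inherit this mask
# unless they set their own affinity.
CPUAffinity=0
# Make the OOM killer prefer other processes (assumed value).
OOMScoreAdjust=-900
# Allow core files and include DAX (pmem) mappings on top of the
# kernel's default coredump filter.
LimitCORE=infinity
CoredumpFilter=default private-dax shared-dax
```

On older systemd versions without CoredumpFilter=, the same effect would have to be obtained some other way, for example by writing to /proc/self/coredump_filter from the process itself.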
LGTM. No errors found by checkpatch.
utils/systemd/daos_server.service
Outdated
@@ -13,12 +13,24 @@ RuntimeDirectoryMode=0755
ExecStart=/usr/bin/daos_server start
StandardOutput=journal
StandardError=journal
StartLimitBurst=5
What situation do you want to address with this setting? Do I understand correctly that we will tolerate up to 5 service restarts within StartLimitIntervalSec/DefaultStartLimitIntervalSec=10s before giving up on restart attempts? In my experience, a daos_server service startup failure means something is wrong and needs to be investigated, so I would rather set StartLimitBurst=1 and raise StartLimitIntervalSec a little, to something like 120s. But I am sure we will have to schedule a specific meeting on this topic, where everybody may have a different idea!
I only moved this one around and did not change the value. Not sure who added it. @mjmac any idea?
Hmm. Well, git blame shows that @dpquigl added that line. It looks like StartLimitIntervalSec is set to 60s up in the unit block above this, so this is saying that daos_server can only attempt to restart 5 times within 60s, if I understand correctly.
I do tend to agree that there probably isn't much value in having the service attempt to restart multiple times, as errors which prevent successful startup (bad config, missing hardware, etc) aren't likely to resolve themselves.
@dpquigl: Was there a specific rationale behind choosing that value? Should we just revert to the default?
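To make the semantics under discussion concrete: systemd rate-limits unit starts, so once StartLimitBurst= starts have occurred within StartLimitIntervalSec=, further start attempts are refused. In current systemd both settings are documented for the [Unit] section. A minimal sketch of the stricter variant proposed earlier in this thread (the values are that reviewer's suggestion, not what the patch sets):

```ini
[Unit]
# Allow a single start attempt per 120s window; a failed start then
# stays failed until someone investigates.
StartLimitIntervalSec=120
StartLimitBurst=1
```

With these values an automatic restart loop is effectively disabled, which matches the argument that a bad config or missing hardware will not fix itself.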
From my side, everything seems fine with the exception of this parameter StartLimitIntervalSec. Is there any feedback on the motivation for choosing this value?
I believe this was set the way it was because without it we were flooding the logs with error messages if DAOS wasn't configured correctly and systemd had it enabled.
Signed-off-by: Johann Lombardi <[email protected]>
LGTM. No errors found by checkpatch.
Seems fine to me.
@@ -4,21 +4,40 @@ StartLimitIntervalSec=60
Wants=network-online.target
After=network-online.target

StartLimitIntervalSec=60
this looks redundant with line #3 in fact
Signed-off-by: Johann Lombardi <[email protected]>
Signed-off-by: Johann Lombardi <[email protected]>
LGTM. No errors found by checkpatch.
OK
Should we consider merging latest since it's quite old?