Address watchdogd service problem with generic script > 1s runtime #39 #40
This commit addresses a problem in the watchdogd service: activating a generic monitoring script (in this example, /usr/sbin/my-script.sh) whose runtime exceeds one second triggers an unintended system reboot.
Our configuration is as follows:
generic {
    enabled        = true
    interval       = 60
    timeout        = 20
    warning        = 1
    critical       = 10
    monitor-script = "/usr/sbin/my-script.sh"
}
We encounter the error message below, even though my-script.sh returns 0:
Further investigation showed that the problem arises because 'gs->script_runtime' is measured in milliseconds, while 'gs->script_runtime_max' is kept in seconds, as shown in the source code here: link to source code. Comparing the two values therefore mixes units. This commit fixes the mismatch.
Unit test results
With the fix in this PR, the watchdog service no longer fails and works as expected; please see the traces below.