-
Notifications
You must be signed in to change notification settings - Fork 342
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Fix when monitor of systemd resource continues to be pending. #3720
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Hideo,
Thanks for the fix!
@@ -901,6 +901,28 @@ action_complete(svc_action_t * action) | |||
} | |||
} | |||
} | |||
} else if (pcmk__strcase_any_of(cmd->action, PCMK_ACTION_MONITOR, PCMK_ACTION_STATUS, NULL) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A case-sensitive comparison would be appropriate (pcmk__str_any_of()
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay!
/* For monitors other than follow up monitors, if "use-monitor-pending-timeout" */ | ||
/* is enabled and the pending state continues from the time of the first notification */ | ||
/* until the timeout, it will be treated as a timeout. */ | ||
if (pcmk__str_eq(g_hash_table_lookup(cmd->params, PCMK_XA_USE_MONITOR_PENDING_TIMEOUT), "true", pcmk__str_casei)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that this fix makes sense to apply universally -- no new parameter is needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All right!
Adding parameters will be discontinued.
@@ -901,6 +901,28 @@ action_complete(svc_action_t * action) | |||
} | |||
} | |||
} | |||
} else if (pcmk__strcase_any_of(cmd->action, PCMK_ACTION_MONITOR, PCMK_ACTION_STATUS, NULL) && | |||
(cmd->interval_ms > 0)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think probes would have the same issue. Probes will expect "not running" but could find it pending instead. There could be an argument that a probe should return "running" for pending but I think following the same timeout logic makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't looked into Probe much.
I'll look into it a bit more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked it.
I think what you mean when you say that Probe can handle the same thing here is something like this, what do you think?
If the status becomes PENDING during Probe, it will not respond immediately, but will continue to monitor at regular intervals until a timeout occurs, and will respond to controld when it gets a status other than PENDING.
If the timeout has passed, it will return PENDING.
In terms of processing, I think it is probably closer to the already existing start/stop follow up monitor than this time.
If I'm not mistaken, I will include this Probe change when I resubmit this PR.
Best Regards,
Hideo Yamauchi.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @HideoYamauchi ,
Yes, I think that's correct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @kgaillot
All right.
Prepare the pending monitor fixes and the pending probe fixes and submit the PR again.
Many thanks,
Hideo Yamauchi.
crm_warn("If monitor-pending-timeout is set to true and the interval is longer than the monitor timeout, " | ||
"two consecutive monitor deactivations will be treated as an error. " | ||
"We recommend setting a shorter interval to ensure that " | ||
"multiple consecutive deactivations before the timeout are treated as an error."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't follow what the issue is here. The interval is always counted from the last completion, not the last initiation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example, this is the case when extreme settings such as an interval of 3600s and a timeout of 60s are used.
If the interval is performed at a short interval such as 20s, it can be determined that there is a problem if deactivating is returned several times in a row.
If the setting is 3600s and deactivating is returned the Nth time, the N+1st time will be the second deactivating, so I thought that the number of times might not be enough.
However, on reflection, this warning does not seem to be very meaningless, so I will ultimately delete this warning.
Thank you for your comment. Many thanks, |
Hi All,
If a systemd resource fails and continues to return pending, detection of the failure will be delayed until the recheck-timer, and the service will be stopped until failover occurs.
This issue was confirmed by our user when using Zabbix 6.0 resources with systemd.
To control this, we have added the use-monitor-pending-timeout parameter and made a correction so that if the pending state continues until the monitor timeout, it will be treated as an error.
Please let me know your thoughts on the proposed amendments.
Best Regards,
Hideo Yamauchi.