This repository has been archived by the owner on Oct 6, 2023. It is now read-only.

Notifications: services notify when host down #286

Closed
UrBnW opened this issue Nov 5, 2019 · 17 comments

@UrBnW
Contributor

UrBnW commented Nov 5, 2019

Hi,

I'm facing this strange issue:

[screenshot not preserved]

As you can see, the sys-backup service notifies even though its host is down.
In comparison, the sys-uptime service does not notify.
So sys-backup should not have notified here.

Perhaps a bug somewhere?

Edit: root cause described below: #286 (comment)

Many thanks 👍

@SylvestreG
Contributor

I think the service notification was sent because the host was DOWN in a soft state.

If the host had been DOWN in a hard state, the service notification would not have been sent.

@UrBnW
Contributor Author

UrBnW commented Nov 6, 2019

Thank you @SylvestreG.
I'm not sure about your answer: if services notify when the host is in a soft DOWN state, then sys-uptime should have notified too, but it did not 😕

@bouda1
Contributor

bouda1 commented Nov 7, 2019 via email

@UrBnW
Contributor Author

UrBnW commented Nov 18, 2019

OK, issue reproduced, here it is:

[Screenshot, 2019-11-18 22:39:39, not preserved]

And the related log lines:

[1574096096] [25958] Centreon Engine 19.10.6 starting ... (PID=25958)
[1574096096] [25958] Local time is Mon Nov 18 17:54:56 2019
[1574096096] [25958] LOG VERSION: 2.0
[1574096096] [25958] Calculating next valid notification time...
[1574096096] [25958] Default interval: 0
[1574096096] [25958] Interval used for calculating next valid notification time: 0
[1574096096] [25958] Event broker module '/usr/lib64/centreon-engine/externalcmd.so' initialized successfully
[1574096096] [25958] Centreon Broker: log applier: applying 1 logging objects
[1574096096] [25958] Event broker module '/usr/lib64/nagios/cbmod.so' initialized successfully
[1574096096] [25958] INITIAL HOST STATE: fw01;UP;HARD;1;OK - fw01 rta 12.929ms lost 0%
[1574096096] [25958] INITIAL SERVICE STATE: fw01;sys-cpu;OK;HARD;1;OK: 4 CPU(s) average usage is 0.00 %
[1574096096] [25958] INITIAL SERVICE STATE: fw01;sys-uptime;OK;HARD;1;OK: System uptime is: 27d 21h 51m 24s
[1574096096] [25958] INITIAL SERVICE STATE: fw01;sys-interfaces WAN;OK;HARD;1;OK: All interfaces are ok
[1574096096] [25958] INITIAL SERVICE STATE: fw01;sys-backup;WARNING;HARD;3;WARNING: Backup age: 437248.9h
[1574112236] [25958] SERVICE ALERT: fw01;sys-interfaces WAN;UNKNOWN;SOFT;1;UNKNOWN: SNMP Session : unable to create
[1574112241] [25958] HOST ALERT: fw01;DOWN;SOFT;1;CRITICAL - fw01: rta nan, lost 100%
[1574112241] [25958] SERVICE ALERT: fw01;sys-cpu;UNKNOWN;HARD;1;UNKNOWN: SNMP Session : unable to create
[1574112256] [25958] SERVICE ALERT: fw01;sys-uptime;UNKNOWN;HARD;1;UNKNOWN: SNMP Session : unable to create
[1574112256] [25958] HOST ALERT: fw01;DOWN;SOFT;2;CRITICAL - fw01: rta nan, lost 100%
[1574112271] [25958] SERVICE ALERT: fw01;sys-backup;UNKNOWN;HARD;3;UNKNOWN: SNMP Session : unable to create
[1574112271] [25958] Processed notification command: /bin/sh -c '...'
[1574112271] [25958] SERVICE NOTIFICATION: Admin;fw01;sys-backup;UNKNOWN;service-notify-by-email;UNKNOWN: SNMP Session : unable to create
[1574112271] [25958] Processed notification command: /bin/sh -c '...'
[1574112271] [25958] SERVICE NOTIFICATION: Tools;fw01;sys-backup;UNKNOWN;service-notify-by-email;UNKNOWN: SNMP Session : unable to create
[1574112271] [25958] HOST ALERT: fw01;DOWN;SOFT;3;CRITICAL - fw01: rta nan, lost 100%
[1574112286] [25958] SERVICE ALERT: fw01;sys-interfaces WAN;OK;SOFT;2;OK: All interfaces are ok
[1574112287] [25958] HOST ALERT: fw01;UP;SOFT;4;OK - fw01 rta 12.736ms lost 0%
[1574112302] [25958] SERVICE ALERT: fw01;sys-cpu;OK;HARD;1;OK: 4 CPU(s) average usage is 0.00 %
[1574112429] [25958] SERVICE ALERT: fw01;sys-uptime;OK;HARD;1;OK: System uptime is: 28d 2h 26m 2s
[1574112444] [25958] SERVICE ALERT: fw01;sys-backup;WARNING;HARD;3;WARNING: Backup age: 437253.5h
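
For anyone who wants to check a log like this mechanically, here is a minimal Python sketch. It is not part of Centreon; the names `Event`, `parse` and `notifications_during_soft_host_problem` are made up for illustration. It assumes the line layout visible in the excerpt above: `[epoch] [pid] KIND: field;field;...`, with `host;state;state_type` leading a host line and `contact;host;service` leading a notification line. It flags every service notification sent while the last known host state was not UP and still SOFT.

```python
import re
from typing import Iterator, List, NamedTuple, Tuple

# A few lines copied from the log excerpt above.
LOG = """\
[1574112256] [25958] HOST ALERT: fw01;DOWN;SOFT;2;CRITICAL - fw01: rta nan, lost 100%
[1574112271] [25958] SERVICE ALERT: fw01;sys-backup;UNKNOWN;HARD;3;UNKNOWN: SNMP Session : unable to create
[1574112271] [25958] SERVICE NOTIFICATION: Admin;fw01;sys-backup;UNKNOWN;service-notify-by-email;UNKNOWN: SNMP Session : unable to create
[1574112271] [25958] HOST ALERT: fw01;DOWN;SOFT;3;CRITICAL - fw01: rta nan, lost 100%
"""

# Assumed layout: [timestamp] [pid] UPPERCASE KIND: payload
LINE_RE = re.compile(r"^\[(\d+)\] \[(\d+)\] ([A-Z ]+): (.*)$")


class Event(NamedTuple):
    timestamp: int
    kind: str
    fields: List[str]


def parse(log: str) -> Iterator[Event]:
    """Yield one Event per line that matches the assumed layout."""
    for line in log.splitlines():
        match = LINE_RE.match(line)
        if match:
            yield Event(int(match.group(1)), match.group(3), match.group(4).split(";"))


def notifications_during_soft_host_problem(log: str) -> List[Tuple[int, str, str, str]]:
    """Return (timestamp, contact, host, service) for suspicious notifications."""
    host_state = {}  # host name -> (state, state_type) from the latest host line
    flagged = []
    for event in parse(log):
        if event.kind in ("HOST ALERT", "INITIAL HOST STATE"):
            host, state, state_type = event.fields[:3]
            host_state[host] = (state, state_type)
        elif event.kind == "SERVICE NOTIFICATION":
            contact, host, service = event.fields[:3]
            # A host never seen in the log is treated as UP/HARD (an assumption).
            state, state_type = host_state.get(host, ("UP", "HARD"))
            if state != "UP" and state_type == "SOFT":
                flagged.append((event.timestamp, contact, host, service))
    return flagged


if __name__ == "__main__":
    for row in notifications_during_soft_host_problem(LOG):
        print(row)
```

On the sample lines, the only flagged entry is the sys-backup notification at 1574112271, sent between the second and third soft DOWN host attempts.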

@lpinsivy
Contributor

@cpbn this is not the correct debug file. It should be centengine.debug

Regards

@UrBnW
Contributor Author

UrBnW commented Nov 26, 2019

@lpinsivy it is centengine.debug, in /var/log/centreon-engine/.
I just cross-checked again.
Of course, feel free to ask if you need more logs / info etc.
For sure, this issue is rather weird 😕
Thank you.

@lpinsivy
Contributor

Hi @cpbn

I'm not sure that leaving every Notification filter checkbox unchecked for sys-uptime is equivalent to the "all" option.
To me it looks like NONE.

@UrBnW
Contributor Author

UrBnW commented Dec 20, 2019

Hi @lpinsivy,
It does mean "all", as per the documentation, and this service notifies correctly.
At the time of the first post, I was not sure about the root cause.
However, I have since found it, as described in my post just below: #286 (comment)
You should then be able to reproduce this issue easily.
Thank you again !

@UrBnW
Contributor Author

UrBnW commented Dec 20, 2019

OK, I've found the root cause.
The service for which we receive unexpected UNKNOWN notifications is already in a HARD state before switching to UNKNOWN. This is the key.
Well, to be precise, the key is that such services already have tries == max_check_attempts.

It's easily reproducible:
Let's assume a service in HARD WARNING state (tries == max_check_attempts).
You've already received a WARNING notification for this.
The related host becomes !UP (tested with DOWN, not with UNREACHABLE, which should behave the same).
The service then switches from WARNING to UNKNOWN state.
And here, if the host is SOFT !UP, you'll receive an unexpected UNKNOWN notification (the notification is not sent if the host is already in a HARD state).

I say unexpected because services which have tries != max_check_attempts (for example, services that are generally OK) won't get their tries increased while the host is SOFT !UP. So these services will never notify about their state change.
Whereas a service which already has tries == max_check_attempts will.

This issue can also be reproduced with 2 OK services, the first one with max_check_attempts == 1, the second one with max_check_attempts == 2.
The first one will notify when the host is SOFT !UP, the second one never will (whatever the number of retries).
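
The two reproductions above can be written down as a small toy model in Python. To be clear, this is only a sketch of the behaviour as observed from the outside, not the engine's actual code: the `Service` class and `notifies_as_reported` function are hypothetical names, and starting an OK service at attempt 1 is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Service:
    name: str
    current_attempt: int       # called "tries" in this comment
    max_check_attempts: int


def notifies_as_reported(service: Service, host_state_type: str) -> bool:
    """Observed behaviour when a service turns non-OK while its host is !UP.

    host_state_type is "SOFT" or "HARD" and describes the host, which is
    assumed to be DOWN (or UNREACHABLE) for the whole call.
    """
    if host_state_type == "HARD":
        # Reported: nothing is sent once the host problem is HARD.
        return False
    # Host is SOFT !UP: tries are not increased, so only a service that is
    # already at its maximum reaches a notifying HARD state.
    return service.current_attempt == service.max_check_attempts


# Expected behaviour per this issue: False in every case below.
cases = [
    Service("hard-warning-service", current_attempt=3, max_check_attempts=3),
    Service("ok-service-max-1", current_attempt=1, max_check_attempts=1),
    Service("ok-service-max-2", current_attempt=1, max_check_attempts=2),
]

if __name__ == "__main__":
    for svc in cases:
        print(svc.name, "SOFT:", notifies_as_reported(svc, "SOFT"),
              "HARD:", notifies_as_reported(svc, "HARD"))
```

The first two services notify while the host is SOFT !UP and the third does not, which is the inconsistency described above.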

There may then be an issue in the order in which checks are done in the engine before the notification is sent.

Could this be fixed, please?

Thank you 👍

@UrBnW
Contributor Author

UrBnW commented Jun 17, 2021

Hi @bouda1, any progress on this annoying bug?
Many thanks 👍

@bouda1
Contributor

bouda1 commented Jun 17, 2021

Hi @UrBnW We are going to work on it very soon!! 💯

@omercier
Contributor

Hi @UrBnW,
Have you tried setting the "Soft Service Dependencies" option to "yes"?
[screenshot not preserved]
Here is what it is supposed to do:
This option determines whether or not Monitoring Engine will use soft state information when checking host and service dependencies. Normally Monitoring Engine will only use the latest hard host or service state when checking dependencies. If you want it to use the latest state (regardless of whether it's a soft or hard state type), enable this option.
As I understand it, this should fit your need. Can you confirm?

@UrBnW
Contributor Author

UrBnW commented Jul 13, 2021

Hi @omercier,
Thank you for your proposal.
Just tested (again? I'm pretty sure I already did, anyway), but it does not help: the issue still occurs whatever this option is set to.
So the issue is still present in 21.04.4 (engine 21.04.3, broker 21.04.2).
See just above for the detailed root cause explanation: #286 (comment)
There must be a small bug somewhere in the tries / state / notification path :)
Thank you again 👍

@omercier
Contributor

Ok, thank you for your answer.
Notifications should not be sent for a SOFT DOWN host if this parameter is set to "yes", but they are.
The issue is in our backlog and should be fixed in the coming weeks or months.

@UrBnW
Contributor Author

UrBnW commented Oct 27, 2021

Hi @omercier,
Do you perhaps have any news regarding this?
Many thanks 👍

@omercier
Contributor

omercier commented Nov 4, 2021

Hi Benjamin,
Yes, this issue has been fixed for centreon-engine 20.10.7 and 21.04.4 (and of course in 21.10.0).
Ref MON-10642
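
To check whether a given centreon-engine version should carry this fix, something like the following Python sketch could be used. It only encodes the three versions named above. The helper names are made up for illustration, and two things are assumptions on my side: branches newer than 21.10 are treated as fixed, and branches not mentioned here (for example 19.10) are treated as not fixed.

```python
from typing import Tuple

# First fixed patch level per branch, taken from the versions listed above.
FIXED_IN = {(20, 10): 7, (21, 4): 4}
# Assumption: 21.10.0 and every later branch include the fix.
FIRST_FIXED_BRANCH = (21, 10)


def parse_version(text: str) -> Tuple[int, int, int]:
    """Split 'major.minor.patch' into integers, e.g. '21.04.3' -> (21, 4, 3)."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def includes_fix(engine_version: str) -> bool:
    """Return True if this engine version is expected to contain the fix."""
    major, minor, patch = parse_version(engine_version)
    branch = (major, minor)
    if branch >= FIRST_FIXED_BRANCH:
        return True
    if branch in FIXED_IN:
        return patch >= FIXED_IN[branch]
    # Branch not mentioned in this thread: treated as not fixed (assumption).
    return False


if __name__ == "__main__":
    for version in ("19.10.6", "20.10.7", "21.04.3", "21.04.4", "21.10.0"):
        print(version, includes_fix(version))
```

This matches what was reported earlier in the thread: engine 21.04.3 still showed the issue, and 21.04.4 is the first fixed release on that branch.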

@omercier omercier closed this as completed Nov 4, 2021
@UrBnW
Contributor Author

UrBnW commented Nov 4, 2021

Hi Olivier,
Pretty good news, many thanks 👍


5 participants