This repository has been archived by the owner on Oct 6, 2023. It is now read-only.

Flapping, notifications consistency #159

Closed
UrBnW opened this issue Jun 11, 2019 · 6 comments

@UrBnW
Contributor

UrBnW commented Jun 11, 2019

Hi,

It seems that the notifications sent after a service (or a host) stops flapping are inconsistent: they depend on the service state while it was flapping.

Here are 4 tested, reproducible cases.

Case 1: service starts flapping as CRITICAL and remains CRITICAL:

service OK
service CRITICAL - notification CRITICAL
service OK       - notification OK
service CRITICAL - notification CRITICAL
service OK       - notification OK
service CRITICAL - notification FLAPPINGSTART
service CRITICAL
service CRITICAL - notification FLAPPINGSTOP ($SERVICESTATE$ = CRITICAL)
                   notification CRITICAL

Case 2: service starts flapping as OK and switches to CRITICAL:

service OK
service CRITICAL - notification CRITICAL
service OK       - notification OK
service CRITICAL - notification CRITICAL
service OK       - notification FLAPPINGSTART
service CRITICAL
service CRITICAL - notification FLAPPINGSTOP ($SERVICESTATE$ = CRITICAL)
                   notification CRITICAL

Case 3: service starts flapping as CRITICAL and switches to OK:

service OK
service CRITICAL - notification CRITICAL
service OK       - notification OK
service CRITICAL - notification CRITICAL
service OK       - notification FLAPPINGSTART
service OK
service OK       - notification FLAPPINGSTOP ($SERVICESTATE$ = OK)

Case 4: service starts flapping as OK and remains OK:

service OK
service CRITICAL - notification CRITICAL
service OK       - notification OK
service CRITICAL - notification CRITICAL
service OK       - notification OK
service CRITICAL - notification FLAPPINGSTART
service OK
service OK       - notification FLAPPINGSTOP ($SERVICESTATE$ = OK)

I would have expected a second notification (OK) in cases 3 and 4, to be consistent with cases 1 and 2, where 2 notifications are sent when the service stops flapping.

Alternatively, I would have expected the second notification in cases 1 and 2 not to be sent, to be consistent with cases 3 and 4.

Note that I clearly prefer the second solution.
In my opinion, there is no need to send a second notification: the FLAPPINGSTOP notification already contains $SERVICESTATE$ and can be processed accordingly. This avoids notification "spamming", which is precisely the purpose of flap detection.
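
To illustrate the point above, here is a minimal sketch of a notification script that reads the final state from the FLAPPINGSTOP notification itself. It assumes the notification command passes $NOTIFICATIONTYPE$ and $SERVICESTATE$ as arguments; the function name `handle_notification` and the message texts are hypothetical, not part of the engine.

```python
import sys


def handle_notification(notification_type: str, service_state: str) -> str:
    """Build the message for one notification (hypothetical helper).

    notification_type is assumed to come from $NOTIFICATIONTYPE$ and
    service_state from $SERVICESTATE$.
    """
    if notification_type == "FLAPPINGSTART":
        return "flapping started"
    if notification_type == "FLAPPINGSTOP":
        # The final state travels with FLAPPINGSTOP, so a follow-up
        # state notification would add no information.
        return f"flapping stopped, service is now {service_state}"
    # Any other notification type is reported with its state only.
    return f"service is {service_state}"


if __name__ == "__main__":
    # Assumed invocation: script.py <NOTIFICATIONTYPE> <SERVICESTATE>
    if len(sys.argv) == 3:
        print(handle_notification(sys.argv[1], sys.argv[2]))
```

With such a handler, a single FLAPPINGSTOP notification is enough to report both the end of flapping and the resulting state.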

I first tried to work around the dual-notification behavior by simply ignoring the FLAPPINGSTOP notification, but then hit cases 3 and 4, where no second notification is sent.

So, since no notification is sent after FLAPPINGSTART, could we consider disabling the second notification that may be sent right after FLAPPINGSTOP, for consistency?
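
The requested behavior can be sketched as a filter over the observed notifications. This is only a model of the proposal, not the engine's code: each notification is represented as a hypothetical `(check_number, label)` pair, and any state notification emitted on the same check as FLAPPINGSTOP is dropped. The two sequences are transcribed from cases 1 and 3 above.

```python
FLAPPING_TYPES = {"FLAPPINGSTART", "FLAPPINGSTOP"}


def suppress_state_after_flapping_stop(events):
    """Drop state notifications sent on the same check as FLAPPINGSTOP."""
    stop_checks = {check for check, label in events if label == "FLAPPINGSTOP"}
    return [
        (check, label)
        for check, label in events
        if label in FLAPPING_TYPES or check not in stop_checks
    ]


# Case 1 as reported: FLAPPINGSTOP and CRITICAL are both sent on check 8.
case_1 = [
    (2, "CRITICAL"), (3, "OK"), (4, "CRITICAL"), (5, "OK"),
    (6, "FLAPPINGSTART"), (8, "FLAPPINGSTOP"), (8, "CRITICAL"),
]

# Case 3 as reported: only FLAPPINGSTOP is sent when flapping ends.
case_3 = [
    (2, "CRITICAL"), (3, "OK"), (4, "CRITICAL"),
    (5, "FLAPPINGSTART"), (7, "FLAPPINGSTOP"),
]

if __name__ == "__main__":
    print(suppress_state_after_flapping_stop(case_1))
    print(suppress_state_after_flapping_stop(case_3))
```

Under this model, case 1 ends with FLAPPINGSTOP alone, as case 3 already does, so every case ends with the same single notification.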

Thank you 👍

@bouda1
Contributor

bouda1 commented Jun 12, 2019

Hi @cpbn,

If you take a look at the refacto-engine branch, you will notice many changes in centreon-engine.

One of the goals is to improve notifications, so I hope this issue will be solved soon.
Cheers.

@UrBnW
Contributor Author

UrBnW commented Jun 13, 2019

Hi @bouda1,
Thank you for your answer.
I'll test this with the new engine once it is available, and report back.
Of course, if you are able to test the different flapping notification cases before releasing, that would be perfect :)
Thank you 👍

@lpinsivy
Contributor

Hi @cpbn,

Did you retry your tests with the latest version?
If this issue persists, I agree that we should not send a state notification after flapping ends if a notification has already been sent before.

@UrBnW
Contributor Author

UrBnW commented Nov 26, 2019

Thank you again @lpinsivy.
Yes, the latest 19.10.x versions of the engine still suffer from this issue (I updated the last line of my first post accordingly 😉).
Note that I shared many details by email with @Sims24 & @bouda1 on 21-22/10/2019.

@UrBnW
Contributor Author

UrBnW commented Aug 9, 2021

Issue still present in 21.04.4 (engine 21.04.3, broker 21.04.2).

We can safely ignore cases 2 and 4, which will now be impossible thanks to #523.
But in case 1, the second notification sent just after the FLAPPINGSTOP one should not be sent.

@omercier
Contributor

Hi @UrBnW,
As we discussed, I will close this issue, since some of the scenarios you mentioned here will be fixed by your other PRs (#522, #523 and #524) and by your issue #286, which we are about to fix.
Feel free to reopen it later ;-)
Regards,
Olivier
