This repository has been archived by the owner on Oct 6, 2023. It is now read-only.

Do not compute flapping state when host / parent is not up #192

Closed
UrBnW opened this issue Jul 1, 2019 · 8 comments

@UrBnW
Contributor

UrBnW commented Jul 1, 2019

Hello,

A service is considered flapping once its state has changed too many times over its last 21 checks. So far, so good.
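
To make the rule above concrete, here is a minimal sketch of a flap detector over a 21-state window. It is not the Centreon Engine implementation: the class name `FlapHistory`, the unweighted percentage, and the 50% threshold are all assumptions chosen for illustration (real engines may weight recent changes differently and use separate low and high thresholds).

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical, simplified flap detector (illustration only).
class FlapHistory {
 public:
  explicit FlapHistory(std::size_t capacity = 21) : capacity_(capacity) {
    assert(capacity >= 2);  // at least one transition must be possible
  }

  // Record the state returned by the latest check, dropping the oldest
  // entry once the window is full.
  void record(int state) {
    if (states_.size() == capacity_) states_.pop_front();
    states_.push_back(state);
  }

  // Percentage of adjacent pairs in the window whose states differ,
  // relative to the maximum number of transitions (capacity - 1).
  double percent_change() const {
    std::size_t changes = 0;
    for (std::size_t i = 1; i < states_.size(); ++i)
      if (states_[i] != states_[i - 1]) ++changes;
    return 100.0 * static_cast<double>(changes) /
           static_cast<double>(capacity_ - 1);
  }

  // The threshold value is an arbitrary assumption for this sketch.
  bool is_flapping(double high_threshold = 50.0) const {
    return percent_change() >= high_threshold;
  }

 private:
  std::size_t capacity_;
  std::deque<int> states_;
};
```

With this sketch, a service alternating between two states on every check reaches 100% state change, while a service that never changes state stays at 0%.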

Now take a service attached to a host that is itself flapping (or at least changing state regularly).
The service will frequently switch from OK to UNKNOWN (or any other non-OK state) and back to OK again.
It eventually ends up flagged as flapping, which triggers many unwanted and misleading flapping notifications and makes flapping detection counterproductive.

In my opinion, service state changes should therefore not be taken into account in the flapping computation while the related host is not up.
Unfortunately, it seems this is not the case for now.

Could we consider implementing this rule?

In addition, note that we also receive the service FLAPPING notification even when the related host is not up, whereas other notification types are muted in that case. This should no longer occur once the rule above is applied.

Many thanks 👍

Edit: see #524 for a fix.

@bouda1
Contributor

bouda1 commented Jul 19, 2019

There are not many people using flapping, so it is difficult to get a clear picture of this.
The changes needed for this behaviour are not easy, and we are also still fixing stability problems in the new engine.
I don't want to promise a fix, then be overwhelmed and not deliver it.
So let's see. We will keep your ticket open for now and do our best :-)

@UrBnW
Contributor Author

UrBnW commented Jul 25, 2019

> There are not many people using flapping

Flapping detection is really useful when a service hovers around its threshold and keeps crossing it, because it avoids notification spam.

> The changes to have this behaviour are not so easy

One idea: when the engine checks a service that returns a non-OK status, it already checks (or knows) the related host status, since it does not notify when the host is non-UP.

Since the host status is known at that point, it may be feasible to avoid storing a non-OK status in the service's last 21 states used for the flapping computation when the related host is non-UP.

Just my 2 cents :) ...

@UrBnW
Contributor Author

UrBnW commented Oct 21, 2019

Something like:

// Skip the state-history update while the service's host is not UP.
if (hst->get_current_state() != host::state_up) {
    update_history = false;
}

Here?

centreon-engine/src/service.cc

Lines 1952 to 1953 in 2a0f1fb

/* should we update state history for this state? */
if (update_history) {

We would then be sure not to modify the flapping status of a service at all while the host is down.
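
As a self-contained illustration of that guard, the sketch below shows how ignoring check results while the host is not UP keeps host-induced states out of the flapping history. The types `HostState`, `ServiceState`, `Service` and both function names are hypothetical stand-ins, not the actual centreon-engine classes.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for the engine's host and service states.
enum class HostState { up, down, unreachable };
enum class ServiceState { ok, warning, critical, unknown };

struct Service {
  // States taken into account for flap detection.
  std::vector<ServiceState> history;
};

// Decide whether a check result should enter the flapping history:
// never while the host is not UP, otherwise keep the caller's decision.
bool should_update_history(HostState host_state, bool update_history) {
  if (host_state != HostState::up) return false;
  return update_history;
}

// Store the result only when the guard above allows it.
void handle_check_result(Service& svc, HostState host_state,
                         ServiceState result, bool update_history = true) {
  if (should_update_history(host_state, update_history))
    svc.history.push_back(result);
}
```

In this sketch, a result received while the host is DOWN or UNREACHABLE is simply dropped from the history, so a sequence OK, UNKNOWN (host down), OK is recorded as OK, OK and counts as no state change.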

@lpinsivy
Contributor

Hi @cpbn,

As you explained, the flapping state of a service is computed independently of its host.

More generally, how should service notifications behave when the host itself is flapping? Should notifications be disabled for all services of that host?

@UrBnW
Contributor Author

UrBnW commented Nov 26, 2019

Hi @lpinsivy, thank you for asking.

Simply notify as usual:

  • if the service is !OK while the host is UP, notify;
  • if the service is !OK while the host is !UP, don't notify.

If the host becomes !UP, the service will change to !OK.
That state change is caused by the host, not by the service itself.
So simply don't compute the service flapping status while the host is !UP.
I think my piece of code above should do the trick 😉
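
The two rules above can be written down as a pair of predicates. This is only a sketch under assumed names (`should_notify`, `should_compute_flapping` and the two enums are hypothetical, not engine APIs); it ignores everything else a real engine considers, such as notification periods or acknowledgements.

```cpp
// Hypothetical stand-ins for the engine's host and service states.
enum class HostState { up, down, unreachable };
enum class ServiceState { ok, warning, critical, unknown };

// Rule 1: a service problem is notified only while its host is UP.
constexpr bool should_notify(ServiceState service, HostState host) {
  return service != ServiceState::ok && host == HostState::up;
}

// Rule 2: the service state feeds the flapping computation only while
// its host is UP.
constexpr bool should_compute_flapping(HostState host) {
  return host == HostState::up;
}
```

Both predicates depend on the same condition (host is UP), which is why applying the second rule would also remove the stray FLAPPING notifications sent while the host is not up.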

Thank you again 👍

@UrBnW
Contributor Author

UrBnW commented Jan 13, 2020

Note that we have the same issue with (parent / child) hosts themselves.
Here's an example:

[image: pc]

fw is the parent host, 3cx is the child host.
Only UP and DOWN notifications are enabled, UNREACHABLE notifications are disabled.

fw starts to flap.
As a result, 3cx flaps too.
We receive notifications for the parent host, as expected.
But we eventually receive an unexpected UNREACHABLE notification for the child host, which is in reality a FLAPPING one.
This happens because the child's flapping status is computed even when the parent is !UP.

So, as proposed above for services, we should not compute the child host flapping status when the parent host is !UP.
This would prevent such situations.
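
A possible shape for the host-side rule is sketched below. The `Host` struct and `should_update_flap_history` are hypothetical names, and the handling of hosts with several parents is an assumption: here the child's history is updated as long as at least one parent is UP, which the discussion above does not specify.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the engine's host states.
enum class HostState { up, down, unreachable };

struct Host {
  std::string name;
  HostState state;
  std::vector<const Host*> parents;  // empty for a top-level host
};

// A host without parents always updates its flapping history.
// A child host updates it only while at least one parent is UP
// (assumption for the multi-parent case).
bool should_update_flap_history(const Host& host) {
  if (host.parents.empty()) return true;
  return std::any_of(host.parents.begin(), host.parents.end(),
                     [](const Host* p) { return p->state == HostState::up; });
}
```

Applied to the example above, 3cx would stop recording state changes while fw is not UP, so only fw itself could enter the flapping state.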

Thank you again 👍

@UrBnW UrBnW changed the title from "Flapping, do not compute service state when host is not up" to "Do not compute flapping state when host / parent is not up" on May 5, 2020
@UrBnW
Contributor Author

UrBnW commented Jun 28, 2021

See #524 for a fix 👍

@UrBnW
Contributor Author

UrBnW commented Oct 27, 2021

Closing this, as it should be solved by #557.
Will re-open if needed.
Thank you again 👍

@UrBnW UrBnW closed this as completed Oct 27, 2021