[Uptime] Allow logical AND in Monitor Status rule #391

Closed
justinkambic opened this issue Oct 21, 2021 · 9 comments
Labels
enhancement New feature or request

Comments

@justinkambic

Is your feature request related to a problem? Please describe.
We received a request from a user who wants to receive alerts only when their service is down in all locations. Today, if we configure an alert with multiple locations and any one of them goes down, the rule becomes active.

Describe the solution you'd like
We should add an option so that a rule activates only when all of the specified locations are simultaneously unavailable.

Describe alternatives you've considered
N/A

Additional context
There may be some workaround possible with custom query logic, but if we want users to be able to do this it will be worthwhile to add the functionality as part of the standard UI flow.
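To make the difference between today's behavior and the requested option concrete, here is a minimal sketch in Python. The function name, the `require_all` flag, and the input shape are hypothetical and are not part of the Uptime rule implementation; the sketch only illustrates "any location down" versus "all locations down".

```python
from typing import Mapping


def rule_is_active(down_by_location: Mapping[str, bool], require_all: bool = False) -> bool:
    """Decide whether a Monitor Status rule should fire.

    down_by_location maps a location name to True when the monitor is
    currently down from that location (hypothetical input shape).
    require_all=False mirrors today's behavior (any location down);
    require_all=True is the requested logical AND (all locations down).
    """
    if not down_by_location:
        # Assumption: with no location data there is nothing to alert on.
        return False
    states = down_by_location.values()
    return all(states) if require_all else any(states)


statuses = {"us-east": True, "eu-west": False}
print(rule_is_active(statuses))                    # any-location behavior
print(rule_is_active(statuses, require_all=True))  # all-locations behavior
```

With one of two locations down, the first call fires and the second does not, which is the distinction this request is about.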

@justinkambic added the enhancement label Oct 21, 2021
@sanjaruzic

Adding a specific customer use case:

The customer has several heartbeat instances monitoring the same URLs from different locations.
Currently it is not possible to group the alerts coming from different heartbeat instances for the same URL.

The customer wants to get an alert if 1 instance is not reporting.
The SQL equivalent would be
select count(distinct(observer.geo.name)) from 'heartbeat-7*' < X
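As a rough illustration of the pseudo-SQL above, the same count can be expressed as an Elasticsearch search body with a `cardinality` aggregation on `observer.geo.name`. This is a sketch under assumptions: the `url.full` filter, the time window, the aggregation name, and both helper functions are hypothetical, and no client call is made here.

```python
def build_location_count_query(url: str, window: str = "now-5m") -> dict:
    """Build a search body counting distinct reporting locations for one URL.

    Assumptions: documents carry `url.full` and `@timestamp`, and the
    aggregation name `reporting_locations` is arbitrary.
    """
    return {
        "size": 0,  # only the aggregation result is needed, not the hits
        "query": {
            "bool": {
                "filter": [
                    {"term": {"url.full": url}},
                    {"range": {"@timestamp": {"gte": window}}},
                ]
            }
        },
        "aggs": {
            "reporting_locations": {
                "cardinality": {"field": "observer.geo.name"}
            }
        },
    }


def too_few_locations(response: dict, expected: int) -> bool:
    """Return True when fewer than `expected` distinct locations reported."""
    return response["aggregations"]["reporting_locations"]["value"] < expected
```

The body would be sent to an index pattern such as `heartbeat-7*`, and `too_few_locations(response, X)` corresponds to the `< X` comparison in the comment above.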

@justinkambic
Author

We're going to evaluate putting this request onto our roadmap. Earliest possible target would be 8.1, but at this point we haven't committed to working on it at all.

If it does get added you'll be able to track the board it's on (projects link on the sidebar) and the issue's target version label.

@paulb-elastic

@justinkambic the main description (alert when all are down), seems different to #391 (comment) suggesting an alert if one location is down. Can you clarify?

@justinkambic
Author

the main description (alert when all are down), seems different to #391 (comment) suggesting an alert if one location is down. Can you clarify?

Per the original forum request:

Having Heartbeat deployed to multiple hosts, I would like to be alerted only when a monitor (e.g. ICMP probe on "example.org") fails on all of them.

Today we will trigger an alert for a rule when any location is down. This is the canonical case, and today we're not accounting for the more esoteric choice of wanting to know only when all are down.

@paulb-elastic

Pinging @andrewvc re @justinkambic's comment

@paulb-elastic

@drewpost to find out more about the why behind this, for example whether it is meant to handle the unreliability of ICMP, which retry capabilities could also address.

@huemac

huemac commented Jan 10, 2022

Hi @paulb-elastic
This is useful when we have Heartbeat deployed to multiple availability zones (e.g. in Azure) and we only want to be alerted if an endpoint is reported down in ALL the AZs.
It may be acceptable for the application under monitoring to be inaccessible from some AZs. But if all Heartbeat deployments are reporting the endpoint as "Down", then there is a real issue that needs to be acted on.

@kevinnoel-be

We have the same use case here. We have Heartbeat instances deployed in multiple AZs and we need to know when the monitor is down from every observer's point of view.
Also, the current Uptime monitor status check only allows conditions like "Matching monitors are down > X times within last Y minutes", which is not very useful in this setup because we may have a variable number of Heartbeat instances (i.e. at least one, expected two).
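A small sketch of why a fixed "down > X times" count does not fit a variable number of observers: the check below keeps the latest status per observer and fires only when every observer that reported is down, however many there are. The function name and the `(observer, status)` input shape are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def down_from_all_observers(checks: Iterable[Tuple[str, str]]) -> bool:
    """Return True when every observer that reported sees the monitor as down.

    checks: (observer_name, status) pairs in chronological order, where
    status is "up" or "down" (hypothetical shape for this sketch).
    """
    latest: Dict[str, str] = {}
    for observer, status in checks:
        latest[observer] = status  # later checks overwrite earlier ones
    # Assumption: with no reporting observers there is nothing to alert on.
    return bool(latest) and all(s == "down" for s in latest.values())


# One observer running: a single "down" is enough.
print(down_from_all_observers([("az-1", "down")]))
# Two observers running: both must be down.
print(down_from_all_observers([("az-1", "down"), ("az-2", "up")]))
```

Because the condition is relative to the set of observers that actually reported, it behaves the same with one instance or two, which a fixed X threshold cannot do.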

@paulb-elastic

Captured in elastic/kibana#153571

@paulb-elastic closed this as not planned on Dec 21, 2023