Downtime for SLA-Reporting #15

Closed
icefish-creativ opened this issue Mar 27, 2019 · 14 comments
Labels
enhancement New feature or request

Comments

@icefish-creativ

Uptime is a really great feature. I think it is useful for everybody, and essential for SLA reporting. It would be nice if you could set an SLA value, so you could also put a monitor on it and see whether service levels are violated. Then I could finally throw Nagios/Icinga out the window.

  • a list of downtimes per host/service
  • per host/service, I can set an SLA target such as 99.9% or 99.5%
  • SLA reporting where downtimes are calculated automatically (GUI for live view and PDF)
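
To make the request concrete, here is a rough sketch of the calculation behind such a report. It is not something Uptime or Heartbeat provides; the function names (`availability_percent`, `sla_violated`) and the sample outage are made up for illustration, and it assumes the downtime intervals do not overlap.

```python
from datetime import datetime, timedelta


def availability_percent(downtimes, period_start, period_end):
    """Availability in percent over a reporting period.

    `downtimes` is a list of (start, end) datetime pairs, assumed non-overlapping.
    """
    period = (period_end - period_start).total_seconds()
    if period <= 0:
        raise ValueError("reporting period must have a positive length")
    down = 0.0
    for start, end in downtimes:
        # Clip each outage to the reporting period before counting it.
        clipped_start = max(start, period_start)
        clipped_end = min(end, period_end)
        if clipped_end > clipped_start:
            down += (clipped_end - clipped_start).total_seconds()
    return 100.0 * (period - down) / period


def sla_violated(availability, sla_target):
    """True when the measured availability falls below the SLA target (both in percent)."""
    return availability < sla_target


# Hypothetical example: a 30-day period with a single 45-minute outage.
period_start = datetime(2019, 3, 1)
period_end = period_start + timedelta(days=30)
outages = [(datetime(2019, 3, 10, 2, 0), datetime(2019, 3, 10, 2, 45))]

pct = availability_percent(outages, period_start, period_end)
print(f"availability: {pct:.3f}%")
print("99.9% SLA violated:", sla_violated(pct, 99.9))
print("99.5% SLA violated:", sla_violated(pct, 99.5))
```

With these sample numbers the 45-minute outage breaks a 99.9% target but not a 99.5% one, which is the kind of result the report would need to show per host/service.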

cheers

Tim

@andrewvc andrewvc added enhancement New feature or request discussion labels Mar 28, 2019
@andrewvc
Contributor

Thanks for posting this, Tim. This is a great idea, and honestly not a heavy lift either.

I think a next step for this would be to create mocks.

@dov0211 @justinkambic @makwarth what are your thoughts for this feature?

From a priority perspective, this feels lower than central management and alerting, so it's probably a ways down the road for now.

@makwarth

Agree this would be a great addition. Thanks for posting, @icefish-creativ.
I wonder how SLA per service would work with Heartbeat. Grouping of monitors?

@dov0211

dov0211 commented Mar 28, 2019

We had similar feedback from our SA team (SLAs and integration within Observability solutions).
I agree with @andrewvc that we should consider those two items straight after central management and alerting.
Several vendors provide rich capabilities for SLA calculation (different KPIs, different calendars and working hours, downtimes, and more).
I believe we should start by calculating endpoint availability as a first phase, and then think about it in terms of reporting and alerting.

@icefish-creativ
Author

@makwarth
My pleasure.
For example, I have 4 web servers behind a load balancer, so I need 5 monitors: 4 to check the servers and 1 that goes through the load balancer with an application endpoint check. I give the customer an SLA on the load balancer check; the availability of the individual servers is of secondary importance.

Setting downtime on groups would of course be great :-). Groups should be based on custom fields. For example, I add the custom fields host.environment (like prod, test), host.role (like web server, mysql) and host.setup (foobar1, foobar2) to every message.
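
As a rough illustration of grouping by custom fields, the sketch below computes availability per group from a list of check results. The function name `availability_by_group`, the flat `status` key, the `loadbalancer` role value and the sample records are all assumptions made for this example; they do not reflect how Heartbeat documents are actually structured.

```python
from collections import defaultdict


def availability_by_group(checks, group_fields):
    """Percentage of 'up' checks per group, keyed by the values of `group_fields`."""
    counts = defaultdict(lambda: [0, 0])  # group key -> [up checks, total checks]
    for check in checks:
        key = tuple(check.get(field) for field in group_fields)
        counts[key][1] += 1
        if check["status"] == "up":
            counts[key][0] += 1
    return {key: 100.0 * up / total for key, (up, total) in counts.items()}


# Hypothetical check results carrying the custom fields mentioned above.
checks = [
    {"host.environment": "prod", "host.role": "web server", "host.setup": "foobar1", "status": "up"},
    {"host.environment": "prod", "host.role": "web server", "host.setup": "foobar1", "status": "down"},
    {"host.environment": "prod", "host.role": "loadbalancer", "host.setup": "foobar1", "status": "up"},
    {"host.environment": "test", "host.role": "mysql", "host.setup": "foobar2", "status": "up"},
]

by_role = availability_by_group(checks, ["host.environment", "host.role"])
for group, pct in sorted(by_role.items()):
    print(group, f"{pct:.1f}%")
```

Grouping on `host.environment` and `host.role` keeps the load balancer check separate from the individual web servers, so the customer-facing SLA can be read from that one group.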

@alogishetty

Hey guys, can we get this feature implemented? We are looking for this kind of metric for defining SLOs and SLAs.

@dwchurch

Yes, uptime would be immensely more valuable with this feature.

@TheSecMaven

TheSecMaven commented Aug 21, 2019

This is a huge blocker for us in our implementation of Uptime. We are also looking for these kinds of metrics and think this product would deliver far more value by providing them.

@firewallkevin

This is a superlatively useful feature that would extend the Elastic Stack's use in IT operations and analytics.

@andrewvc
Contributor

To add to this issue, some of the metadata that's a pre-req for calculating this is here: elastic/beats#13672

I'm thinking we can add this along with this improvement in elastic/kibana#44546 since the timeline calculation gives us that info for free more or less.

It would be great to hear from more people in this thread about what a downtime indicator would be used for.

Would you use it for defining breaches of contract? Purely for internal metrics with less strict guidelines? Something else? Would you use it from multiple geo locations?

@alogishetty

alogishetty commented Sep 27, 2019 via email

@andrewvc
Contributor

@alogishetty it'd be great to hear more details here about contract breaches and geo locations.

  1. Are you looking for individual statistics per geo location?
  2. How would you define contract breaches? It may be hard or impossible for us to support custom formulas for SLA.

@alogishetty

alogishetty commented Sep 28, 2019 via email

@andrewvc
Contributor

andrewvc commented Oct 2, 2019

@alogishetty hmmm, how do you track connectivity between data centers? Do you have a heartbeat job in each DC that does nothing but ping the other? Or do all your monitors run once in each DC? Both?

@andrewvc
Contributor

Fixed in elastic/kibana#67790 (targeting 7.9.0). If this doesn't cover the use cases of anyone in this thread, feel free to open a new issue.

@zube zube bot removed the [zube]: Done label Oct 20, 2020
9 participants