[APM] Service maps health indicators: Indicate alert-based health status on Service nodes #64144
Comments
Pinging @elastic/apm-ui (Team:apm)
Blocked by #70169.
Our current thinking on making some forward movement on this is that it is likely doable using the existing alertClient. While we'd like to gate all access to the event log through its plugin, for security purposes, the timeframe for getting that in is ... not near. In lieu of a new event log API to fulfill this requirement, accessing the event log indices directly is the next best option. The event log indices are managed by ILM and available under an Elasticsearch alias, which we currently don't publish, so we'll need to add an API to the event log plugin to return it. The event log indices are based on ECS with some extensions for Kibana, so hopefully the shape is somewhat familiar. I mentioned "security purposes" above: the event log APIs currently require you to pass in saved object types/ids when searching, and the implementation ensures you have read access to those saved object types/ids before doing the search. Querying the event log indices directly, as suggested here, would bypass that check. However, since you do first need to do a
@pmuellr Thanks Patrick for picking this up. It's very exciting to see that we might be able to provide this to our users sooner rather than later. I just wanted to respond to the following question:
Indeed, currently alerts are created and tied to one service. I'm not sure about this, but I can certainly imagine a time when we'd want alerts that are not tied to a single service, so that users can simply alert on a threshold violation or error in any service instance instead of having to set individual thresholds per service, or even policies for specific service environments. @nehaduggal perhaps you can clarify what our thinking is around alerts in the near and long term? Not sure if this changes anything for you at this time.
Is this still blocked given capabilities introduced via RAC efforts?
This will be unblocked by RAC. Given we don't have an exact date for the release of RAC, I suggest we leave it blocked for now.
@sqren Given this is being delivered by RAC, can I remove this from the Alerting team's dependencies list? We're trying to groom the backlog of work, and figuring out what is being blocked by us is important :)
@gmmorris sure 👍
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I think this can be unblocked now @sqren 🤔
@sqren I'm going to move this to the design board since it's been 2+ years since we looked at the design for this. Cc @alex-fedotyev
Design issue: elastic/apm#218
Related issue: #63574
Summary
As part of the health indicators in Service maps, the health status of a service should be updated if there are any active threshold alerts for the service in the selected time range. We assume that a threshold violation on a service is automatically a bad scenario for the user, and therefore we'll treat it as a critical health status.
Solution
If there are any active APM-type (duration or error rate) threshold alerts for the service within the selected time range, override the current anomaly score health status indication (#63574) and show the service in a critical state (red outline).
The overall service map will look something like this (screenshot not preserved).
In a later iteration we will most likely have different severity levels in the alerts, which means that some alerts might not be critical but simply warnings, in which case we can choose to show only the warning indication (yellow state).
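The override rule described above could be sketched roughly as follows. This is only an illustration, not the Kibana implementation: every name here (`HealthStatus`, `ServiceAlert`, `TimeRange`, `resolveServiceHealth`) is hypothetical, and the alert shape is an assumption rather than the actual alerting or event log schema. The handling of warning-level alerts is also an assumption, since the issue leaves it open: a warning alert raises the status to warning but does not downgrade a critical anomaly status.

```typescript
// Hypothetical types for illustration; none of these come from the Kibana codebase.
type HealthStatus = "healthy" | "warning" | "critical" | "unknown";

interface ServiceAlert {
  serviceName: string;
  alertType: "duration" | "error_rate"; // the two APM threshold alert types named in the issue
  activeFrom: number; // epoch ms when the alert became active
  activeTo?: number; // epoch ms when it recovered; undefined means still active
  severity?: "warning" | "critical"; // assumed field for the later iteration
}

interface TimeRange {
  start: number; // epoch ms
  end: number; // epoch ms
}

// True if the alert was active at any point inside the selected time range.
function isActiveInRange(alert: ServiceAlert, range: TimeRange): boolean {
  const alertEnd = alert.activeTo ?? Number.POSITIVE_INFINITY;
  return alert.activeFrom <= range.end && alertEnd >= range.start;
}

// Start from the anomaly-score health status and let active alerts override it.
function resolveServiceHealth(
  serviceName: string,
  anomalyHealth: HealthStatus,
  alerts: ServiceAlert[],
  range: TimeRange
): HealthStatus {
  const active = alerts.filter(
    (a) => a.serviceName === serviceName && isActiveInRange(a, range)
  );
  if (active.length === 0) {
    return anomalyHealth; // no active alerts: keep the anomaly-based status
  }
  // Alerts without a severity are treated as critical, matching the current proposal.
  const anyCritical = active.some((a) => (a.severity ?? "critical") === "critical");
  if (anyCritical) {
    return "critical";
  }
  // Assumption: warning-only alerts never downgrade a critical anomaly status.
  return anomalyHealth === "critical" ? "critical" : "warning";
}
```

For example, a service whose anomaly-based status is `"healthy"` would be shown as `"critical"` as soon as one of its alerts overlaps the selected time range, while alerts that recovered before the range starts leave the anomaly-based status untouched.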