API to get all active instances from Observability consumers #70169
Comments
Pinging @elastic/kibana-alerting-services (Team:Alerting Services) |
Hello, APM service maps also need this capability for 7.9. We need to be able to display health indicators for all services in the service map that have active alert violations. Right now, we can only get it to work by calling |
@mikecote I see you've added this to "Long Term". This is something we hope to be able to have available in 7.10. Is that possible? |
@sqren I went over the recording of the triage session we had for this issue. I think we needed more clarification on whether this issue was still needed or whether your requirements have changed based on the scope adjustment the homepage team made for 7.9 / 7.10. We placed it with the bulk APIs story (long term) and had an approach we believe could work for you without waiting on this API (some email thread from a few weeks ago). @pmuellr can help on this. The approach that could work for now is to use the alert find API to get all observability related alerts (filter by alert type and/or consumer) and then use the task manager's fetch API for the alert's We can always revisit and prioritize this issue no problem, probably in the scope of 7.11 once our work for GA is complete. |
From another thread: I believe the suggested workaround is similar to the approach mentioned earlier. It works by fetching the task state for each matching alert returned in the From the alerting plugin context, it might be possible to obtain multiple alert states in a single request, but it would require querying the task manager index filtered by job ids obtained in the initial |
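For reference, the "1 + N calls" workaround discussed above can be sketched roughly as follows. This is only an illustration under stated assumptions: the interfaces and the `findAlerts` / `getAlertState` functions are hypothetical stand-ins that the caller injects, not the actual Kibana alerting or task manager APIs.

```typescript
// Hypothetical shapes for illustration; these are not Kibana types.
interface AlertSummary {
  id: string;
  alertTypeId: string;
  consumer: string;
}

interface AlertState {
  // Map of instance id to opaque per-instance state.
  alertInstances: { [instanceId: string]: unknown };
}

// Both calls are injected so the sketch does not depend on a real client.
interface AlertingCalls {
  findAlerts(): Promise<AlertSummary[]>;
  getAlertState(alertId: string): Promise<AlertState>;
}

interface ActiveInstance {
  alertId: string;
  alertTypeId: string;
  instanceId: string;
}

async function getActiveInstances(
  calls: AlertingCalls,
  consumers: string[]
): Promise<ActiveInstance[]> {
  // One call: list every alert, then keep only the requested consumers.
  const all = await calls.findAlerts();
  const alerts = all.filter((a) => consumers.indexOf(a.consumer) !== -1);

  // N calls: one state lookup per matching alert, issued in parallel.
  const states = await Promise.all(alerts.map((a) => calls.getAlertState(a.id)));

  // Flatten into one row per (alert, instance) pair.
  const result: ActiveInstance[] = [];
  alerts.forEach((alert, i) => {
    for (const instanceId of Object.keys(states[i].alertInstances)) {
      result.push({ alertId: alert.id, alertTypeId: alert.alertTypeId, instanceId });
    }
  });
  return result;
}
```

The cost grows with the number of matching alerts, which is the part a single bulk API would remove.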
We've re-prioritized some work that - I think - will happen to work out very well for this requirement. We will be formalizing the notion of an alert "status" per issue #51099 . We'll add a new status object to the alert saved object, which means you should be able to get the status from the alertClient That gets us back down from 1+x or even 2 api calls, down to 1! (but with more data than actually required, I think) I'm going to start working on this shortly, will note the PR here once it's under way. |
@pmuellr Thanks for the heads up - it sounds very exciting! |
Doing some bookkeeping, realized I didn't post the PR with the new 'alert status' field - it's here: #75553 But also, re-reading this, and realizing the original request, that still doesn't give us instance data, just the alert data. So, that still leaves us in a It feels to me like we'll end up needing some new APIs, and I don't think we've talked about what those might look like, so here's a rough sketch:
I should note this would be to get instance data beyond just the current state of known instances (eg, it could return data about recent instances which are no longer active, like the current "get instance status" API). If we only need the current list of instances, or count of instances, it's possible we could do a query over task manager to get the current alert instance data. This also wouldn't contain any instance status data like errors. Here's what that task manager data looks like (note, it's stored as a JSON string today, so we'd need to parse it after fetching and can't search over these "fields"); this shows an alert with one active instance,
|
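A minimal sketch of the "parse it after fetching" step mentioned above, assuming a task document whose `state` field is a JSON string containing an `alertInstances` map. The `TaskDoc` shape, the field names, and the sample document are illustrative assumptions, not a copy of the real task manager data.

```typescript
// Hypothetical task document shape; field names are assumptions for illustration.
interface TaskDoc {
  id: string;
  state: string; // JSON-encoded, so it cannot be searched field by field
}

function activeInstanceIds(task: TaskDoc): string[] {
  let parsed: any;
  try {
    parsed = JSON.parse(task.state);
  } catch (e) {
    return []; // Unparseable state: treat as no known instances.
  }
  const instances = parsed && parsed.alertInstances;
  if (instances === null || typeof instances !== 'object' || Array.isArray(instances)) {
    return []; // Missing or malformed instance map.
  }
  return Object.keys(instances);
}

// Made-up sample: one alert task with a single active instance.
const sampleTask: TaskDoc = {
  id: 'task-1',
  state: JSON.stringify({ alertInstances: { 'host-1': { state: {}, meta: {} } } }),
};
const sampleIds = activeInstanceIds(sampleTask);
```

Because the parse happens client side, any filtering on instance data has to be done after the documents are fetched.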
Another issue which depends on being able to retrieve alert instances: #85479 Let me know if everything is clear or I should add more details. |
Thx @sqren ! From #85479:
So you'll need the time, threshold, and actual value. Today, you can get the Presumably the application knows the "value that exceeded the threshold", unless it's no longer available (eg, ILM). But then the app wouldn't be able to show a pretty graph to annotate in the first place. But if we're storing the threshold value (where else would an "older" version of a threshold value, if changed over time, be available?), it makes sense to store the metric value as well, so we should add those both at the same time. In terms of "progressive enhancement" then, I'd hope we'll make those values available at some point in the future in the event log, but for today, all you'll have is the timestamp of when the alert/instance was "active". |
Sounds great! Having the timestamp will still allow us to add alert annotations to charts which is a great start. Then we can enhance this down the road with the actual values. |
Some notes from the 7.12 planning session
|
Moved from |
note: I originally opened this as issue #88908, but I'm moving it here since it's really just relevant to this overall issue. It's not clear that this will be needed, but I thought I'd outline how generating instance data might work when searching through multiple alerts. The thought here is that if the best we can do for now is to generate a list of all the events for all the alerts, we'll need a standard way of processing the events. For the "Alert Details" page, we generate the list of instances and data from them via this function, which is not currently exposed as an API: kibana/x-pack/plugins/alerts/server/lib/alert_instance_summary_from_event_log.ts Lines 11 to 20 in da8abda
As we get more consumers of the event log coming online, this function (or similar ones, or perhaps this one with more parameters/capabilities) could be useful if we only end up providing a way to get ALL the event log docs (eg, if we don't support a richer search mechanism). Otherwise, those consumers will be forced to implement similar logic in their own plugins. We'd need to clean this up a bit to turn it into an API, and presumably if we did this, we'd also change it to support events from multiple alerts, and not just a single alert. And presumably, it would be a function on the alertsClient. |
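To make the multi-alert idea concrete, here is a rough sketch of replaying a flat list of events into a per-alert summary of active instances. It is not the existing summary function: the `InstanceEvent` shape, the two action names, and `summarizeActiveInstances` are simplified, hypothetical stand-ins for whatever the event log actually stores.

```typescript
// Hypothetical, simplified event shape; the real event log documents differ.
interface InstanceEvent {
  alertId: string;
  instanceId: string;
  action: 'active' | 'resolved';
  timestamp: string; // ISO 8601 in UTC, so string order matches time order
}

// alertId -> instanceId -> timestamp at which the instance became active
type ActiveByAlert = { [alertId: string]: { [instanceId: string]: string } };

function summarizeActiveInstances(events: InstanceEvent[]): ActiveByAlert {
  // Replay events oldest first so later events override earlier ones.
  const ordered = events.slice().sort((a, b) =>
    a.timestamp < b.timestamp ? -1 : a.timestamp > b.timestamp ? 1 : 0
  );

  const summary: ActiveByAlert = {};
  for (const event of ordered) {
    // An alert seen in any event keeps an entry, even if nothing is active.
    const instances = summary[event.alertId] || (summary[event.alertId] = {});
    const known = Object.prototype.hasOwnProperty.call(instances, event.instanceId);
    if (event.action === 'resolved') {
      delete instances[event.instanceId];
    } else if (!known) {
      // Keep the first "active" timestamp of the current active streak.
      instances[event.instanceId] = event.timestamp;
    }
  }
  return summary;
}
```

Keeping the start timestamp per instance is enough for the chart annotation use case mentioned earlier; threshold and metric values would need to be present in the events themselves.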
After discussing with @sqren yesterday, "alerts as data" is necessary for the 7.x Observability workflows, which remove the need for these APIs as a half-way measure. |
In the new Observability Overview page, we're planning to show two charts to give the user a clear picture of which alerts are active at the moment.
In this chart, we want to show all active instances for all observability plugins (APM/Logs/Uptime/Metrics) grouped by type.
And in this one, we want to show some alert detail and the number of active instances next to it.
Current situation:
To get this information with the current API, I first have to call _find to get all created alerts, then filter by Observability plugin (APM/Logs/Uptime/Metrics), and make an HTTP call for each alert to get the active instances.
What I need:
An API that returns all active instances and the alert details, with the possibility to filter by consumer and alert type.
Example API:
Example response: