Expand the context available within Kibana Alerting in ES 7.7 #69611

Open
sinnotts opened this issue Jun 19, 2020 · 31 comments
Labels
Feature:Alerting, Feature:Logs UI, needs-refinement, Team:obs-ux-logs

Comments

@sinnotts

Describe the feature:
When using Alerting in Kibana with Elasticsearch 7.7, it would be brilliant if it were possible to pull specific field information from an Index when an alert is triggered.

Currently, only the following fields are available:

{{alertName}}
{{alertId}}
{{spaceId}}
{{tags}}
{{alertInstanceId}}
{{context.message}}
{{context.title}}
{{context.group}}
{{context.date}}
{{context.value}}

I believe this is done using mustache (https://mustache.github.io/mustache.5.html) but I can't seem to find out what context/template Kibana has available to it to populate the above.
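
For reference, this is roughly how I understand the substitution to work (a minimal mustache.js sketch; the shape of the view object is my guess at what Kibana builds, which is exactly what I'm trying to find out):

import Mustache from "mustache";

// My guess at the view object Kibana passes to the template engine --
// the variable names are the documented ones, the structure is assumed.
const view = {
  alertName: "disk-usage-alert",
  spaceId: "default",
  context: {
    value: 0.97,
    group: "host-1",
  },
};

// Renders: "Alert disk-usage-alert! value is 0.97"
const message = Mustache.render(
  "Alert {{alertName}}! value is {{context.value}}",
  view
);
console.log(message);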

Describe a specific use case for the feature:
I would like to set up an alert based on metricbeat data:

IF
INDEX metricbeat*
WHEN average()
OF system.filesystem.used.pct
OVER all documents
IS ABOVE 0.95
FOR THE LAST 60 seconds

Send an email of:

Alert {{alertName}}!

Server {{agent.name}} has used {{context.value}} percent of storage!

Summary:
{{system.filesystem.device_name}},
{{system.filesystem.total}}
{{system.filesystem.used.bytes}}
{{system.filesystem.available}}

{{system.filesystem.used.pct}}

Kind Regards,
Kibana

Thanks for the hard work!
S

@timroes added the Team:ResponseOps label on Jun 22, 2020
@elasticmachine
Contributor

Pinging @elastic/kibana-alerting-services (Team:Alerting Services)

@chemalopezp

chemalopezp commented Jun 22, 2020

I have a very similar use case for this, only with logs. The alert condition is:

WHEN more than or equals
1 log entry
WITH json.msg
IS SYMBOL_NOT_FOUND
FOR THE LAST 1 minute

I'd like to use other fields from the logged json object in the alert message.

Thank you!

@mikecote added the Team:Infra Monitoring UI label and removed the Team:ResponseOps label on Jun 23, 2020
@elasticmachine
Contributor

Pinging @elastic/logs-metrics-ui (Team:logs-metrics-ui)

@pmuellr
Member

pmuellr commented Jul 3, 2020

From issue #70174, a request to add process name to metric threshold.

@chemalopezp

It is great to see some activity on this issue! :D
Just to give the team a bit more background: we are using functionbeat to stream our pino JSON logs into Elasticsearch, and Kibana to report alerts. With the recently released Kibana features we noticed we could actually do much better tuning of those and apply different actions: for example, report to different Slack channels depending on the actual error code, and add much more context to the error message (which function failed and other relevant data that is streamed with the log in different json fields, including the error message).

We would like to include information from any of the log fields in the error message we send to Slack.

Thank you very much!

@jasonrhodes
Member

@Zacqary can you take a look at this and write up what might be possible re: dynamic values per alert instance?

@Zacqary
Contributor

Zacqary commented Aug 3, 2020

I'm not sure if there's a good way to do this without making changes to the Alerting plugin, but here's what I think is possible now.

The available variables for action messages, like {{alertName}} and {{context.stuff}}, are all pre-defined, generic values that always get generated and sent whenever an alert fires. The function in charge of executing the alert doesn't have any knowledge of which variables are actually being requested.

So following @sinnotts's example, let's say the alert parses the metric system.filesystem.used.pct and recognizes that it needs to also fetch the entire hierarchy of data from system. So instead of just fetching the system.filesystem.used.pct value, it'll query Elasticsearch for system, and evaluate the alert by looking up filesystem.used.pct within the hit for system that it receives. Assuming it fires, it can make the entire system object available, like {{context.system.filesystem.available}} (we would definitely have to modify the Alerting plugin to be able to remove the context prefix).

Keep in mind we'd have to fetch the whole system object for all alerts now, because we'd have no way of knowing whether any of that extra data is actually going to be requested in the action message. But I don't think that's all that inefficient.
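
A rough sketch of what I mean (purely illustrative; none of these names are the real Alerting plugin or executor APIs):

import type { Client } from "@elastic/elasticsearch";

// Hypothetical executor change: fetch the whole root object of the
// selected metric and expose it on context.
async function executeMetricAlert(metric: string, esClient: Client) {
  // e.g. "system.filesystem.used.pct" -> root field "system"
  const rootField = metric.split(".")[0];

  // Query for the entire root object instead of just the metric value.
  const result = await esClient.search({
    index: "metricbeat-*",
    size: 1,
    sort: [{ "@timestamp": "desc" }],
    _source: [rootField],
  });

  const source = (result.hits.hits[0]?._source ?? {}) as Record<string, unknown>;

  // Evaluate the alert by looking the metric up inside the fetched object,
  // then (if it fires) make the whole object available to templates,
  // e.g. {{context.system.filesystem.available}}.
  return { context: { [rootField]: source[rootField] } };
}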

My concern here is the UX. We wouldn't be able to make {{context.system}} available in the dropdown since it'd be dynamically determined by the metric you select, and that dropdown is populated by the alerting plugin's registerType function which only runs at Kibana startup. Maybe {{context.data}} would work, but that would extend the prefix to {{context.data.system.filesystem.available}} and I kinda don't like that either.

Also, if the behavior we want is

pull specific field information from an Index when an alert is triggered

then we're not really achieving that, since the user only has access to whatever's in system. That would work fine for this example, but what if another user thinks some other kind of data in an entirely different field is relevant?

Maybe just system (or in the Logs case, json) is good enough, I just want to figure out a way to communicate to the user exactly which fields are available to them. Like if you're querying a metric a.b.c.d, and that means a.e.f.g is available to you — not really sure how to make that intuitive and discoverable.

@chemalopezp

Thank you for looking into it @Zacqary

If it is of any help, it seems fine to me if not all fields are discoverable in the UI. For the logs scenario, we might want to expose different fields in the error message depending on the actual alert that has been triggered (e.g. for an error log with a userId we might want to use json.userId), so it is totally understandable that these customizable fields are not really discoverable, given the large number of them on each index. Thanks again!

@Zacqary
Contributor

Zacqary commented Aug 5, 2020

@mikecote I'm realizing the original post is actually using an Index Threshold alert type as an example, so I think Alerting Services might still want to be tagged on this? We can investigate getting this to work on Logs and Metrics alerts for sure, but whatever we settle on should probably propagate back to the core plugin.

@mikecote
Contributor

mikecote commented Aug 5, 2020

@Zacqary seems like the request for index data can apply to both index threshold and metric alerts. Since context variables are set per alert type, each would have to expose this manually. There isn't UX support yet for dynamic template variables which makes it hard to support this with the index threshold alert. From my understanding, a metric alert has similar issues because the variables change based on the metric?

@chemalopezp

Thank you for looking into this @Zacqary @mikecote! If I'm getting this correctly, it seems we have maybe two different parts? On one side there's the ability to use any field when an alert is triggered. On the other, there's a limitation in the alerts UI that makes it impossible to show the user all the possible fields that are available to use.

Still, we can look in other parts of Kibana to see which fields are available and use them, so it seems to me the first part could already be completed on its own (that would already be an amazing feature to enhance our alerts!). Thanks!

@pmuellr
Member

pmuellr commented Aug 21, 2020

It's going to be difficult to extract more document info from the index threshold alert given the way it does aggregations; I think something like an essql-based alert would be great for this. The idea is that the customer provides an essql query (including an optional query DSL filter, which the essql API supports), the results of which would be used to determine instances to schedule actions for. The instanceId would be one column; the remaining columns could then be passed as context variables, which would mean the user could set those to whatever they can get returned by SQL.
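
For illustration, something along these lines (the query, index pattern, and column names are all made up):

import { Client } from "@elastic/elasticsearch";

const client = new Client({ node: "http://localhost:9200" });

// Sketch: the first column of the result becomes the instanceId, and the
// remaining columns become context variables for actions.
async function runAlertQuery() {
  const response = await client.sql.query({
    query: `
      SELECT host.name,
             AVG(system.filesystem.used.pct) AS used_pct
      FROM "metricbeat-*"
      GROUP BY host.name
      HAVING AVG(system.filesystem.used.pct) > 0.95
    `,
    // The optional query DSL filter that the essql API supports:
    filter: { range: { "@timestamp": { gte: "now-60s" } } },
  });

  // Each row would schedule actions for one instance: row[0] is the
  // instanceId, and row[1] would surface as {{context.used_pct}}.
  return response.rows;
}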

@pmuellr
Member

pmuellr commented Aug 21, 2020

I spent a few minutes yesterday playing with the mustache templating we use, to see if we could somehow get it to show all the "context variables" that we make available. I think we can make a meta-variable that would list all the context variables and their values, which customers could use when developing an alert to see what data is actually available as variables and what their values are. See issue #75601

That little exercise also made me realize that we can put functions in the context, which will end up being invoked when accessed from the template. Seems like a bit of a "foot gun" to me, but obviously fairly powerful. Note that these functions are invoked with no arguments, AFAIK. I'm not sure how far we could take this to make it easier for alertTypes to make context variables available without having to have all the data available up front.
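
A minimal mustache.js sketch of the function behavior (not Kibana code):

import Mustache from "mustache";

const context = {
  value: 0.97,
  // Function values are invoked (with no arguments, AFAIK) when the
  // template references them -- powerful, but a bit of a foot gun.
  timestamp: () => new Date().toISOString(),
};

// Renders something like: "value 0.97 at 2020-08-21T17:30:00.000Z"
console.log(Mustache.render("value {{value}} at {{timestamp}}", context));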

@jasonrhodes
Member

@mikecote @pmuellr @Zacqary @chemalopezp as far as a way forward, I'm not sure I have a clear understanding of what we are trying to do here and what our next steps are, so I'll try to summarize my understanding so far:

Problem 1: A user is asking to be able to reference system.* values inside their alert message, and we currently don't provide that as an available context variable. We could potentially make system.* available on all alerts, as context.system.*, but what happens if someone wants host.hostname or something else outside of the system scope? Where is the line for which values we make available on context and which we don't, assuming we can't just allow a user to access anything they want arbitrarily?

Problem 2: If we do allow for some kind of dynamic loading of values in alert messages, how will a user know those values will be available to them when creating their alert message template? We won't be able to provide them as autocomplete if they're dynamic, and we also won't necessarily be able to validate that what they've typed will resolve to a real value. Do they just type what they think is a value and hope it works/test it out?

Am I understanding the state of things right now correctly? Am I missing parts? Thanks!

@pmuellr
Member

pmuellr commented Sep 21, 2020

That seems like a good summary of the problem area today:

  • It's not clear how to expose ES data that isn't necessarily involved in the alerting calculation back to the actions invoked when the alert triggers. This seems like it may be especially problematic/complex for alerts that do aggregations: figuring out how to get that data back after the aggregation.

  • It's not clear how we would expose that to users, given our current "static" story of providing the mustache template variable values in a pop-up list, which is populated from static data provided by the alert type. I wonder if the eventual story, once we solve the first issue above, would be to "test" the alert by running it and seeing what values it made available. Presumably we'd have to re-run this as the alert fields are edited, since the data could change along the way.

@jasonrhodes
Member

@pmuellr Thanks. Do you know if there's an Alerting team ticket for this kind of feature, that we could link for tracking purposes, or is this more likely to be a "wontfix" because of its complexity?

@Zacqary if I understand correctly, in a very limited way we could query for all of the system.* values and provide them to the alert message interface, discoverable at context.system.*, right? Based on what you said above:

Assuming it fires, it can make the entire system object available like {{context.system.filesystem.available}}

Keep in mind we'd have to fetch the whole system object for all alerts now, because we'd have no way of knowing whether any of that extra data is actually going to be requested in the action message. But I don't think that's all that inefficient.

If we just incorporate this into every Metric alert, it would stop being dynamic, right? Would this still be an issue?

We wouldn't be able to make {{context.system}} available in the dropdown since it'd be dynamically determined by the metric you select, and that dropdown is populated by the alerting plugin's registerType function which only runs at Kibana startup.

@Zacqary
Contributor

Zacqary commented Sep 21, 2020

Sorry, when I said we're fetching the whole system object for all alerts, I meant all alerts that used a system.something metric.

I do think it'd be a needless bottleneck if we fetched the root document of every possible metric, regardless of which one the alert actually selects.

@jasonrhodes
Member

@Zacqary yeah that's what I meant by this:

in a very limited way we could query for all of the system.* values and provide them to the alert message interface

it would be extremely limited to only providing the system.* values. I'm just verifying we could do that if we chose to, right?

@Zacqary
Contributor

Zacqary commented Sep 22, 2020

Absolutely, yeah, we can query for system.* regardless of whether the alert in question is alerting on a system.* metric.

@jasonrhodes
Member

@sorantis @mukeshelastic can you weigh in on whether you think providing something static, such as "system.*" or otherwise, to ALL alert instances of our Metrics and Logs alert types would be of value to enough of our users to move forward on that?

If so, let's spin off a new ticket that outlines exactly which static sets of values we want to provide for the Metrics and Logs alert types (the values can be different for each alert type but would be static and consistent for all instances of each type).

If not, I don't think there is an action item for our group on this ticket.

Thanks!

@chemalopezp

Not sure I'm following the discussion. As users, we add different fields to our logs that might be used in an alert (both in the criteria and the message). Since we manually logged those fields, we don't really need a change in the "exposure" of those fields; we just need them available (i.e. added to the context of the alert).

In other words, there's some benefit to adding a few more static fields (e.g. system.*), but at least for my use case most of the alerts are triggered by customizable events (i.e. we need access to fields of the log statement that triggered the alert).

If it is easier to manage that way, I can split this into a different ticket. Thank you for your help! :)

@jasonrhodes
Member

Since we manually logged those fields, we don't really need a change in the "exposure" of those fields; we just need them available

@chemalopezp can you explain the difference between these two things? I've been using those two concepts interchangeably, I think. "Exposing" fields means "making them available on the alert's context", which requires querying them at some point as part of every alert check/run.

@sorantis

sorantis commented Sep 24, 2020

@sorantis @mukeshelastic can you weigh in on whether you think providing something static, such as "system.*" or otherwise, to ALL alert instances of our Metrics and Logs alert types would be of value to enough of our users to move forward on that?

@jasonrhodes The fact that these fields have to be static limits the scope of an otherwise valid use case. We've been getting feedback on how important it is to include fields in notifications when firing alerts, because it can ultimately save a lot of back-and-forth calls between departments trying to figure out which host/service/system is causing trouble. Some of the fields that have been mentioned as relevant are hostname, IP address, and anything org-specific like labels or tags (not the alert tags).

Perhaps a starting point for adding static fields could be ECS fields? Specifically for metric alerts I can see value in adding Host, Container and Cloud fields to context. For log alerts, Event and Log fields could be relevant.
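
For illustration, a notification template could then reference things like this (the variable names are hypothetical, assuming those ECS objects get exposed under context):

Server {{context.host.name}} ({{context.host.ip}}) in {{context.cloud.provider}}/{{context.cloud.region}} triggered this alert.
Container: {{context.container.name}}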

Thoughts?

@ManuelKugelmann

@sorantis I concur, access to the reason for the alert is crucial. If I have an alert triggered by some log messages, for example, I would want at least an excerpt of these in the alert message, and maybe even a link to a search that will return the full set of messages that were the reason for the alert.
Right now I have to fall back to the clunky Watchers without nice UI support...

@chemalopezp

@chemalopezp can you explain the difference between these two things? I've been using those two concepts interchangeably, I think. "Exposing" fields means "making them available on the alert's context", which requires querying them at some point as part of every alert check/run.

@jasonrhodes Then we are on the same page. I was under the impression that there was some concern at the UI level about how a user would be able to select these fields, which seemed like a different issue.

@praveenmak

I have the same issue as described in this ticket.
Another thing we need is the timestamp: how can I send the timestamp in the email alert?
"{{context.timestamp}}" does not work.

@pmuellr
Member

pmuellr commented Nov 10, 2020

We have an open issue to add a timestamp as a "global" mustache variable for templates - #67389

We also have an open issue to add a "helper" for mustache variables that are "objects", to make it easier for customers to see what variables are available in an alert message - #82044. This should help for complex/rich objects added to the context variables, at least while a customer is developing the mustache templates used in their actions.

Beyond that, I think the notion of providing variables for "related" fields in documents an alert is processing is going to have to be alert-specific. E.g., it seems easier to me to add this kind of capability to log-based alerts, but the current shape of the index threshold alerts doesn't lend itself to doing this, because they just run aggs over the indices, so it's not clear how we'd also collect other fields. As mentioned in #69611 (comment), it feels like an essql-based alert might be a good fit for this, as it would presumably allow customers to select fields to be returned to the alert.

@simianhacker added the needs-refinement label and removed the triage_needed label on May 27, 2021
@elastic deleted a comment from eric-olaya on Sep 28, 2021
@SonalJain1707

I also have a case.

I need to pull the hostname below and put it in a connector. When trying to use {{#context.hits}}{{_source}}{{/context.hits}} it returns empty. Can you please let me know how to traverse to agent and hostname, and whether it's possible to do so.

hits" : [
{
"_index" : "metricbeat-7.12.1-000594",
"_type" : "_doc",
"_id" : "YUSMboEBINxiR4tEXwHq",
"_score" : 0.0,
"_source" : {
"@timestamp" : "2022-06-16T22:05:01.392Z",
"ecs" : {
"version" : "1.8.0"
},
"agent" : {
"version" : "7.12.1",
"hostname" : "kks-***********",

@SonalJain1707

Hi,

Also, if the list returns a null value the alert goes into a recovered state. Please let us know how to solve this.

@gbamparop added the Team:obs-ux-logs label and removed the Team:Infra Monitoring UI label on Nov 9, 2023
@elasticmachine
Contributor

Pinging @elastic/obs-ux-logs-team (Team:obs-ux-logs)

@botelastic added and removed the needs-team label on Nov 9, 2023
@phirestalker

To me, it seems that the alert config itself could determine the "extra" information to be made available. For instance, say I set an alert on system.filesystem.used.pct like the OP. Inside this alert I have set a grouping on the device name, so the device name should be made available somehow. Would it be easier to only make available the things that were used in the query, filter, and grouping options?
