Cloud context in kibana alerting #101018

Closed
Kushmaro opened this issue May 31, 2021 · 11 comments
Labels
enhancement New value added to drive a business result estimate:needs-research Estimated as too large and requires research to break down into workable issues Feature:Alerting/RuleActions Issues related to the Actions attached to Rules on the Alerting Framework Feature:Alerting NeededFor:Cloud Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams)

Comments

@Kushmaro

Kushmaro commented May 31, 2021

When Kibana runs in Cloud, it would be highly beneficial to have some cloud context available to the alert actions (like the context already available for the alert itself).

Terminology used below:
"production deployment" is the deployment being monitored.
"monitoring deployment" is the deployment doing the monitoring.

Most notably:

  • Production Cloud Deployment ID
  • Production Cloud Deployment name (already available as "context.clusterName")
  • Production Cloud Region + Provider
  • Production Organization/Account Id
  • Cloud UI URL (can vary in ESS/ESSP/ECE)

Additional nice-to-haves:

  • Full cloud deployment aliased URL
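To make the request concrete, the fields listed above could be sketched as a single context type. This is only an illustration: the interface name, the field names, and the `missingFields` helper are assumptions made for this sketch and are not part of any existing Kibana API.

```typescript
// Hypothetical shape of the cloud context requested in this issue.
// All names here are illustrative assumptions, not an existing Kibana type.
interface ProductionCloudContext {
  deploymentId: string;
  deploymentName: string; // the issue notes this is already exposed as context.clusterName
  region: string;
  provider: string;
  organizationId: string;
  cloudUiUrl: string; // may differ between ESS, ESSP, and ECE
  aliasedUrl?: string; // nice-to-have, so optional
}

// The fields the issue lists under "Most notably".
const REQUIRED_FIELDS = [
  'deploymentId',
  'deploymentName',
  'region',
  'provider',
  'organizationId',
  'cloudUiUrl',
] as const;

// Returns the required fields that a partially populated context still lacks.
function missingFields(context: Partial<ProductionCloudContext>): string[] {
  return REQUIRED_FIELDS.filter((field) => context[field] === undefined);
}

const stillMissing = missingFields({ deploymentId: 'abc123', deploymentName: 'prod-eu' });
```

A helper like this would let a rule type report which parts of the context it cannot supply, rather than silently rendering empty values in an action message.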

image

@Kushmaro Kushmaro added the Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) label May 31, 2021
@elasticmachine
Contributor

Pinging @elastic/kibana-alerting-services (Team:Alerting Services)

@mikecote
Contributor

mikecote commented May 31, 2021

Relates to #67660 for providing the cloud plugin such capability.

@gmmorris
Contributor

gmmorris commented Jun 1, 2021

This should be available in any cloud Kibana, as it's in the kibana.yml, but we don't have any "cloud specific" variables at the moment, so we need to verify this works correctly.

Also worth noting: this will not work if a customer has a mixed environment (cloud + on-prem), which, even though we discourage it, isn't officially prohibited. In such a situation, actions would produce different output depending on which Kibana instance runs them.
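The mixed-environment concern can be illustrated with a small sketch. It assumes a hypothetical `CloudSettings` object standing in for whatever cloud settings a Kibana instance reads from kibana.yml; the setting name and the `cloud.deploymentId` variable name are assumptions for illustration only.

```typescript
// Hypothetical subset of the cloud settings one Kibana instance might read
// from its kibana.yml. The field name is an assumption for this sketch.
interface CloudSettings {
  deploymentId?: string;
}

// Builds the action variables that this particular Kibana instance could offer.
function cloudActionVariables(settings: CloudSettings): Record<string, string> {
  const variables: Record<string, string> = {};
  if (settings.deploymentId !== undefined) {
    variables['cloud.deploymentId'] = settings.deploymentId;
  }
  return variables;
}

// The same rule, executed by two Kibana instances in a mixed environment:
const fromCloudKibana = cloudActionVariables({ deploymentId: 'abc123' });
const fromOnPremKibana = cloudActionVariables({});
```

Here `fromCloudKibana` carries a deployment ID while `fromOnPremKibana` is empty, so the rendered action message would depend on which instance happened to pick up the task.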

image (4)

@gmmorris
Contributor

gmmorris commented Jun 1, 2021

> Relates to #67660 for providing the cloud plugin such capability.

Are these related or actually identical? 🤔

@mikecote
Contributor

mikecote commented Jun 1, 2021

In theory, #67660 would be for the alerting team, and this issue would be for the cloud team if we decide to go with such an approach.

@gmmorris gmmorris added Feature:Alerting Feature:Alerting/RuleActions Issues related to the Actions attached to Rules on the Alerting Framework labels Jul 2, 2021
@gmmorris gmmorris added the loe:needs-research This issue requires some research before it can be worked on or estimated label Jul 15, 2021
@gmmorris gmmorris added enhancement New value added to drive a business result estimate:needs-research Estimated as too large and requires research to break down into workable issues labels Aug 13, 2021
@Kushmaro
Author

Kushmaro commented Aug 30, 2021

There's an important distinction to be made regarding the "deployment ID" and "region" fields:
If a customer is using a central monitoring deployment, the context for the alert should be that of the deployment firing the alert (and not that of the monitoring deployment doing the monitoring itself). That's why I'm not sure the "config_settings" hold here, as they describe the "monitoring deployment" but not the "production deployment" being monitored.

Just wanted to explicitly call that out, @gmmorris.
I've edited the issue to better reflect this.

@gmmorris
Contributor

gmmorris commented Sep 1, 2021

Thanks @Kushmaro, I don't know much about how these things are configured... Would this be specific to the monitoring deployment or the rule?
Could a monitoring deployment monitor multiple other deployments? Or would all the rules in a single monitoring deployment all monitor the same deployment?
If it's the latter, I assume we're still talking about something that's passed in via kibana.yml?

@Kushmaro
Author

Kushmaro commented Sep 1, 2021

> Could a monitoring deployment monitor multiple other deployments?

ding ding ding :) yep, that's exactly how it works.
So just like context.clusterName passes the name of the cluster being monitored (called "Production cluster" in Stack Monitoring terms), the deployment IDs shown in the alerts should be those of the deployment being monitored (sent over somehow to the monitoring cluster).

Hope this makes sense, if not, I'd be happy to jump on a zoom.
/cc @ravikesarwani who can also help drive this forward.
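As a rough illustration of how such a variable would be consumed, the sketch below renders an action message from a template using `{{path}}` placeholders. `renderTemplate` is a minimal stand-in written for this example and is not Kibana's actual template renderer; `context.clusterName` comes from this thread, while `context.cloud.deploymentId` is a hypothetical name for the requested variable.

```typescript
type TemplateVars = { [key: string]: unknown };

// Minimal stand-in for a Mustache-style renderer: replaces {{a.b.c}} with the
// value found at that path, or with an empty string if the path is missing.
function renderTemplate(template: string, vars: TemplateVars): string {
  return template.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (_match: string, path: string) => {
    let current: unknown = vars;
    for (const key of path.split('.')) {
      if (current === null || typeof current !== 'object') {
        return '';
      }
      current = (current as TemplateVars)[key];
    }
    return current === undefined || current === null ? '' : String(current);
  });
}

// context.cloud.deploymentId is a hypothetical variable name; the values
// describe the production deployment, not the monitoring deployment.
const message = renderTemplate(
  'Alert for {{context.clusterName}} (deployment {{context.cloud.deploymentId}})',
  { context: { clusterName: 'prod-eu', cloud: { deploymentId: 'abc123' } } }
);
```

With the values above, `message` mentions the monitored cluster and its deployment ID. If the variable were absent, the placeholder would render as an empty string in this sketch.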

@ravikesarwani
Contributor

We recommend that users send their observability data (logs and metrics) to a separate monitoring cluster. In “production environments”, most users will consolidate the observability data of many production clusters into a single monitoring cluster. Stack Monitoring recommends and supports this scenario.

Having said that, there are times when users still enable self-monitoring (because of cost, simplicity, or other reasons), in which case the production and monitoring clusters are the same.

I think it would be ideal to have the cloud context available for both deployments (production and monitoring), so users can add it to the Action messages or use it in other ways (reporting, etc.). The reason I say we need both is that, depending on the type of alert, the first inclination of users (or internal teams) may be to verify the monitoring data of the related entities before jumping onto the actual production cluster or cloud console to start making changes.
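One way to picture "context for both deployments" is a pair of references carried on each alert, with self-monitoring as the case where the two coincide. The type and function names below are assumptions made for this sketch only.

```typescript
// Hypothetical reference to a single cloud deployment.
interface DeploymentRef {
  deploymentId: string;
  name: string;
}

// Hypothetical pair of deployments attached to an alert.
interface AlertDeployments {
  production: DeploymentRef; // the deployment being monitored
  monitoring: DeploymentRef; // the deployment doing the monitoring
}

// Self-monitoring: the production and monitoring deployment are the same.
function isSelfMonitoring(deployments: AlertDeployments): boolean {
  return deployments.production.deploymentId === deployments.monitoring.deploymentId;
}

const centralized: AlertDeployments = {
  production: { deploymentId: 'prod-1', name: 'prod-eu' },
  monitoring: { deploymentId: 'mon-1', name: 'central-monitoring' },
};
const selfMonitored: AlertDeployments = {
  production: { deploymentId: 'prod-2', name: 'small-cluster' },
  monitoring: { deploymentId: 'prod-2', name: 'small-cluster' },
};
```

Keeping both references lets an action message link to the monitoring data first and to the production deployment second, which matches the workflow described above.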

@gmmorris
Contributor

gmmorris commented Sep 2, 2021

> Could a monitoring deployment monitor multiple other deployments?
>
> ding ding ding :) yep, that's exactly how it works.
> So just like context.clusterName passes the name of the cluster being monitored (called "Production cluster" in Stack Monitoring terms), the deployment IDs shown in the alerts should be those of the deployment being monitored (sent over somehow to the monitoring cluster).

I'm sure this is something missing in my knowledge, but I don't understand how the Alerting framework could possibly do this automatically, given the framework isn't querying anything.

I suggest @arisonl, @ravikesarwani and @Kushmaro have a chat and figure out how these pieces are meant to fit together from a product perspective, because I get the feeling there's a piece missing in the middle between the framework part and the Cloud part of this ER. 🤔

@gmmorris gmmorris removed the loe:needs-research This issue requires some research before it can be worked on or estimated label Sep 2, 2021
@kobelb kobelb added the needs-team Issues missing a team label label Jan 31, 2022
@botelastic botelastic bot removed the needs-team Issues missing a team label label Jan 31, 2022
@mikecote
Contributor

mikecote commented Sep 1, 2022

I'm going to close this issue. We think it will be best to start by adding these variables in the rule type (as context variables). Rule types are aware of the "production deployment", whereas the framework isn't.
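A minimal sketch of that direction, assuming a rule type that already reads monitoring documents: because the rule type sees the data, it can attach the production deployment's details as context variables for each alert. The document shape, the field names, and `evaluateCpuRule` are hypothetical and do not reflect the actual Alerting framework API.

```typescript
// Hypothetical monitoring document as a rule type might read it from the
// monitoring cluster. Field names are assumptions for this sketch.
interface MonitoringHit {
  clusterName: string;
  deploymentId: string;
  region: string;
  cpuPercent: number;
}

// Hypothetical record of one alert the rule type wants to schedule actions for.
interface ScheduledAlert {
  alertId: string;
  context: {
    clusterName: string;
    productionDeploymentId: string;
    productionRegion: string;
  };
}

// The rule type, not the framework, knows which production deployment each
// document came from, so it fills in the context variables itself.
function evaluateCpuRule(hits: MonitoringHit[], threshold: number): ScheduledAlert[] {
  return hits
    .filter((hit) => hit.cpuPercent > threshold)
    .map((hit) => ({
      alertId: hit.deploymentId,
      context: {
        clusterName: hit.clusterName,
        productionDeploymentId: hit.deploymentId,
        productionRegion: hit.region,
      },
    }));
}

const alerts = evaluateCpuRule(
  [
    { clusterName: 'prod-eu', deploymentId: 'abc123', region: 'eu-west-1', cpuPercent: 95 },
    { clusterName: 'prod-us', deploymentId: 'def456', region: 'us-east-1', cpuPercent: 40 },
  ],
  90
);
```

In this example only `prod-eu` exceeds the threshold, so a single alert is produced, and its context describes the monitored deployment rather than the monitoring one.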

6 participants