Stack Monitoring + Kibana Alerting #45571
Pinging @elastic/stack-monitoring
I'm going to loop in a few people on the Kibana Alerting team who will probably be interested in following this issue:
Had a chat with @pmuellr to discuss some options. We will discuss these with @bmcconaghy and @kobelb next. Some notes:
After some discussion I think there are 2 main issues:
I think the first point (tying the actions and alerts to a user) implies some UI setup step. Maybe it is clicking a 'turn on alerting' button (or maybe attaching it to the existing 'turn on monitoring' button). That would actually address both points:
This does imply breaking changes (config and a move away from being space-agnostic), but I think through 7.x we'd need an opt-in strategy for Kibana alerting in Stack Monitoring anyway. @chrisronline @cachedout I would be curious to hear your thoughts on having a UI-initiated setup step. With that assumption I think we can address the current list of blockers. Perhaps there are non-UI options, but I think they'd still have to involve a user account and spaces.
Hi @peterschretlen. Thanks for the response to this. I agree with the way you've categorized the issues here. I'll begin with a discussion of Spaces. The transition to Alerting has, in my mind, always been the right point for Stack Monitoring to transition to being Spaces-aware. I'm not sure how much we want to get into that in this discussion, but for an initial round of changes, I could see something like the following. A role that has the 'Read' privilege:
A role that has the 'All' privilege obviously would not have the above restrictions. For the CRUD of the alerts themselves, I'm thinking we'd want to mirror what I understand the alerting privilege model to be, which is to tie the ability to modify a given alert to the privileges granted by the role/space the current user is in. (Please correct me if this understanding is incorrect! A rough sketch of a space-scoped role follows this comment.)

Regarding the initial setup step, if I understand your proposal correctly, we'd end up with a release where existing watcher-based alerts were not "migrated" to the new Alerting platform, although a setup process could be initiated to recreate them in the new Alerting platform. In truth, the more I think about this initial "alerting setup" step, the more I like it. Not only does it provide a clean way to recreate old watcher-based alerts, it also gives us a place to let the user select existing metrics for alerts as well. I think I can start to see the shape of how that would look, but I'm really interested in what @chrisronline thinks here.

Regarding the point about these initial alerts essentially being tied to the user who initiates the action, I think this is fine, so long as we allow the "setup step" to be re-initiated at any point, so that if someone didn't intend this behavior, they could switch to the account they intended for long-term alert management. Alternatively, perhaps the Kibana alert management application could allow a super-user to migrate alert ownership between users? I don't know if that's been discussed at all, but it could provide another means for a user to address this concern if it came up.
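To make the Spaces scoping concrete, here is a minimal sketch using the existing Kibana role API; the role name, space id, and index privileges are illustrative assumptions, not a settled design:

```
# Kibana role API (requires the kbn-xsrf header); values are illustrative
PUT /api/security/role/stack_monitoring_read
{
  "elasticsearch": {
    "cluster": ["monitor"],
    "indices": [
      { "names": [".monitoring-*"], "privileges": ["read"] }
    ]
  },
  "kibana": [
    { "base": ["read"], "spaces": ["ops"] }
  ]
}
```

A role like this would only expose Stack Monitoring (and its alerts) inside the space(s) listed, which is the scoping behavior being discussed above.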
Good point, it makes sense that there should be a way to reset the alerts / re-run the setup. Regarding ownership and transfer, we do have
I want to clarify something:
In my original post, this is actually not accurate. For cluster alerts to work, the user needs to configure an outgoing SMTP server through
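For context, the SMTP side of that setup lives in a Watcher email account in elasticsearch.yml. A minimal sketch as I understand the current configuration; the account name and values are placeholders:

```yaml
# elasticsearch.yml -- Watcher email account the cluster alert watches send through
xpack.notification.email.account:
  ops_account:
    smtp:
      host: smtp.example.com
      port: 587
      user: alerts@example.com
      # the password goes into the Elasticsearch keystore as
      # xpack.notification.email.account.ops_account.smtp.secure_password
```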
I think we need to talk about a few things here:
Net new user experience, where they have no cluster alerts

I think the overall experience will be close. The main difference in the setup process will be the need to hit an API endpoint/click a UI button that will create the cluster alerts, using a given user's credentials and the provided space id. I think an API endpoint is important here so users can automate this setup (just like they can currently with cluster alerts) - it's important to note that an API endpoint should also be available to support the creation of the email action (and appropriate SMTP configuration) which the cluster alerts will use. (A sketch of what such an endpoint might look like follows this comment.)

Existing user experience, where they are currently using cluster alerts

We need to ensure that we can properly disable existing cluster alerts before starting to run new Kibana cluster alerts. This might be tricky, as we don't have the necessary permission set for the

Alerting management within spaces

First off, I don't think Stack Monitoring is a good fit for spaces. AFAIK, spaces exist as a way to segment data across users so that each user only sees the data that matters to them (for example, they only see the list of dashboards relevant to their part of the organization instead of needing to filter/search through a giant list of dashboards). We don't really have anything like that currently in Stack Monitoring - I think it's a safe bet that a single type of user is the only user accessing Stack Monitoring. Could we imagine scenarios where some users might want to see only certain monitoring data? Sure, but I don't think those are requests we hear from users (please correct me here if there is supporting data). I don't know why we need to fit a square into a circle, which is what it feels like we are doing here.

Secondly, let's say we do integrate with spaces. It's a strange experience for users to create cluster alerts in one space and not be able to see them in another space. This could easily create duplication of alerts (which @peterschretlen mentioned earlier), which feels like it will be confusing to users. Let's jump forward a bit in our roadmap and imagine we have customizable alerts in monitoring for various metrics we collect. For example, we have a CPU threshold alert on all nodes in our ES cluster at 90%. If this is configured in space A and a user has access to space A and space B, it seems likely they could go to space B (maybe for another purpose), go to Stack Monitoring, and not see that alert. Maybe they'd wonder if it was deleted and try to recreate it? That feels like a confusing experience. Thoughts on this @cachedout?
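To make the "API endpoint + UI button" idea concrete, the setup call might look something like the sketch below. No such route exists today; the path and payload are hypothetical illustrations of the proposal, not a real Kibana API.

```
# Hypothetical setup endpoint -- route and payload are illustrative only
POST /api/monitoring/v1/alerts/_setup
{
  "spaceId": "default",
  "emailAction": {
    "from": "monitoring@example.com",
    "to": ["ops@example.com"],
    "smtp": { "host": "smtp.example.com", "port": 587 }
  }
}
```

The key properties are that it runs with the calling user's credentials, targets a specific space, and creates both the email action and the default alerts in one step, so it can be automated the same way cluster alerts can be today.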
My thinking here has been that we try to do this by using the blacklist setting:
To test this, I used this query:

Then to remove a few cluster alerts, I used this:
I then re-ran

WDYT @chrisronline?
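The queries themselves aren't shown above, but given the per-exporter blacklist setting (xpack.monitoring.exporters.&lt;id&gt;.cluster_alerts.management.blacklist), the test plausibly looked something like the following; the exporter id and blacklisted alert names here are illustrative:

```
# List the cluster-alert watches that currently exist
# (ids look like <cluster_uuid>_elasticsearch_cluster_status)
GET .watches/_search?filter_path=hits.hits._id
{
  "size": 100
}

# Blacklist a couple of cluster alerts on a local exporter; the corresponding
# watches should then be removed by the exporter
PUT _cluster/settings
{
  "persistent": {
    "xpack.monitoring.exporters.my_local.type": "local",
    "xpack.monitoring.exporters.my_local.cluster_alerts.management.blacklist": [
      "elasticsearch_version_mismatch",
      "xpack_license_expiration"
    ]
  }
}
```

Re-running the first search after the settings change should show the blacklisted watches gone while the others remain.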
I can see this need in the future. There are a few things that lead me to believe it's desirable to use Spaces to allow users to segment monitoring data by space.
https://github.com/elastic/enhancements/issues/6894 To your point, though, this isn't a feature that's requested a lot, but we do get requests for it here and there. I suspect, too, that some use cases are going to emerge for this over time, especially as we start to add additional services like Site Search into Stack Monitoring.
I agree that this is a potential problem. I would also like to understand what happens if a space is removed. Do all the alerts get removed along with it? TBH, it feels like what's required here is to not have an alert tied to one and only one space, but to be able to select which spaces an alert appears in, and perhaps to have this be all spaces by default. That said, I don't know enough about how @peterschretlen and @mikecote are thinking about this model and I would be quite interested to hear their thoughts.
Interesting. I haven't used that before. I can play with it and verify, but that might work. I wonder how this will play out when internal collection is disabled. Hopefully we can be fully migrated to Kibana alerting before that happens so it's not a concern, but it's something to consider.
Yes, deleting a space will delete all the objects in that space, including the alerts.
My understanding is that before spaces, people would create multiple Kibana instances so they could get their own view of ES data. So the question I ask myself is: if someone wanted to set up multiple Kibana instances pointing at the same stack monitoring indices, what would they be separating? Probably cluster access in the multi-cluster scenario, but I think alerts would be another. SREs want early warnings about a long garbage collection or a spike in indexing rate, but all logging users might understand/care about is whether the cluster is red/yellow/green and whether it affects their ability to use Discover. So I lean towards alerts being isolated to a space. I don't imagine there are a lot of use cases for multi-space Stack Monitoring, but I do think Stack Monitoring fits the spaces model well. The more apps we have that are space-aware, the more useful spaces become. That said, we are heading in the direction of moving/sharing between spaces, with saved object enhancements like copy to space, and we are considering sharing between spaces.
@cachedout I've been playing with your idea and I really like it, but I'm not sure of the best way to handle it. I've been researching and playing with two different ways:
The first one is nice because it doesn't involve any changes to Elasticsearch, but if a user creates another exporter after enabling Kibana alerting, new cluster alerts are created. We could add a check in Kibana to always ensure all exporters have the same blacklist. The second one is nice because we don't have to do anything in Kibana, but it does mean that users won't be able to get cluster alerts working ever again. I don't necessarily think we will need to, but it feels possible that something may not work correctly in the first release of Kibana cluster alerts and users might need a fallback. We could get around this by adding configs, though. Thoughts?
To add some more thoughts to the above ^^ I think we have to go with 2. Another downside of option 1 is that this method only works with the cluster connected to Kibana - we are unable to update cluster settings for other monitored clusters.
While this is a downside, I'm not sure it's a reason to discard this option entirely. What if we just document (perhaps as a pop-up when clicking the button to enable Kibana alerting?) that blacklists need to be modified on other non-connected clusters? The reason I don't think this is too much of an issue is that the number of current alerts is relatively small and their appearance for most users is exceedingly rare. Even if they don't follow this step, the worst case is that the watch continues to exist, and it's an easy (and hopefully well-documented) fix for them. Thoughts?
Yes, this is an option, but it feels like we should try to avoid that if we can (and I think we can in this situation). I think we should always favor the path that involves the least amount of work for the user.
Perhaps; I honestly don't know the data on this. I'm not sure how many folks have more than one production cluster, but it does mean that folks with more than one will potentially need to perform two separate actions: one button click in Kibana, and a manual curl to the other cluster(s) to update the cluster settings. On top of all of this, I think it gets more complicated when we think about a slow rollout of our migration - it probably makes sense (see discussion here) to release these migrations incrementally as they are ready so we can learn and fix issues along the way. Assuming we want this approach, it complicates the docs a bit, since users won't be blacklisting all cluster alerts, just the few that we have Kibana alerts for. Option 2 fits in nicely here, as we can simply update the explicit blacklist in each new version where we add more Kibana alerts - the user will not have to worry about anything (theoretically at least). Do you have any issues with option 2?
The case you make here is really good. I mentioned over in the discussion on rollout strategies that I think we should gradually introduce the migrated alerts but not enable them until they're all ready. Given that, I think option 2, where we'd just blacklist the existing alerts en masse, makes sense.
This issue is to serve as a list of issues/questions/comments/blockers for the Stack Monitoring team as they investigate migrating existing watches to Kibana alerting, with the goal of feature and functional parity.
Blockers
We need a default email action to exist so our default alerts can send an email to a configured (through kibana.yml) email address without needing to do any SMTP setup. See this comment.
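For reference, a sketch of the kibana.yml side of this as I understand the current settings; the values are placeholders, and the last comment describes the desired end state rather than anything that exists today:

```yaml
# kibana.yml -- existing cluster-alert email settings
xpack.monitoring.cluster_alerts.email_notifications.enabled: true
xpack.monitoring.cluster_alerts.email_notifications.email_address: ops@example.com

# The ask: a default Kibana-alerting email action that honors the address above
# out of the box, without any Watcher/SMTP setup in elasticsearch.yml.
```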