Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RAC] Alerts as Data Bulk Insert #93730

Closed
spong opened this issue Mar 5, 2021 · 5 comments
Closed

[RAC] Alerts as Data Bulk Insert #93730

spong opened this issue Mar 5, 2021 · 5 comments
Assignees
Labels
discuss Team:Detections and Resp Security Detection Response Team Team:Observability Team label for Observability Team (for things that are handled across all of observability) Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. Theme: rac label obsolete

Comments

@spong
Copy link
Member

spong commented Mar 5, 2021

This issue is for discussing the architecture/implementation for writing out Alerts as Data.

Relevant implementations from within detections include:

And a plethora of other utilities and helpers found in:
kibana/x-pack/plugins/security_solution/server/lib/detection_engine/signals/

The above implementations can be spread out as they cover many complex use cases, like creating alerts for aggregations, EQL sequences, alerts on alerts, and logic for preventing the creation duplicate alerts (among others :). Some patterns have started forming as we've been adding new rule types and features over the last year, but until now we've yet to have the opportunity to abstract further and clean up the control flow.

As discussed with @mikecote, @sqren and @tsg, a library implementation providing a hook for alert creation that each rule could provide their own implementation to (and call into other library utils like exceptions, deduplication, etc) would go a long way in extending the initial implementation within Detections.

Open questions

  • How will the existing paradigm of alert/alert instances port over to alerts as data? For example, in the o11y use case, a CPU Threshold Exceeded rule may write an alert, then later update that alert if it is triggered again. Will we follow this same paradigm, or are alerts immutable except for assignment/status like in the security use case? If the latter, how will what is now many alerts for one incident be displayed in the triage workflow since we don't support groupings/aggregations? Each new trigger creates a new alert with aggregate information from the predecessors and automatically closes them so there's only ever one active alert? Building block alerts for each trigger, and the shell alert is mutable?
@spong spong added discuss Team:Observability Team label for Observability Team (for things that are handled across all of observability) Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) Team:Detections and Resp Security Detection Response Team Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. Theme: rac label obsolete labels Mar 5, 2021
@elasticmachine
Copy link
Contributor

Pinging @elastic/security-detections-response (Team:Detections and Resp)

@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-alerting-services (Team:Alerting Services)

@elasticmachine
Copy link
Contributor

Pinging @elastic/security-solution (Team: SecuritySolution)

@pmuellr
Copy link
Member

pmuellr commented Mar 5, 2021

This may not be relevant, but thought I'd mention it.

For the current event log, we currently queue up events to write, in bulk, async. However the public interface is a simple synchronous logEvent(event) style call that returns nothing. We buffer over time and space (1 sec or 100 events, whichever happens first). We also handle trying to get these written out during an "normal" Kibana shutdown (plugin shutdown is await'd by the platform (with a timeout)!) - otherwise there's a good chance we'd lose the last few records that were left in the buffer. As of 7.11 or so, you should start seeing BOTH the event log startup event AND the event log shutdown event written to the event log - we've always been writing the event log shutdown event, but didn't handle the actual "write the last buffer during shudown" till then, so before that, it was never written.

We've always treated the event log data as "not critical" - for the cases we query it in alerting (to show alert details), we handle cases where data is missing for some reason. Beyond bugs / timing issues / etc, that reason could be that the data got ILM'd away in a delete phase. I'm guessing we want to make the "alerts as data" a little more "critical" :-), but not quite sure what that means, because if we buffer the data, and then Kibana crashes with an OOM or SIGSEGV, you're going to lose that last set of buffered data.

All that code (there's not much) to deal with the event log buffering is here: https://github.com/elastic/kibana/blob/master/x-pack/plugins/event_log/server/es/cluster_client_adapter.ts#L52-L106

@spong spong mentioned this issue Mar 30, 2021
7 tasks
banderror added a commit that referenced this issue May 27, 2021
**Needed for:** rule execution log for Security #94143
**Related to:**

- alerts-as-data: #93728, #93729, #93730
- RFC for index naming #98912

## Summary

This PR adds a mechanism for writing to / reading from / bootstrapping indices for RAC project into the `rule_registry` plugin. Particularly, indices for alerts-as-data and rule execution events. This implementation is similar to existing implementations like `event_log` plugin (see #98353 (comment) for historical perspective), but we're going to converge all of them into 1 or 2 implementations. At least we should have a single one in `rule_registry` itself.

In this PR I tried to incorporate most of the feedback received in the RFC (#98912), but if you notice I missed/forgot something, please let me know in the comments.

Done in this PR:

- [x] Schema-agnostic APIs for working with Elasticsearch.
- [x] Schema-aware log definition and bootstrapping API (creating hierarchical logs).
- [x] Schema-aware write API (logging events).
- [x] Schema-aware read API (searching logs, filtering, sorting, pagination, aggregation).
- [x] Support for Kibana spaces, space-aware index bootstrapping (either at rule creation or rule execution time).

As for reviewing this PR, perhaps it might be easier to start with:

- checking description of #98912
- checking usage examples https://github.com/elastic/kibana/pull/98353/files#diff-c049ff2198cc69bd50a69e92d29e88da7e10b9a152bdaceaf3d41826e712c12b
- checking public api https://github.com/elastic/kibana/pull/98353/files#diff-8e9ef0dbcbc60b1861d492a03865b2ae76a56ec38ada61898c991d3a74bd6268

## Next steps

Next steps towards rule execution log in Security (#94143):

- define actual schema for rule execution events
- inject instance of rule execution log into Security rule executors and route handlers
- implement actual execution logging in rule executors
- update route handlers to start fetching execution events and metrics from the log instead of custom saved objects

Next steps in the context of RAC and unified implementation:

- converge this implementation with `RuleDataService` implementation
  - implement robust index bootstrapping
  - reconsider using FieldMap as a generic type parameter
  - implement validation for documents being indexed
- cover the final implementation with tests
- write comprehensive docs: update plugin README, add JSDoc comments to all public interfaces
banderror added a commit to banderror/kibana that referenced this issue May 27, 2021
**Needed for:** rule execution log for Security elastic#94143
**Related to:**

- alerts-as-data: elastic#93728, elastic#93729, elastic#93730
- RFC for index naming elastic#98912

## Summary

This PR adds a mechanism for writing to / reading from / bootstrapping indices for RAC project into the `rule_registry` plugin. Particularly, indices for alerts-as-data and rule execution events. This implementation is similar to existing implementations like `event_log` plugin (see elastic#98353 (comment) for historical perspective), but we're going to converge all of them into 1 or 2 implementations. At least we should have a single one in `rule_registry` itself.

In this PR I tried to incorporate most of the feedback received in the RFC (elastic#98912), but if you notice I missed/forgot something, please let me know in the comments.

Done in this PR:

- [x] Schema-agnostic APIs for working with Elasticsearch.
- [x] Schema-aware log definition and bootstrapping API (creating hierarchical logs).
- [x] Schema-aware write API (logging events).
- [x] Schema-aware read API (searching logs, filtering, sorting, pagination, aggregation).
- [x] Support for Kibana spaces, space-aware index bootstrapping (either at rule creation or rule execution time).

As for reviewing this PR, perhaps it might be easier to start with:

- checking description of elastic#98912
- checking usage examples https://github.com/elastic/kibana/pull/98353/files#diff-c049ff2198cc69bd50a69e92d29e88da7e10b9a152bdaceaf3d41826e712c12b
- checking public api https://github.com/elastic/kibana/pull/98353/files#diff-8e9ef0dbcbc60b1861d492a03865b2ae76a56ec38ada61898c991d3a74bd6268

## Next steps

Next steps towards rule execution log in Security (elastic#94143):

- define actual schema for rule execution events
- inject instance of rule execution log into Security rule executors and route handlers
- implement actual execution logging in rule executors
- update route handlers to start fetching execution events and metrics from the log instead of custom saved objects

Next steps in the context of RAC and unified implementation:

- converge this implementation with `RuleDataService` implementation
  - implement robust index bootstrapping
  - reconsider using FieldMap as a generic type parameter
  - implement validation for documents being indexed
- cover the final implementation with tests
- write comprehensive docs: update plugin README, add JSDoc comments to all public interfaces
banderror added a commit that referenced this issue May 27, 2021
**Needed for:** rule execution log for Security #94143
**Related to:**

- alerts-as-data: #93728, #93729, #93730
- RFC for index naming #98912

## Summary

This PR adds a mechanism for writing to / reading from / bootstrapping indices for RAC project into the `rule_registry` plugin. Particularly, indices for alerts-as-data and rule execution events. This implementation is similar to existing implementations like `event_log` plugin (see #98353 (comment) for historical perspective), but we're going to converge all of them into 1 or 2 implementations. At least we should have a single one in `rule_registry` itself.

In this PR I tried to incorporate most of the feedback received in the RFC (#98912), but if you notice I missed/forgot something, please let me know in the comments.

Done in this PR:

- [x] Schema-agnostic APIs for working with Elasticsearch.
- [x] Schema-aware log definition and bootstrapping API (creating hierarchical logs).
- [x] Schema-aware write API (logging events).
- [x] Schema-aware read API (searching logs, filtering, sorting, pagination, aggregation).
- [x] Support for Kibana spaces, space-aware index bootstrapping (either at rule creation or rule execution time).

As for reviewing this PR, perhaps it might be easier to start with:

- checking description of #98912
- checking usage examples https://github.com/elastic/kibana/pull/98353/files#diff-c049ff2198cc69bd50a69e92d29e88da7e10b9a152bdaceaf3d41826e712c12b
- checking public api https://github.com/elastic/kibana/pull/98353/files#diff-8e9ef0dbcbc60b1861d492a03865b2ae76a56ec38ada61898c991d3a74bd6268

## Next steps

Next steps towards rule execution log in Security (#94143):

- define actual schema for rule execution events
- inject instance of rule execution log into Security rule executors and route handlers
- implement actual execution logging in rule executors
- update route handlers to start fetching execution events and metrics from the log instead of custom saved objects

Next steps in the context of RAC and unified implementation:

- converge this implementation with `RuleDataService` implementation
  - implement robust index bootstrapping
  - reconsider using FieldMap as a generic type parameter
  - implement validation for documents being indexed
- cover the final implementation with tests
- write comprehensive docs: update plugin README, add JSDoc comments to all public interfaces
@peluja1012
Copy link
Contributor

Closing in favor of Alerts as Data RFC doc.

@kobelb kobelb added the needs-team Issues missing a team label label Jan 31, 2022
@botelastic botelastic bot removed the needs-team Issues missing a team label label Jan 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discuss Team:Detections and Resp Security Detection Response Team Team:Observability Team label for Observability Team (for things that are handled across all of observability) Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. Theme: rac label obsolete
Projects
None yet
Development

No branches or pull requests

5 participants