Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RAC][Security Solution][Detections] Rule Execution Log - technical implementation #101013

Closed
11 tasks done
banderror opened this issue May 31, 2021 · 4 comments
Closed
11 tasks done
Assignees
Labels
epic Feature:Detection Rules Security Solution rules and Detection Engine Feature:EventLog Team:Detections and Resp Security Detection Response Team Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. Theme: rac label obsolete

Comments

@banderror
Copy link
Contributor

banderror commented May 31, 2021

Related to: #91265

Summary

Build Security Rule Execution Log on top of the event_log plugin. Implement logging and querying the log under the hood of the app.

Important!

DOs:

  • Use this proposal as a foundation: [Security Solution][Detections] Proposal: building Rule Execution Log on top of Event Log and ECS #94143.
  • Define a schema for rule execution events and metrics.
  • Implement a rule execution log abstraction that would provide a simple api for writing to the log and executing queries, hiding non-important details from the rest of security_solution.
  • Inject an instance of rule execution log into executor functions of Security rule types and into Security route handlers.
  • Implement actual execution logging in rule executors (writing events and metrics to the log) and route handlers.
  • Update route handlers to start fetching execution events and metrics from the log instead of custom saved objects.
  • Hide the new execution log implementation under its own feature switch. If the switch is disabled, we use the old custom saved objects. If it's enabled, we use the new execution log.
  • Finally, remove the old implementation and custom SO definition from the code.

DONTs:

  • No changes in the UI is expected.
  • The plan was to implement this new log within the new incarnation of Detection Engine built on top of rule_registry, so no need to change the current implementation of Detection Engine.
  • Backwards compatibility with our custom Saved Objects (of type siem-detection-engine-rule-status) is not required. On the Security side, when we release the new Rule Execution Log, it’d be acceptable to drop the historical data stored in custom SOs (1 current status + 5 failures per rule). This would likely allow us to not invest time into developing migrations and fallbacks for custom SOs, especially considering breaking changes in SO ids in the upcoming 8.0 ([Meta] Breaking saved object type conversions in 8.0 #100489).

Sub-tasks

Further clarification of requirements:

  • Figure out requirements for RBAC for Rule Execution Log from the product standpoint. Mind not only current requirements in Security, but also future requirements in Observability/Stack/other solutions in the context of Unified Rule Monitoring.
  • Related: communicate to the Observability team that probably requirements for RBAC for "evaluations" in Observability should be figured out as well.

Implementation:

Handling breaking changes in Saved Object ids by 7.16 FF (#105819):

@banderror banderror added Team:Detections and Resp Security Detection Response Team Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. labels May 31, 2021
@elasticmachine
Copy link
Contributor

Pinging @elastic/security-detections-response (Team:Detections and Resp)

@elasticmachine
Copy link
Contributor

Pinging @elastic/security-solution (Team: SecuritySolution)

@banderror
Copy link
Contributor Author

As discussed within the team, backwards compatibility with our custom Saved Objects (of type siem-detection-engine-rule-status) is not required:

Q: On the Security side, when we release the new Rule Execution Log, can we drop support for the existing execution data stored as custom Saved Objects? Or migration / fallback will be necessary?
A: We think it’d be acceptable to drop the historical data stored in custom SOs (1 current status + 5 failures per rule). This would likely allow us to not invest time into developing migrations and fallbacks for custom SOs, especially considering breaking changes in SO ids in the upcoming 8.0 (#100489).

@banderror banderror changed the title [Security Solution][Detections] Rule Execution Log - technical implementation [RAC][Security Solution][Detections] Rule Execution Log - technical implementation Jul 21, 2021
@banderror banderror added the Feature:Detection Rules Security Solution rules and Detection Engine label Jul 21, 2021
banderror added a commit that referenced this issue Nov 1, 2021
…g v1 - raw implementation (#115574)

**Ticket:** #106469, #101013

## Summary

TL;DR: New internal endpoint for reading data from Event Log (raw version), legacy status SO under the hood.

With this PR we now read the Failure History (last 5 failures) on the Rule Details page from Event Log. We continue getting the Current Status from the legacy `siem-detection-engine-rule-status` saved objects. Rule Management page also gets data from the legacy saved objects.

- [x] Deprecate existing methods for reading data in `IRuleExecutionLogClient`: `.find()` and `.findBulk()`
- [x] Introduce new methods for reading data in IRuleExecutionLogClient:
  - for reading last N execution events for 1 rule from event log
  - for reading current status and metrics for 1 rule from legacy status SOs
  - for reading current statuses and metrics for N rules from legacy status SOs
- [x] New methods should return data in the legacy status SO format.
- [x] Update all the existing endpoints that depend on `IRuleExecutionLogClient` to use the new methods.
- [x] Implement a new internal endpoint for fetching current status of the rule execution and execution events from Event Log for a given rule.
- [x] The API of the new endpoint should be the same as `rules/_find_statuses` to minimise changes in the app.
- [x] Use the new endpoint on the Rule Details page.

## Near-term plan for technical implementation of the Rule Execution Log (#101013)

**Stage 1. Reading last 5 failures from Event Log v1 - raw implementation** - ✔️ done in this PR

TL;DR: New internal endpoint for reading data from Event Log (raw version), legacy status SO under the hood.

- Deprecate existing methods for reading data in `IRuleExecutionLogClient`: `.find()` and `.findBulk()`
- Introduce new methods for reading data in IRuleExecutionLogClient:
  - for reading last N execution events for 1 rule from event log
  - for reading current status and metrics for 1 rule from legacy status SOs
  - for reading current statuses and metrics for N rules from legacy status SOs
- New methods should return data in the legacy status SO format.
- Update all the existing endpoints that depend on `IRuleExecutionLogClient` to use the new methods.
- Implement a new internal endpoint for fetching current status of the rule execution and execution events from Event Log for a given rule.
- The API of the new endpoint should be the same as `rules/_find_statuses` to minimise changes in the app.
- Use the new endpoint on the Rule Details page.

**Stage 2: Reading last 5 failures from Event Log v2 - clean implementation**

TL;DR: Clean HTTP API, legacy Rule Status SO under the hood.

🚨🚨🚨 Possible breaking changes in Detections API 🚨🚨🚨

- Design a new data model for the Current Rule Execution Info (the TO-BE new SO type and later the TO-BE data in the rule object itself).
- Design a new data model for the Rule Execution Event (read model to be used on the Rule Details page)
- Think over changes in `IRuleExecutionLogClient` to support the new data model.
- Think over changes in all the endpoints that return any data related to rule monitoring (statuses, metrics, etc). Make sure to check our docs to identify what's documented there regarding rule monitoring.
- Update `IRuleExecutionLogClient` to return data in the new format. 
- Update all the endpoints (including the raw new one) to return data in the new format.
- Update Rule Details page to consume data in the new format.
- Update Rule Management page to consume data in the new format.

**Stage 3: Reading last 5 failures from Event Log v3 - new SO**

TL;DR: Clean HTTP API, new Rule Execution Info SO under the hood.

- Implement a new SO type for storing the current rule execution info. Relation type: 1 rule - 1 current execution info.
- Swap the legacy SO with the new SO in the implementation of `IRuleExecutionLogClient`.

**Stage 4: Cleanup and misc**

- Revisit the problem of deterministic ordering ([comment](#115574 (comment)))
- Remove rule execution log's glue code: adapters, feature switch.
- Remove the legacy rule status SO.
- Mark the legacy rule status SO as deleted in Kibana Core.
- Encapsulate the current space id in the instance of IRuleExecutionLogClient. Remove it from parameters of its methods.
- Introduce a Rule Execution Logger scoped to a rule instance. For use in rule executors.
- Add test coverage.

### Checklist

Delete any items that are not applicable to this PR.

- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
kibanamachine pushed a commit to kibanamachine/kibana that referenced this issue Nov 1, 2021
…g v1 - raw implementation (elastic#115574)

**Ticket:** elastic#106469, elastic#101013

## Summary

TL;DR: New internal endpoint for reading data from Event Log (raw version), legacy status SO under the hood.

With this PR we now read the Failure History (last 5 failures) on the Rule Details page from Event Log. We continue getting the Current Status from the legacy `siem-detection-engine-rule-status` saved objects. Rule Management page also gets data from the legacy saved objects.

- [x] Deprecate existing methods for reading data in `IRuleExecutionLogClient`: `.find()` and `.findBulk()`
- [x] Introduce new methods for reading data in IRuleExecutionLogClient:
  - for reading last N execution events for 1 rule from event log
  - for reading current status and metrics for 1 rule from legacy status SOs
  - for reading current statuses and metrics for N rules from legacy status SOs
- [x] New methods should return data in the legacy status SO format.
- [x] Update all the existing endpoints that depend on `IRuleExecutionLogClient` to use the new methods.
- [x] Implement a new internal endpoint for fetching current status of the rule execution and execution events from Event Log for a given rule.
- [x] The API of the new endpoint should be the same as `rules/_find_statuses` to minimise changes in the app.
- [x] Use the new endpoint on the Rule Details page.

## Near-term plan for technical implementation of the Rule Execution Log (elastic#101013)

**Stage 1. Reading last 5 failures from Event Log v1 - raw implementation** - ✔️ done in this PR

TL;DR: New internal endpoint for reading data from Event Log (raw version), legacy status SO under the hood.

- Deprecate existing methods for reading data in `IRuleExecutionLogClient`: `.find()` and `.findBulk()`
- Introduce new methods for reading data in IRuleExecutionLogClient:
  - for reading last N execution events for 1 rule from event log
  - for reading current status and metrics for 1 rule from legacy status SOs
  - for reading current statuses and metrics for N rules from legacy status SOs
- New methods should return data in the legacy status SO format.
- Update all the existing endpoints that depend on `IRuleExecutionLogClient` to use the new methods.
- Implement a new internal endpoint for fetching current status of the rule execution and execution events from Event Log for a given rule.
- The API of the new endpoint should be the same as `rules/_find_statuses` to minimise changes in the app.
- Use the new endpoint on the Rule Details page.

**Stage 2: Reading last 5 failures from Event Log v2 - clean implementation**

TL;DR: Clean HTTP API, legacy Rule Status SO under the hood.

🚨🚨🚨 Possible breaking changes in Detections API 🚨🚨🚨

- Design a new data model for the Current Rule Execution Info (the TO-BE new SO type and later the TO-BE data in the rule object itself).
- Design a new data model for the Rule Execution Event (read model to be used on the Rule Details page)
- Think over changes in `IRuleExecutionLogClient` to support the new data model.
- Think over changes in all the endpoints that return any data related to rule monitoring (statuses, metrics, etc). Make sure to check our docs to identify what's documented there regarding rule monitoring.
- Update `IRuleExecutionLogClient` to return data in the new format. 
- Update all the endpoints (including the raw new one) to return data in the new format.
- Update Rule Details page to consume data in the new format.
- Update Rule Management page to consume data in the new format.

**Stage 3: Reading last 5 failures from Event Log v3 - new SO**

TL;DR: Clean HTTP API, new Rule Execution Info SO under the hood.

- Implement a new SO type for storing the current rule execution info. Relation type: 1 rule - 1 current execution info.
- Swap the legacy SO with the new SO in the implementation of `IRuleExecutionLogClient`.

**Stage 4: Cleanup and misc**

- Revisit the problem of deterministic ordering ([comment](elastic#115574 (comment)))
- Remove rule execution log's glue code: adapters, feature switch.
- Remove the legacy rule status SO.
- Mark the legacy rule status SO as deleted in Kibana Core.
- Encapsulate the current space id in the instance of IRuleExecutionLogClient. Remove it from parameters of its methods.
- Introduce a Rule Execution Logger scoped to a rule instance. For use in rule executors.
- Add test coverage.

### Checklist

Delete any items that are not applicable to this PR.

- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
kibanamachine added a commit that referenced this issue Nov 1, 2021
…g v1 - raw implementation (#115574) (#116947)

**Ticket:** #106469, #101013

## Summary

TL;DR: New internal endpoint for reading data from Event Log (raw version), legacy status SO under the hood.

With this PR we now read the Failure History (last 5 failures) on the Rule Details page from Event Log. We continue getting the Current Status from the legacy `siem-detection-engine-rule-status` saved objects. Rule Management page also gets data from the legacy saved objects.

- [x] Deprecate existing methods for reading data in `IRuleExecutionLogClient`: `.find()` and `.findBulk()`
- [x] Introduce new methods for reading data in IRuleExecutionLogClient:
  - for reading last N execution events for 1 rule from event log
  - for reading current status and metrics for 1 rule from legacy status SOs
  - for reading current statuses and metrics for N rules from legacy status SOs
- [x] New methods should return data in the legacy status SO format.
- [x] Update all the existing endpoints that depend on `IRuleExecutionLogClient` to use the new methods.
- [x] Implement a new internal endpoint for fetching current status of the rule execution and execution events from Event Log for a given rule.
- [x] The API of the new endpoint should be the same as `rules/_find_statuses` to minimise changes in the app.
- [x] Use the new endpoint on the Rule Details page.

## Near-term plan for technical implementation of the Rule Execution Log (#101013)

**Stage 1. Reading last 5 failures from Event Log v1 - raw implementation** - ✔️ done in this PR

TL;DR: New internal endpoint for reading data from Event Log (raw version), legacy status SO under the hood.

- Deprecate existing methods for reading data in `IRuleExecutionLogClient`: `.find()` and `.findBulk()`
- Introduce new methods for reading data in IRuleExecutionLogClient:
  - for reading last N execution events for 1 rule from event log
  - for reading current status and metrics for 1 rule from legacy status SOs
  - for reading current statuses and metrics for N rules from legacy status SOs
- New methods should return data in the legacy status SO format.
- Update all the existing endpoints that depend on `IRuleExecutionLogClient` to use the new methods.
- Implement a new internal endpoint for fetching current status of the rule execution and execution events from Event Log for a given rule.
- The API of the new endpoint should be the same as `rules/_find_statuses` to minimise changes in the app.
- Use the new endpoint on the Rule Details page.

**Stage 2: Reading last 5 failures from Event Log v2 - clean implementation**

TL;DR: Clean HTTP API, legacy Rule Status SO under the hood.

🚨🚨🚨 Possible breaking changes in Detections API 🚨🚨🚨

- Design a new data model for the Current Rule Execution Info (the TO-BE new SO type and later the TO-BE data in the rule object itself).
- Design a new data model for the Rule Execution Event (read model to be used on the Rule Details page)
- Think over changes in `IRuleExecutionLogClient` to support the new data model.
- Think over changes in all the endpoints that return any data related to rule monitoring (statuses, metrics, etc). Make sure to check our docs to identify what's documented there regarding rule monitoring.
- Update `IRuleExecutionLogClient` to return data in the new format. 
- Update all the endpoints (including the raw new one) to return data in the new format.
- Update Rule Details page to consume data in the new format.
- Update Rule Management page to consume data in the new format.

**Stage 3: Reading last 5 failures from Event Log v3 - new SO**

TL;DR: Clean HTTP API, new Rule Execution Info SO under the hood.

- Implement a new SO type for storing the current rule execution info. Relation type: 1 rule - 1 current execution info.
- Swap the legacy SO with the new SO in the implementation of `IRuleExecutionLogClient`.

**Stage 4: Cleanup and misc**

- Revisit the problem of deterministic ordering ([comment](#115574 (comment)))
- Remove rule execution log's glue code: adapters, feature switch.
- Remove the legacy rule status SO.
- Mark the legacy rule status SO as deleted in Kibana Core.
- Encapsulate the current space id in the instance of IRuleExecutionLogClient. Remove it from parameters of its methods.
- Introduce a Rule Execution Logger scoped to a rule instance. For use in rule executors.
- Add test coverage.

### Checklist

Delete any items that are not applicable to this PR.

- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios

Co-authored-by: Georgii Gorbachev <[email protected]>
@banderror
Copy link
Contributor Author

We've partially implemented rule execution logging to Event Log (v7.16), and started to read last 5 failures from it on the Rule Details page (v8.0). We will close this epic and

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
epic Feature:Detection Rules Security Solution rules and Detection Engine Feature:EventLog Team:Detections and Resp Security Detection Response Team Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. Theme: rac label obsolete
Projects
None yet
Development

No branches or pull requests

6 participants