Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

siem-detection-engine-rule-status migration could cause failed upgrade #116423

Closed
rudolf opened this issue Oct 27, 2021 · 8 comments · Fixed by #117962
Closed

siem-detection-engine-rule-status migration could cause failed upgrade #116423

rudolf opened this issue Oct 27, 2021 · 8 comments · Fixed by #117962
Assignees
Labels
bug Fixes for quality problems that affect the customer experience impact:critical This issue should be addressed immediately due to a critical level of impact on the product. Team:Detection Rule Management Security Detection Rule Management Team Team:Detections and Resp Security Detection Response Team Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. v7.16.0 v8.0.0 v8.1.0

Comments

@rudolf
Copy link
Contributor

rudolf commented Oct 27, 2021

If alertId is not a string we don't remove the alertId

https://github.com/elastic/kibana/blob/7.16/x-pack/plugins/security_solution/server/lib/detection_engine/rules/legacy_rule_status/legacy_migrations.ts#L55

However, since there is no mapping for alertId this could theoretically cause the migration to fail with mapping set to strict, dynamic introduction of [alertId] within [siem-detection-engine-rule-status] is not allowed
Related #114585

A more defensive approach would be to always remove alertId:

index 72ab4a2237b..e4df3487202 100644
--- a/x-pack/plugins/security_solution/server/lib/detection_engine/rules/legacy_rule_status/legacy_migrations.ts
+++ b/x-pack/plugins/security_solution/server/lib/detection_engine/rules/legacy_rule_status/legacy_migrations.ts
@@ -52,7 +52,11 @@ export const legacyMigrateRuleAlertIdSOReferences = (
 
   // early return if alertId is not a string as expected
   if (!isString(alertId)) {
-    return { ...doc, references: existingReferences };
+    return {
+      ...doc,
+      attributes: otherAttributes,
+      references: existingReferences,
+    };
   }
 
   const alertReferences = legacyMigrateAlertId({

(Sorry for missing this during our code review of the PR 😅 )

@rudolf rudolf added bug Fixes for quality problems that affect the customer experience v8.0.0 Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. v7.16.0 labels Oct 27, 2021
@elasticmachine
Copy link
Contributor

Pinging @elastic/security-solution (Team: SecuritySolution)

@rudolf
Copy link
Contributor Author

rudolf commented Oct 27, 2021

Note: we also have a test harness for testing migrations https://docs.elastic.dev/kibana-dev-docs/tutorials/testing-plugins#integration-testing

It's very similar to your existing e2e test except that it uses the import API instead of esArchiver, might not add much value to convert the existing tests but just an FYI

@MadameSheema MadameSheema added Team:Detection Rule Management Security Detection Rule Management Team impact:critical This issue should be addressed immediately due to a critical level of impact on the product. labels Oct 27, 2021
@banderror
Copy link
Contributor

@rudolf, that's a great catch! Thank you so much 🙏

However, since there is no mapping for alertId this could theoretically cause the migration to fail with mapping set to strict, dynamic introduction of [alertId] within [siem-detection-engine-rule-status] is not allowed

Just curious - why theoretically? 🙂

test harness for testing migrations https://docs.elastic.dev/kibana-dev-docs/tutorials/testing-plugins#integration-testing

I didn't know about it! Super helpful, thank you. Will take a closer look.
Is it possible to catch these kind of bugs in this type of integration tests? What are the limitations of the test harness and what it does under the hood?

@rudolf
Copy link
Contributor Author

rudolf commented Oct 27, 2021

Just curious - why theoretically? 🙂

This could only happen if there was a document with an alertId that wasn't a String but also not null so it would have to be a number or something like ["myvalue"]. This might only ever happen if a user meddles with their data in which case it's not so unusual if it causes a migration to fail.

@rudolf
Copy link
Contributor Author

rudolf commented Oct 28, 2021

Is it possible to catch these kind of bugs in this type of integration tests? What are the limitations of the test harness and what it does under the hood?

Yeah the test harness should catch this specific bug.

The harness basically just spins up Elasticsearch and Kibana like FTR would. You then supply outdated saved objects which it will import into Kibana/Elasticsearch using the Saved Objects Import API. When saved objects are imported we apply migrations to them so during this step we will exercise the migration logic. Because the migrated objects are written to Elasticsearch, if there's a field left after the migration for which there's no mapping it will cause this bug to surface.

After importing, the test harness will use the Saved Objects export API to export the objects so that you test can assert that they have the right shape after the migration.

One limitation is that your saved object type should be importable/exportable but I believe after 8.0 most types should already be importable/exportable. Another thing to be aware of is import/export hooks will be applied so if a type has these hooks it could change the data in a way that's not directly related to the saved object migration.

@rudolf
Copy link
Contributor Author

rudolf commented Nov 3, 2021

I had a look at #112869 which is very similar, but in this case the ruleAlertId field was never removed from the mappings so it couldn't cause a failed migration. However, if, in the future we even think "hey this field is unused, let's remove it from the mappings" we could introduce an upgrade failure.

I see comments that this saved object type is deprecated so maybe it's highly unlikely that we ever make any changes until we remove the whole thing but I just want to flag it.

@spong
Copy link
Member

spong commented Nov 3, 2021

I also didn't know this test harness existed, so thanks for sharing @rudolf! 🙂

One limitation is that your saved object type should be importable/exportable but I believe after 8.0 most types should already be importable/exportable.

That said, this SO isn't exposed in the SOM UI, so we won't be able to leverage it here. We're planning to remove this SO entirely in early 8.x out in favor of leveraging the rule execution log, but good to know either way 👍.

I had a look at #112869 which is very similar, but in this case the ruleAlertId field was never removed from the mappings so it couldn't cause a failed migration. However, if, in the future we even think "hey this field is unused, let's remove it from the mappings" we could introduce an upgrade failure.

Yeah I spoke to @FrankHassanabad about this during implementation and he mentioned he ended up keeping it around for certain compatibility reasons iirc, but noted that there shouldn't be any reason it couldn't be removed. There were less usages of alertId within the scope of the siem-detection-engine-rule-status SO, so I ended up removing it in effort to prevent folks from using it at all instead of accessing via references. Either way, thanks for catching this corner case here, really appreciate it! 🙂

@banderror banderror added Team:Detections and Resp Security Detection Response Team v8.1.0 labels Nov 8, 2021
@elasticmachine
Copy link
Contributor

Pinging @elastic/security-detections-response (Team:Detections and Resp)

@spong spong assigned spong and unassigned banderror Nov 8, 2021
spong added a commit that referenced this issue Nov 9, 2021
…Id is not a string (#117962)

## Summary

Resolves #116423, and adds an e2e test catching this behavior as we can't test via the migration test harness since the `siem-detection-engine-rule-status` SO isn't exposed within the SO Manager UI.

Also adds note with regards to changes necessary once core issue #115153 is resolved. See https://github.com/elastic/kibana/pull/114585/files#r729620927. Note: existing `find_statuses`/`find_rules` integration tests will fail once fixed, so no additional tests necessary. 



### Checklist

Delete any items that are not applicable to this PR.

- [X] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
kibanamachine pushed a commit to kibanamachine/kibana that referenced this issue Nov 9, 2021
…Id is not a string (elastic#117962)

## Summary

Resolves elastic#116423, and adds an e2e test catching this behavior as we can't test via the migration test harness since the `siem-detection-engine-rule-status` SO isn't exposed within the SO Manager UI.

Also adds note with regards to changes necessary once core issue elastic#115153 is resolved. See https://github.com/elastic/kibana/pull/114585/files#r729620927. Note: existing `find_statuses`/`find_rules` integration tests will fail once fixed, so no additional tests necessary. 



### Checklist

Delete any items that are not applicable to this PR.

- [X] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
kibanamachine pushed a commit to kibanamachine/kibana that referenced this issue Nov 9, 2021
…Id is not a string (elastic#117962)

## Summary

Resolves elastic#116423, and adds an e2e test catching this behavior as we can't test via the migration test harness since the `siem-detection-engine-rule-status` SO isn't exposed within the SO Manager UI.

Also adds note with regards to changes necessary once core issue elastic#115153 is resolved. See https://github.com/elastic/kibana/pull/114585/files#r729620927. Note: existing `find_statuses`/`find_rules` integration tests will fail once fixed, so no additional tests necessary. 



### Checklist

Delete any items that are not applicable to this PR.

- [X] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
kibanamachine added a commit that referenced this issue Nov 9, 2021
…Id is not a string (#117962) (#118038)

## Summary

Resolves #116423, and adds an e2e test catching this behavior as we can't test via the migration test harness since the `siem-detection-engine-rule-status` SO isn't exposed within the SO Manager UI.

Also adds note with regards to changes necessary once core issue #115153 is resolved. See https://github.com/elastic/kibana/pull/114585/files#r729620927. Note: existing `find_statuses`/`find_rules` integration tests will fail once fixed, so no additional tests necessary. 



### Checklist

Delete any items that are not applicable to this PR.

- [X] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios

Co-authored-by: Garrett Spong <[email protected]>
kibanamachine added a commit that referenced this issue Nov 9, 2021
…n alertId is not a string (#117962) (#118040)

* [SecuritySolution][Detections] Fixes rule status migration when alertId is not a string (#117962)

## Summary

Resolves #116423, and adds an e2e test catching this behavior as we can't test via the migration test harness since the `siem-detection-engine-rule-status` SO isn't exposed within the SO Manager UI.

Also adds note with regards to changes necessary once core issue #115153 is resolved. See https://github.com/elastic/kibana/pull/114585/files#r729620927. Note: existing `find_statuses`/`find_rules` integration tests will fail once fixed, so no additional tests necessary. 



### Checklist

Delete any items that are not applicable to this PR.

- [X] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios

* Fixes typecheck in test

Co-authored-by: Garrett Spong <[email protected]>
Co-authored-by: Garrett Spong <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Fixes for quality problems that affect the customer experience impact:critical This issue should be addressed immediately due to a critical level of impact on the product. Team:Detection Rule Management Security Detection Rule Management Team Team:Detections and Resp Security Detection Response Team Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. v7.16.0 v8.0.0 v8.1.0
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants