[RAC] turn off observability alerts as data writing in a more granular way #119602

Merged
merged 9 commits into from
Dec 1, 2021

Conversation

mgiota
Contributor

@mgiota mgiota commented Nov 24, 2021

Fixes #119217

This PR introduces the xpack.ruleRegistry.write.disabledRegistrationContexts config flag, which disables writing to the observability alerts-as-data indices per registration context instead of all at once.

The registration contexts we use are:

  • observability.logs
  • observability.metrics
  • observability.apm
  • observability.uptime

How to test

  • In kibana.dev.yml, add the setting xpack.ruleRegistry.write.disabledRegistrationContexts: ['observability.logs'] (you can also try other values from the list above)
  • Delete the alerts indices. In a local setup, restarting ES and Kibana is enough. In a CCS setup, make sure you stop Kibana and delete the alerts indices, index templates and component templates. Here's an example of how you could reset your cluster:
http DELETE 'https://YOUR_ENDPOINT/_index_template/.alerts*' && 
http DELETE 'https://YOUR_ENDPOINT/_component_template/.alerts*' && 
http DELETE 'https://YOUR_ENDPOINT/.kibana*,.internal*,.tasks*'
  • Restart Kibana
  • Create a new log threshold rule with sensitive thresholds (if you specified another registration context in kibana.dev.yml, create a rule of that type instead)
  • Verify that writing to the specified registration context(s) is disabled. You could verify this under Kibana > Stack Management > Index Management.
    • Under Indices tab make sure to enable the Include hidden indices toggle, search for .internal.alerts and verify that nothing appears on the list.
    • Under Index templates tab, enable View System templates, search for alerts and verify that nothing appears on the list.
    • Verify that the specified disabled registration contexts don't appear under Component templates
  • You could also verify that disabling writing to specified registration contexts works, by creating rules for the specified registration contexts and making sure that no alerts appear on the Alerts table
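The index-management checks above could also be scripted once you have the index and template names as plain strings. This is only a minimal sketch: `findLeftoverResources` is a hypothetical helper, not part of Kibana, and it assumes that resource names for a registration context contain `.alerts-<context>.` (for example `.internal.alerts-observability.logs.alerts-default-000001`). Check the actual names in your cluster before relying on that pattern.

```typescript
// Hypothetical helper (not a Kibana API): returns the resource names that
// still belong to one of the disabled registration contexts.
// Assumption: names embed the context as ".alerts-<context>.".
function findLeftoverResources(
  resourceNames: string[],
  disabledContexts: string[]
): string[] {
  return resourceNames.filter((name) =>
    disabledContexts.some((context) => name.includes(`.alerts-${context}.`))
  );
}

// Example names, assumed for illustration only.
const names = [
  '.internal.alerts-observability.logs.alerts-default-000001',
  '.internal.alerts-observability.metrics.alerts-default-000001',
];

// With observability.logs disabled, only the logs index counts as a leftover.
const leftovers = findLeftoverResources(names, ['observability.logs']);
console.log(leftovers);
```

After a clean reset and restart, the expectation from the steps above is that this list is empty for every disabled context.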

@mgiota mgiota force-pushed the 119217_turn_off_alerts_granular branch from 16fe39b to 8df538d Compare November 24, 2021 13:08
@mgiota mgiota force-pushed the 119217_turn_off_alerts_granular branch from 8df538d to e0f87f8 Compare November 24, 2021 13:12
@mgiota mgiota marked this pull request as ready for review November 25, 2021 07:32
@mgiota mgiota self-assigned this Nov 25, 2021
@mgiota mgiota added the following labels Nov 25, 2021:

  • Team: Actionable Observability - DEPRECATED For Observability Alerting and SLOs use "Team:obs-ux-management", for AIops "Team:obs-knowledge"
  • Theme: rac label obsolete
  • v8.0.0
  • v8.1.0
  • auto-backport Deprecated - use backport:version if exact versions are needed
  • release_note:skip Skip the PR/issue when compiling release notes
@mgiota mgiota marked this pull request as draft November 25, 2021 08:21
@mgiota mgiota force-pushed the 119217_turn_off_alerts_granular branch from 5a4db77 to 9c3344e Compare November 29, 2021 23:37
…a plugin service and not in the resourceInstaller
@mgiota mgiota force-pushed the 119217_turn_off_alerts_granular branch from f04b4b9 to 5aa1d17 Compare November 30, 2021 07:10
@mgiota mgiota marked this pull request as ready for review November 30, 2021 07:12
@mgiota
Contributor Author

mgiota commented Nov 30, 2021

@elasticmachine merge upstream

@@ -195,7 +200,8 @@ export class RuleDataService implements IRuleDataService {
       return new RuleDataClient({
         indexInfo,
         resourceInstaller: this.resourceInstaller,
-        isWriteEnabled: this.isWriteEnabled(),
+        isWriteEnabled:
+          this.isWriteEnabled() && !this.isRegistrationContextDisabled(registrationContext),
Contributor

Isn't it confusing to combine isWriteEnabled and disabledRegistrationContexts under the same key in RuleDataClient? It would be hard to determine why the value is true or false.

Besides, from a convention perspective, it seems that the relation here is 1:1 in the RuleDataClient

Contributor Author

@mgiota mgiota Nov 30, 2021

@fkanout Great question, and yes, you are right. At first I thought about adding a separate key, but that would mean changing more places in the code (everywhere isWriteEnabled is called, I would also have to check disabledRegistrationContexts). The way I have it now is more centralized: I do the checks in one place.

@weltenwort is there a benefit of combining isWriteEnabled and disabledRegistrationContexts into one key vs splitting it into separate keys?

Member

Maybe we can try to answer the opposite question: Why would the RuleDataClient need to know about the reason for being disabled? In the spirit of keeping the coupling as loose as possible, it would make sense to keep the knowledge about a per-registration-context disablement feature out of the RuleDataClient if it is not necessary.

So what would be the benefit of passing them separately?

Contributor Author

@weltenwort Ok if we want to keep the coupling as loose as possible, it makes sense to keep the knowledge about per-registration-context disabling out of the RuleDataClient. @fkanout do you have any objections?

Contributor

I think the knowledge that RuleDataClient already has via isWriteEnabled is similar to isRegistrationContextDisabled; the only difference is the granularity of that information (everything vs. selected things). Unless RuleDataClient shouldn't know about isWriteEnabled in the first place. That, I don't know.

My understanding is that once Alert-as-Data is adopted across our products, the general xpack.ruleRegistry.write.enabled flag will be deprecated, while xpack.ruleRegistry.write.disabledRegistrationContexts could stay for a long time. From that point of view, I would say they should be separated.

Contributor Author

@fkanout What if we add even more granularity and disable writing per rule type id? Would we then want to pass one more key, for example disabledRuleTypeIds? Does the RuleDataClient need to know this level of detail? I don't know what the correct answer is.

For now I would keep it as it is. We could discuss it further if you want and come up with the most appropriate solution.

Contributor

@ersin-erdal ersin-erdal Dec 1, 2021

IMHO, the only thing that RuleDataClient should know is whether isWriteEnabled is true or false. Any complex logic in it makes it hard to read and understand.

I would implement all the logic in this.isWriteEnabled() method at line 118.
Otherwise, what is the reason for creating such a method? this.options.isWriteEnabled could be used everywhere.

So, it would be:

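public isWriteEnabled(registrationContext: string): boolean {
  // Writing is enabled only if the global flag is on and this context is not disabled.
  return (
    this.options.isWriteEnabled &&
    !this.options.disabledRegistrationContexts.includes(registrationContext)
  );
}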

Contributor Author

@ersin-erdal Sounds good to me. Let me refactor and push the changes.

Contributor

@fkanout fkanout Dec 1, 2021

I would implement all the logic in this.isWriteEnabled() method at line 118.

It could be more readable, but it is still the same solution. We have a consensus about it, so let's merge it like that 👍🏻
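The design the thread converged on can be sketched roughly as follows. This is an illustration under stated assumptions, not the merged code: `SketchRuleDataService` and `SketchRuleDataClient` are simplified, hypothetical stand-ins for RuleDataService and RuleDataClient, and the option names mirror the ones used in the comments above. The point is that the service folds both settings into a single boolean, so the client never learns why writing is disabled.

```typescript
// Simplified, hypothetical stand-ins for the real classes.
interface SketchServiceOptions {
  isWriteEnabled: boolean; // global switch (xpack.ruleRegistry.write.enabled)
  disabledRegistrationContexts: string[]; // per-context switch
}

class SketchRuleDataClient {
  private readonly writeEnabled: boolean;

  constructor(options: { isWriteEnabled: boolean }) {
    this.writeEnabled = options.isWriteEnabled;
  }

  // The client only knows the final answer, not the reason behind it.
  isWriteEnabled(): boolean {
    return this.writeEnabled;
  }
}

class SketchRuleDataService {
  private readonly options: SketchServiceOptions;

  constructor(options: SketchServiceOptions) {
    this.options = options;
  }

  // All of the decision logic lives in one place.
  isWriteEnabled(registrationContext: string): boolean {
    return (
      this.options.isWriteEnabled &&
      !this.options.disabledRegistrationContexts.includes(registrationContext)
    );
  }

  createClient(registrationContext: string): SketchRuleDataClient {
    return new SketchRuleDataClient({
      isWriteEnabled: this.isWriteEnabled(registrationContext),
    });
  }
}

const service = new SketchRuleDataService({
  isWriteEnabled: true,
  disabledRegistrationContexts: ['observability.logs'],
});
const logsClient = service.createClient('observability.logs');
const apmClient = service.createClient('observability.apm');
console.log(logsClient.isWriteEnabled(), apmClient.isWriteEnabled()); // false true
```

If a finer switch such as the disabledRuleTypeIds idea mentioned above were ever added, only the service method would need to change in this shape.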

Contributor

@fkanout fkanout left a comment

LGTM

@kibana-ci
Collaborator

💚 Build Succeeded

Metrics [docs]

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id            before  after  diff
ruleRegistry  141     145    +4

Unknown metric groups

API count

id            before  after  diff
ruleRegistry  167     171    +4

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @mgiota

@mgiota mgiota merged commit 5deb23b into elastic:main Dec 1, 2021
kibanamachine added a commit to kibanamachine/kibana that referenced this pull request Dec 1, 2021
…r way (elastic#119602)

* [RAC] turn off writing to disabled alerts indices

* fix error

* fix errors

* do not install component templates for disabled registration contexts

* add resource installer unit tests

* refactoring: disable installing index level resources in the rule data plugin service and not in the resourceInstaller

* refactor based on review comments

* update comment for isWriteEnabled method

Co-authored-by: Kibana Machine <[email protected]>
@kibanamachine
Contributor

💚 Backport successful

Status Branch Result
8.0

This backport PR will be merged automatically after passing CI.

kibanamachine added a commit that referenced this pull request Dec 1, 2021
…r way (#119602) (#120126)

Co-authored-by: Kibana Machine <[email protected]>

Co-authored-by: mgiota <[email protected]>
@fkanout
Contributor

fkanout commented Dec 1, 2021

I approved the PR, as it fulfills the ACs. However, I shared a couple of questions/scenarios with @mgiota. These could be edge cases or false positives. However, I will follow up here to share them and to get everyone's feedback/ thoughts.

Scenario:

  1. The flag xpack.ruleRegistry.write.disabledRegistrationContexts is OFF (all contexts are allowed)
  2. Kibana starts and the Alert-as-Data indices are initialized.
  3. Create a rule, e.g. Logs thresholds
  4. Alerts are ingested
  5. Shutdown Kibana
  6. Turn the flag xpack.ruleRegistry.write.disabledRegistrationContexts ON with observability.logs
  7. Rerun Kibana.
  8. Try to create a rule for Logs thresholds

Questions ⁉️:
A. The Alert-as-Data Log indices are still there, and we can create a rule based on their field. Is that Ok?
B. What is going to happen if we carry on and create a rule? Will we have alerts?
C. Will the rule registry still update the alert status, as the check against the flag is done in the initiation phase?
D. What is the behavior of the life-cycle-executor after restarting Kibana? Does it have the latest status of a rule and its alerts?

I tried to make it as plain as I could and be careful with the terminologies. However, please feel free to ask if something is not clear.

@jasonrhodes
Member

Thanks @fkanout -- from my perspective, in your scenario, I would expect no writes at all to happen for any rule with "observability.logs" as its registration context after step 7.

Note: In all cases with this new flag it should work exactly the same as xpack.ruleRegistry.write.enabled: false works. If we determine that we need to change how the disabledRegistrationContexts flag works, we should be sure to change write.enabled to work the same way when all contexts are off, too.

For your questions:

A. The Alert-as-Data Log indices are still there, and we can create a rule based on their field. Is that Ok?

We shouldn't remove existing indices, so that makes sense. And rule creation should absolutely continue -- this scenario is probably most likely to occur for someone who is using the alerting framework to run rules against their data but doesn't want the alert documents to be written/updated for some reason.

B. What is going to happen if we carry on and create a rule? Will we have alerts?

The rule should execute as normal, and schedule actions as normal. It should not create any new alert documents or update any previously created alert documents.

C. Will the rule registry still update the alert status, as the check against the flag is done in the initiation phase?

Good question. My understanding is that this flag is checked on every write, but I may be mistaken. We should confirm, because I don't think we should continue to update alerts if this flag is off. However, as I mentioned above, if this is already not the case for the write.enabled: false scenario, we'd have to explore changing that as well and what impact that might have.

D. What is the behavior of the life-cycle-executor after restarting Kibana? Does it have the latest status of a rule and its alerts?

Not sure I follow this one. The task manager will start up again, find all rules, and start running them again. It will handle passing in the "previous run" state just as it usually does on subsequent executions, as if the restart didn't happen. We do a lookup of the existing alerts to update during the executor phase, so I think we would find any previous alerts and continue to update them, but I'm not sure if we treat the restart as a "resolved" state for any alerts that were active when the system was restarted. I imagine we don't, I'm not sure what we expect here.

@fkanout
Contributor

fkanout commented Dec 2, 2021

Thank you, @jasonrhodes!
For C, confirmed: the flag is checked on every write, i.e. if the flag is ON (the context is disabled), no alerts will be written.
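To make the "checked on every write" behaviour concrete, here is a small sketch of the scenario discussed above. It is an assumption-laden toy, not the rule registry's implementation: `SketchAlertWriter` is hypothetical, an in-memory map stands in for the alerts index, and flipping a variable stands in for restarting Kibana with observability.logs added to the disabled contexts.

```typescript
type AlertStatus = 'active' | 'recovered';

interface AlertDocument {
  id: string;
  status: AlertStatus;
}

// Hypothetical writer: consults the flag on each write rather than once at setup.
class SketchAlertWriter {
  private readonly stored = new Map<string, AlertDocument>();
  private readonly isWriteEnabled: () => boolean;

  constructor(isWriteEnabled: () => boolean) {
    this.isWriteEnabled = isWriteEnabled;
  }

  // Returns true if the document was written, false if the write was skipped.
  write(doc: AlertDocument): boolean {
    if (!this.isWriteEnabled()) {
      return false; // no new documents and no updates to existing ones
    }
    this.stored.set(doc.id, doc);
    return true;
  }

  read(id: string): AlertDocument | undefined {
    return this.stored.get(id);
  }
}

let logsWriteEnabled = true;
const writer = new SketchAlertWriter(() => logsWriteEnabled);

// Steps 1-4: the context is allowed, so the alert is ingested.
const firstWrite = writer.write({ id: 'alert-1', status: 'active' });

// Steps 5-7: "restart" with the context disabled.
logsWriteEnabled = false;

// Step 8: the rule still runs, but its status update is not written.
const secondWrite = writer.write({ id: 'alert-1', status: 'recovered' });

console.log(firstWrite, secondWrite, writer.read('alert-1')?.status); // true false active
```

The existing document stays in place with its last written status, which matches the expectation above that existing indices are not removed and previously created alert documents are not updated.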

TinLe pushed a commit to TinLe/kibana that referenced this pull request Dec 22, 2021
…r way (elastic#119602)

Co-authored-by: Kibana Machine <[email protected]>
@mgiota mgiota deleted the 119217_turn_off_alerts_granular branch January 4, 2022 10:29
Labels
  • auto-backport Deprecated - use backport:version if exact versions are needed
  • release_note:skip Skip the PR/issue when compiling release notes
  • Team: Actionable Observability - DEPRECATED For Observability Alerting and SLOs use "Team:obs-ux-management", for AIops "Team:obs-knowledge"
  • Theme: rac label obsolete
  • v8.0.0
  • v8.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[RAC] New config flag needed to turn off alerts as data writing in a more granular way
7 participants