[RAC] Change index bootstrapping strategy #113389
Absolutely brilliant overview, @Kerry350, thank you!
@Kerry350 Thanks a lot for the great summary! I did some testing locally and here are my findings so far:
No new alerts appeared in the Alerts table as expected, but the Inventory Rule appeared as
I will restart my ES and do more testing. Starting from a clean state, I will first create some new rules and verify that I get alerts, and only then start changing the component templates and making further changes.
@Kerry350 Is there an easy way to bring some APM data into my local cluster? I would like to create some APM rules and then try to change the APM component template: https://github.com/elastic/kibana/blob/master/x-pack/plugins/apm/server/plugin.ts#L121
@mgiota Thank you for the testing
Hmm, it's interesting this would happen with no changes.
It should stay active; this is the result we want given these "catastrophic" scenarios. We don't want RAC / alerts as data etc. to interfere with the overall alerting framework. Rule execution should carry on as normal, and actions and so on should still be dispatched. The active state on the Rules and Connectors page is determined by the alerting framework side of things. For us to reach this stage, the assumption is that some sort of bad mapping has made it past both PR review and QA and into the release. It should, hopefully, never happen, but if it does we definitely don't want RAC to stop people's potentially critical alerts.
Ah, okay, then that might have resulted in the above state. Although not if you only touched APM resources 🤔 Are you 100% sure you didn't touch the
It shouldn't. I'll try fiddling with the Index Management page (I hadn't changed things there).
This is a good question. As I say I hadn't considered people coming to that page and changing the component templates etc 😬 @jasonrhodes Would be interested in your thoughts on this bit? |
Are you using the cluster scraping script from Chris? If so you can add it to your config. I have the following:
I tested again and here are my findings. In a nutshell I identified that root of the problem was that Scenario 1
Scenario 2
The only difference between 2 scenarios is that in the first scenario But what if a non-additive change happens through Index Management UI? @jasonrhodes I would like to hear your thoughts on this. |
@mgiota Ah, okay, yes. I should have paid more attention to Change in question: 820b1ca We can ignore this as an issue, it'll be fixed by having new archive data 👍 Moving forward we will also think about ways to have solid functional tests for these additive only mappings changes. |
(The good news is it caught the problem which is what we want) |
@Kerry350 Exactly! That's what I was thinking right now. I will proceed with a bit more testing (do a few additive changes and verify everything works as expected). Changes through |
Yep! I will try what you suggested! |
Update: We've just discussed the Index Management UI query in a meeting. The conclusion is that this is a far-reaching problem that affects many different things, and there isn't really anything we can do about it. It isn't ideal, but we at least know it's a vector for problems now.
@Kerry350 I made a few additive changes to the inventory and APM rule types and verified that everything works as expected. Good job :-)
I have one question regarding Index Management UI and hidden indices. The |
Thanks for the thorough testing, much appreciated 🙏
That's a good question. My understanding is "no there isn't", but I'll look into that as I'm not 100%. |
@elasticmachine merge upstream |
@weltenwort Thank you for reviewing 🙏
This should be specific to a namespace now, I am still verifying this. Edit: Looks good 👍
This is changed now.
I am unsure about this one. And this is what's currently broken. I changed this to perform a:
(same as the mappings) But this will result in the error:
The error makes sense, the technical component template provides the setting Therefore, do we want to just limit settings updates to the dynamic settings? If so, is there some easy and reliable way to separate what is static and dynamic? Last question is with regards to metadata, how will we update this for the backing indices? I understand updating this within the actual index template, but not for existing indices. We have the |
Good point! I only see two options right now:
Updates of static index settings need to be treated like non-additive mapping updates, which are forbidden until we have a properly specified update process. So option 2 doesn't sound too bad right now. @jasonrhodes what do you think? Can we defer updating the settings and make it part of the requirements of the update process that supports rollovers/conflicting mapping updates?
You're right, the |
The index |
Seems ok to me, so long as we can figure out how to clearly communicate this to anyone else working on these registry-connected templates. The only other solution I could think of is catching the error and inspecting it, and removing any settings updates that failed due to being static, but that sounds like a vector for a lot of confusing scenarios, potentially. Go with whichever you both think makes sense here but please think about how we can document this kind of thing for future engineers who work on rules? |
This reverts commit 847fbdc.
…pings-and-templates-on-change
Outside of this PR we (Kerry / Jason / Felix) decided to go with option 2 for now. This will be communicated via a code comment, as there isn't anywhere else more obvious to mention it. We may consider an "allow list" down the road, but this should be decided on almost in the same way as a schema.
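For illustration only, the "allow list" idea mentioned above could look roughly like the sketch below. It is not part of this PR. The names `UPDATABLE_SETTINGS` and `pickUpdatableSettings` are hypothetical, settings are assumed to be passed as flat dotted keys, and the two entries are merely examples of settings that Elasticsearch treats as dynamic; which settings would actually be allowed was left undecided.

```typescript
// Hypothetical allow list of settings that may be pushed to existing indices.
// Both entries are dynamic index settings in Elasticsearch; the real list
// (if one is ever introduced) would need to be agreed on like a schema.
const UPDATABLE_SETTINGS: ReadonlySet<string> = new Set([
  'index.number_of_replicas',
  'index.refresh_interval',
]);

// Keeps only allow-listed settings so that static settings (for example
// index.number_of_shards) are never sent in an update request.
function pickUpdatableSettings(settings: Record<string, unknown>): Record<string, unknown> {
  const picked: Record<string, unknown> = {};
  for (const key of Object.keys(settings)) {
    if (UPDATABLE_SETTINGS.has(key)) {
      picked[key] = settings[key];
    }
  }
  return picked;
}
```

With this shape, a template that sets both `index.number_of_shards` and `index.refresh_interval` would only have the latter forwarded to existing indices.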
This should be good to go now, the three core issues that were brought up have been fixed (bar the settings issue, which is deferred). |
Apart from the comment below about deprecating the unsafe.indexUpgrade.enabled setting, I couldn't find a way to break it (within the known limitations). 👏 Nicely done!
indexUpgrade: schema.object({
  enabled: schema.boolean({ defaultValue: false }),
}),
On second thought, maybe we should leave this in the schema and only flag it as "unused" as to not break Kibana startup?
deprecations: ({ deprecate, unused }) => [
deprecate('enabled', '8.0.0'),
unused('unsafe.indexUpgrade.enabled'),
],
@weltenwort Thanks for the reviews 🙏 Will merge once we're greeeeen. |
💚 Build Succeeded
* Change index bootstrapping to cater for non-additive changes only
💚 Backport successful
This backport PR will be merged automatically after passing CI. |
* Change index bootstrapping to cater for non-additive changes only Co-authored-by: Kerry Gallagher
Summary
This PR implements #108941. These changes aim to support additive mappings changes only.
Note:
The changes here do not currently reflect these two ACs:
… _meta object
As I don't think we need them yet, and I also want to speak to Felix about them.
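For readers unfamiliar with the term, the difference between an additive and a non-additive mappings change can be sketched as below. This is a simplified illustration, not code from this PR: `FieldMapping` and `findNonAdditiveChanges` are hypothetical names, and the check only compares the `type` of fields that exist in both versions. Elasticsearch itself rejects changes to other immutable mapping parameters as well, so treat this as an approximation.

```typescript
// Minimal shape of a mappings tree (an assumption made for this sketch).
interface FieldMapping {
  type?: string;
  properties?: Record<string, FieldMapping>;
}

// Returns the dotted paths of fields whose `type` differs between the current
// and the next mappings. New fields are additive and therefore never reported;
// fields missing from `next` are skipped because a mapping update merges into
// the existing mappings rather than replacing them.
function findNonAdditiveChanges(
  current: Record<string, FieldMapping>,
  next: Record<string, FieldMapping>,
  prefix: string = ''
): string[] {
  const conflicts: string[] = [];
  for (const name of Object.keys(current)) {
    const oldField = current[name];
    const newField = next[name];
    if (oldField === undefined || newField === undefined) {
      continue;
    }
    const path = prefix + name;
    // A field without an explicit type is treated as an object field here.
    if ((oldField.type ?? 'object') !== (newField.type ?? 'object')) {
      conflicts.push(path);
    }
    const nested = findNonAdditiveChanges(
      oldField.properties ?? {},
      newField.properties ?? {},
      path + '.'
    );
    for (const conflict of nested) {
      conflicts.push(conflict);
    }
  }
  return conflicts;
}
```

Adding a new `keyword` field yields an empty result, whereas changing an existing `host.name` field from `keyword` to `text` is reported as `host.name`.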
High level overview
(This may be helpful for those unfamiliar with the inner workings of this code)
… installIndexLevelResources() and is invoked when initializeIndex() is called via the RuleDataService. Solutions, for example, will do this on plugin setup. You can think of these as the "building blocks" that will create the namespace level resources.
… default, so there is always a namespace). If there is no writer cached for that namespace, initializeWriter() will attempt to installAndUpdateNamespaceLevelResources().
installAndUpdateNamespaceLevelResources() will take our "building blocks" from before and install / update the namespaced index template, create a concrete write index (if needed), and attempt to update the mappings of any current concrete write indices (via first simulating the index template via simulateIndexTemplate(), and then attempting to put those mappings).
If installAndUpdateNamespaceLevelResources() cannot be completed then writing is completely disabled.
Testing notes
Putting aside the more obvious concerns (e.g. templates are installed correctly), the most important thing is that a developer accidentally performing a non-additive change does not break the overall alerting framework (as in, executor runs aren't broken).
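To make that goal concrete, here is a minimal sketch of the behaviour described in the overview: a writer is cached per namespace, namespace level resources are installed on first use, and a failed installation disables writing instead of throwing into the rule executor. All names here (`NamespacedWriterCache`, `install`, `createWriter`) are hypothetical simplifications and not the actual RuleDataClient code.

```typescript
// Hypothetical, trimmed-down types; the real client is more involved.
type Writer = { bulk: (docs: object[]) => Promise<void> };
type InstallFn = (namespace: string) => Promise<void>;
interface Logger {
  error: (message: string) => void;
}

class NamespacedWriterCache {
  private readonly writers = new Map<string, Writer>();
  private writeEnabled = true;
  private readonly install: InstallFn;
  private readonly createWriter: (namespace: string) => Writer;
  private readonly logger: Logger;

  constructor(install: InstallFn, createWriter: (namespace: string) => Writer, logger: Logger) {
    this.install = install;
    this.createWriter = createWriter;
    this.logger = logger;
  }

  isWriteEnabled(): boolean {
    return this.writeEnabled;
  }

  // Resolves to a writer, or to undefined once writing has been disabled.
  // It never rejects because of an installation failure, so the caller
  // (the rule executor) keeps running either way.
  async getWriter(namespace: string = 'default'): Promise<Writer | undefined> {
    if (!this.writeEnabled) {
      return undefined;
    }
    const cached = this.writers.get(namespace);
    if (cached !== undefined) {
      return cached;
    }
    try {
      // Stands in for installing / updating the namespace level resources.
      await this.install(namespace);
    } catch (error) {
      const reason = error instanceof Error ? error.message : String(error);
      this.writeEnabled = false;
      this.logger.error('Writing disabled for namespace "' + namespace + '": ' + reason);
      return undefined;
    }
    const writer = this.createWriter(namespace);
    this.writers.set(namespace, writer);
    return writer;
  }
}
```

A caller would check for `undefined` and simply skip persisting alert documents, while scheduling actions as usual.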
Dev tools is likely to be your best friend here. Below are some useful snippets that you might want to check whilst testing (change logs to any other registration context):
GET _cat/indices/.alerts-observability*
GET /_index_template/.alerts-observability.logs.alerts-default*
GET /_cat/aliases
POST .alerts-observability.logs.alerts-default/_rollover (to check the ILM policy works as expected)
GET .alerts-observability*/_mapping
GET /_component_template/.alerts-observability.logs.alerts-mappings (or any value from composed_of when checking the index template)
Things to check:
Are the resources installed correctly with a blank slate? Does writing alerts documents work?
If you make an additive change, for example to a component template, do things work as expected? (The index template should update, as should the mappings for any existing concrete write indices). You can introduce these types of additive changes here as an example: https://github.com/elastic/kibana/blob/master/x-pack/plugins/infra/server/services/rules/rule_data_client.ts#L31 (feel free to try other registration contexts / solutions)
If you make a non-additive change, for example changing the type of a field, do things work as expected? (There should be a very clearly logged error, writing should be disabled, and executor runs should be unaffected). Log output should look roughly like this in that scenario: