Updating resources on a stack upgrade should be easier #103841
Comments
Pinging @elastic/kibana-core (Team:Core)
FWIW, this will always be the case. Even if we were to expose a specific API to register upgrade hooks or functions, these functions would have to be implemented in a way that takes into account that multiple Kibana nodes can be performing the operation concurrently. There is currently no synchronization between Kibana instances, and no real way to acquire a 'lock' from ES. The SO migration algorithm faces the same problem.
@pgayvallet why will this always be the case? We don't support rolling upgrades, no? So all instances go down, the first one to come back up takes care of the upgrade process, and only when it has completed successfully do the other instances come back up (without executing the upgrade process). What am I missing here?
This assumption is wrong, unfortunately (that would be way too easy). All Kibana instances are allowed to boot at the same time during a migration (this is a supported scenario), and we don't have any synchronization mechanism between instances, so each instance does have to take into consideration that other instances can be performing an upgrade at the same time. I tried to find the document where the whole 'idempotent versus lock' discussion occurred a while ago for SO Migv2, to add more context on all the challenges of introducing a lock mechanism, but I couldn't find it. @joshdover @kobelb maybe you have a better memory than I do?
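To make that constraint concrete, here is a minimal sketch of what an idempotent upgrade step looks like, assuming the v7 `@elastic/elasticsearch` client; the index name is illustrative and this is not actual Kibana code. N instances can race through it safely because "already exists" is treated as success:

```ts
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' });

// Idempotent "create index" step: if another Kibana instance already created
// the index, we treat that as success, since the resulting cluster state is
// exactly the one we wanted.
async function ensureIndexExists(index: string): Promise<void> {
  try {
    await client.indices.create({ index });
  } catch (err: any) {
    if (err?.meta?.body?.error?.type === 'resource_already_exists_exception') {
      return; // another instance won the race; nothing left to do
    }
    throw err;
  }
}
```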
Besides the necessity of introducing a consensus protocol, there is the problem of blocking Kibana startup - the very reason we deprecated async lifecycles. A plugin-specific async operation shouldn't block or prevent (in case of an exception) other Kibana plugins from starting.
@kobelb Can it benefit from the solution you are designing for the automatic upgrade of the Fleet packages? |
I think this is the most complete write-up we have: https://github.com/elastic/kibana/blob/master/rfcs/text/0013_saved_object_migrations.md#52-single-node-migrations-coordinated-through-a-leaselock Essentially, it's impossible to build a bullet-proof lease/lock on top of Elasticsearch as it is. So in order to use a lock, we'd need to either add Kibana node clustering & master election or work with the Elasticsearch team to provide a first-class lock mechanism.
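To illustrate why, a naive document-based lock looks deceptively simple to acquire; the sketch below (the `.kibana_upgrade_lock` index is made up) shows where it breaks, which is essentially the argument in the linked RFC section:

```ts
import { Client } from '@elastic/elasticsearch';

// Acquisition is the easy part: only one instance can create a document with
// a fixed id. Release is the unsolved part. If the holder crashes, the lock
// is held forever; if we add an expiry to work around that, a paused or
// GC-stalled holder can wake up after expiry and run concurrently with the
// new holder.
async function tryAcquireLock(client: Client, holder: string): Promise<boolean> {
  try {
    await client.create({
      index: '.kibana_upgrade_lock',
      id: 'upgrade',
      body: { holder, acquiredAt: new Date().toISOString() },
    });
    return true;
  } catch (err: any) {
    if (err?.meta?.body?.error?.type === 'version_conflict_engine_exception') {
      return false; // someone else holds the lock
    }
    throw err;
  }
}
```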
Given the above, I'm curious which of these operations would be problematic to build in an idempotent way that could be run on all Kibana nodes during startup.
Also, I'm not sure what's in these specific indices. Is this append-only immutable data, or are these stateful mutable documents? If it's the former, reindexing like this should be pretty safe and straightforward; otherwise, some more thought will need to be put into reindexing this data.
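If it is indeed append-only, a reindex-then-swap along these lines should be safe to retry (a sketch only: index and alias names are illustrative, and the create call would need the same already-exists handling as any step run concurrently):

```ts
import { Client } from '@elastic/elasticsearch';

// Reindex append-only data into a new versioned index, then move the alias
// in a single atomic update so readers and writers never see a gap.
async function migrateIndex(client: Client, alias: string, oldIndex: string, newIndex: string) {
  await client.indices.create({ index: newIndex }); // mappings come from the updated template
  await client.reindex({
    wait_for_completion: true,
    body: { source: { index: oldIndex }, dest: { index: newIndex } },
  });
  await client.indices.updateAliases({
    body: {
      actions: [
        { remove: { index: oldIndex, alias } },
        { add: { index: newIndex, alias, is_write_index: true } },
      ],
    },
  });
}
```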
Building on what @joshdover articulated: ideally, we'd be able to run these migration scripts "exactly once". However, this is a hard problem to solve when working with a distributed system, and we have a distributed system here because Kibana controls the API calls that need to be made against Elasticsearch. One of the common tricks for getting "exactly once" semantics is to couple "at least once" with idempotent operations, which is conceptually what @joshdover is recommending above.

In this situation, we want Kibana to perform the migrations "at least once", but we need idempotent operations in Elasticsearch to ensure that even though these API calls might be made multiple times, they leave Elasticsearch in the same state as if they were only made once.

Kibana can lazily achieve "at least once" by executing the code on literally every start-up, and this is possible right now. However, we could add some optimizations to Kibana to make this more efficient and, once we have a successful completion, no longer execute this code. That is really just a performance optimization though, as we'll still need to anticipate multiple Kibana instances running the migration code in parallel and multiple times consecutively.
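As a sketch of that pattern (names are illustrative, and `ensureIndexExists` is the helper from the earlier sketch): running something like this on every startup of every instance always converges to the same cluster state, because each call is either a plain idempotent PUT or tolerates "already exists":

```ts
import { Client } from '@elastic/elasticsearch';

// "At least once" + idempotent: safe to run on every Kibana startup, on
// every instance, any number of times.
async function setupAssets(client: Client): Promise<void> {
  // PUT _index_template is idempotent; repeating it is harmless.
  await client.indices.putIndexTemplate({
    name: 'my-plugin-template',
    body: {
      index_patterns: ['my-plugin-*'],
      template: {
        mappings: { properties: { '@timestamp': { type: 'date' } } },
      },
    },
  });
  await ensureIndexExists('my-plugin-000001');
}
```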
Even though the problem is still present, the constraints around its resolution have drastically changed (we do need to support rolling upgrades with serverless now), and AFAIK such needs must be addressed on a case-by-case basis (and we do have a few issues open for specific needs). I'll go ahead and close this; feel free to reopen with the updated requirements if necessary.
As part of the RAC project, we are installing various component and index templates, and creating indices/aliases that use these templates. When we roll out a new version of the stack, some of these templates might have changed, and we need to update the mappings of write indices, and rollover/migrate data when needed.
Currently, our only option is to use the setup or start lifecycle. However, these are executed on every Kibana instance, so any upgrade strategy needs to take into account that several Kibana instances might want to upgrade assets at the same time. We could use a task, but we also need to know when an asset upgrade has finished, because we need to block write operations until the upgrade has completed (this might be possible with a task; we're not sure).
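One way to implement that blocking, purely as a sketch assuming a v7-style `@elastic/elasticsearch` client (the `.my-plugin-status` index and document shape are made up, and the upgrade step would write this document last):

```ts
import { Client } from '@elastic/elasticsearch';

// Reject writes until the assets for the expected stack version are in place.
async function assertAssetsUpgraded(client: Client, expectedVersion: string): Promise<void> {
  let version: string | undefined;
  try {
    const res = await client.get({ index: '.my-plugin-status', id: 'assets' });
    version = (res.body._source as { version?: string })?.version;
  } catch (err: any) {
    if (err?.meta?.statusCode !== 404) throw err; // a missing doc just means "not upgraded yet"
  }
  if (version !== expectedVersion) {
    throw new Error('Assets are still being upgraded; write operations are blocked');
  }
}
```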
I'd like us to investigate whether we can make this easier, e.g. by providing a hook that is guaranteed to get executed on one Kibana instance, and an `afterUpgrade` hook that is called on each Kibana instance, or something that allows us to hook into the upgrade process that happens before Kibana starts, in the same vein as the SO migration process.
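For illustration, one possible shape for such an API; this is purely hypothetical, nothing like `registerUpgradeHooks` exists in Kibana core, and the context type is simplified:

```ts
import type { Client } from '@elastic/elasticsearch';

interface UpgradeContext {
  esClient: Client;
  fromVersion: string;
  toVersion: string;
}

interface UpgradeHooks {
  /** Guaranteed to run to completion on exactly one Kibana instance, before `start`. */
  onUpgrade(ctx: UpgradeContext): Promise<void>;
  /** Runs on every instance once `onUpgrade` has completed somewhere. */
  afterUpgrade(ctx: UpgradeContext): Promise<void>;
}

// Hypothetical registration from a plugin's setup phase:
// core.upgrades.registerUpgradeHooks('my-plugin', hooks);
```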