Kibana migrations potentially fail if there's a running ES snapshot #47808

Closed
rudolf opened this issue Oct 10, 2019 · 5 comments · Fixed by #58884
Labels
Feature:Saved Objects Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc

Comments

@rudolf

rudolf commented Oct 10, 2019

From https://github.com/elastic/cloud/issues/41742

When Kibana runs its saved object migrations, it may migrate a concrete index into an alias, which involves the following steps:

  1. Reindex .kibana into .kibana_n
  2. Delete the .kibana index
  3. Create a .kibana alias pointing to .kibana_n

If a snapshot is running at that point, step (2) throws the exception "Cannot delete indices that are being snapshotted" and the migration fails, requiring manual intervention.

This is more likely to happen during a 7.3 -> 7.4 upgrade, since that upgrade migrates .kibana_task_manager from a concrete index into an alias pointing to .kibana_task_manager_n.
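
To make the failure mode concrete, here is a minimal sketch of the three steps against an in-memory stand-in for a cluster. FakeCluster, migrateIndexToAlias and all of their methods are hypothetical names invented for illustration; none of this is the real Elasticsearch client API or the actual Kibana migration code.

```typescript
// Hypothetical in-memory model of a cluster, for illustration only.
class FakeCluster {
  indices: string[] = ['.kibana'];
  aliases: { [alias: string]: string } = {};
  snapshotting: string[] = []; // indices currently part of a running snapshot

  reindex(source: string, target: string): void {
    if (this.indices.indexOf(source) === -1) {
      throw new Error(`no such index: ${source}`);
    }
    this.indices.push(target);
  }

  deleteIndex(index: string): void {
    // Models the rejection described in this issue.
    if (this.snapshotting.indexOf(index) !== -1) {
      throw new Error('Cannot delete indices that are being snapshotted');
    }
    this.indices = this.indices.filter((name) => name !== index);
  }

  createAlias(alias: string, target: string): void {
    if (this.indices.indexOf(alias) !== -1) {
      throw new Error(`an index named ${alias} already exists`);
    }
    this.aliases[alias] = target;
  }
}

// The three steps from the list above, with no error handling.
function migrateIndexToAlias(cluster: FakeCluster, index: string, target: string): void {
  cluster.reindex(index, target); // step 1
  cluster.deleteIndex(index); // step 2: throws while a snapshot is running
  cluster.createAlias(index, target); // step 3: never reached if step 2 throws
}
```

In this simplified model, a failure at step 2 leaves both .kibana and .kibana_n in place with no alias created, which is the kind of half-finished state that would need manual cleanup.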

@rudolf rudolf added Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc Team:Operations Team label for Operations Team Feature:Saved Objects labels Oct 10, 2019
@elasticmachine

Pinging @elastic/kibana-operations (Team:Operations)

@elasticmachine

Pinging @elastic/kibana-platform (Team:Platform)

@spalger

spalger commented Oct 10, 2019

We had to implement retry logic for this in the esArchiver, since snapshots run pretty frequently on cloud. We should probably do something similar in the migrations:

if (retryIfSnapshottingCount > 0 && isDeleteWhileSnapshotInProgressError(error)) {
  stats.waitingForInProgressSnapshot(index);
  await waitForSnapshotCompletion(client, index, log);
  return await deleteIndex({
    ...options,
    retryIfSnapshottingCount: retryIfSnapshottingCount - 1
  });
}
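
For anyone reading this without the esArchiver source at hand, a self-contained sketch of the same retry pattern could look roughly like the following. It is an approximation under stated assumptions, not the esArchiver implementation: the IndexClient interface and its two methods are hypothetical, the error check assumes the message contains the text quoted in this issue, and the default retry count and polling interval are arbitrary.

```typescript
// Hypothetical minimal client interface; NOT the real Elasticsearch client API.
interface IndexClient {
  deleteIndex(index: string): Promise<void>;
  isSnapshotInProgress(index: string): Promise<boolean>;
}

// Assumption: the rejection can be recognised by the message quoted in this issue.
function isDeleteWhileSnapshotInProgressError(error: unknown): boolean {
  return (
    error instanceof Error &&
    error.message.indexOf('Cannot delete indices that are being snapshotted') !== -1
  );
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll until the (hypothetical) client reports that no snapshot includes the index.
async function waitForSnapshotCompletion(
  client: IndexClient,
  index: string,
  pollMs: number
): Promise<void> {
  while (await client.isSnapshotInProgress(index)) {
    await sleep(pollMs);
  }
}

// Try to delete; on a snapshot conflict, wait for the snapshot and retry a bounded number of times.
async function deleteIndexWithRetry(
  client: IndexClient,
  index: string,
  retries = 3, // arbitrary default
  pollMs = 1000 // arbitrary default
): Promise<void> {
  try {
    await client.deleteIndex(index);
  } catch (error) {
    if (retries > 0 && isDeleteWhileSnapshotInProgressError(error)) {
      await waitForSnapshotCompletion(client, index, pollMs);
      return deleteIndexWithRetry(client, index, retries - 1, pollMs);
    }
    throw error; // unrelated error, or retries exhausted
  }
}
```

Bounding the retries means a snapshot that never finishes still surfaces as an error rather than hanging forever, and any error other than the snapshot conflict is rethrown immediately.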

@rudolf

rudolf commented Oct 10, 2019

Thanks, this is exactly what we need.

@tylersmalley

Retry logic was also added to the check for whether the migrations are up to date:

https://github.com/elastic/kibana/blob/master/src/core/server/saved_objects/migrations/core/elastic_index.ts#L152-L201
