Kibana migrations potentially fail if there's a running ES snapshot #47808

rudolf · 2019-10-10T10:44:27Z

From https://github.com/elastic/cloud/issues/41742

When Kibana runs its saved object migrations, it may migrate a concrete index into an alias, which involves the following steps:

1. Reindex .kibana into .kibana_n
2. Delete the .kibana index
3. Create a .kibana alias pointing to .kibana_n

If a snapshot is currently running, step (2) will throw the exception "Cannot delete indices that are being snapshotted" and cause the migration to fail, requiring manual intervention.

This is more likely to happen during a 7.3 -> 7.4 upgrade, since we then migrate .kibana_task_manager into an alias and a .kibana_task_manager_n index.
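To make the failure mode concrete, here is a minimal sketch of the three steps run against an in-memory stand-in for the cluster. Everything in it is an assumption made for illustration: `createFakeCluster`, `convertIndexToAlias`, and the client method names are hypothetical and are not taken from the Kibana migration code or the Elasticsearch client.

```javascript
// Hypothetical in-memory stand-in for a cluster. The method names are
// illustrative only and do not mirror the real Elasticsearch client.
function createFakeCluster({ snapshotting = [] } = {}) {
  const indices = new Set(['.kibana']);
  const aliases = new Map();
  return {
    indices,
    aliases,
    async reindex(source, dest) {
      if (!indices.has(source)) throw new Error(`no such index: ${source}`);
      indices.add(dest);
    },
    async deleteIndex(name) {
      // Mimics the error message quoted in this issue.
      if (snapshotting.includes(name)) {
        throw new Error('Cannot delete indices that are being snapshotted');
      }
      indices.delete(name);
    },
    async putAlias(alias, target) {
      if (indices.has(alias)) throw new Error(`an index named ${alias} already exists`);
      aliases.set(alias, target);
    },
  };
}

// The three steps from the description, in order.
async function convertIndexToAlias(cluster, index, target) {
  await cluster.reindex(index, target);   // step 1
  await cluster.deleteIndex(index);       // step 2: rejects while a snapshot is running
  await cluster.putAlias(index, target);  // step 3: never reached if step 2 fails
}
```

In this sketch, calling `convertIndexToAlias(createFakeCluster({ snapshotting: ['.kibana'] }), '.kibana', '.kibana_1')` rejects at step 2 and leaves both `.kibana` and `.kibana_1` in place with no alias created, which is the half-migrated state that needs manual cleanup.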

elasticmachine · 2019-10-10T10:44:28Z

Pinging @elastic/kibana-operations (Team:Operations)

elasticmachine · 2019-10-10T10:44:29Z

Pinging @elastic/kibana-platform (Team:Platform)

spalger · 2019-10-10T16:40:43Z

We had to implement retry logic for this in the esArchiver, since snapshots run pretty frequently on Cloud. We should probably do something similar in the migrations.

kibana/src/es_archiver/lib/indices/delete_index.js

Lines 49 to 56 in 3613174

    
    if (retryIfSnapshottingCount > 0 && isDeleteWhileSnapshotInProgressError(error)) {
      stats.waitingForInProgressSnapshot(index);
      await waitForSnapshotCompletion(client, index, log);
      return await deleteIndex({
        ...options,
        retryIfSnapshottingCount: retryIfSnapshottingCount - 1
      });
    }
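For the migrations, the same idea could be sketched roughly as below. This is not the esArchiver implementation: `deleteWithSnapshotRetry` and `isSnapshotInProgressError` are hypothetical names, the error is matched on its message text as an assumption, and a fixed delay stands in for the `waitForSnapshotCompletion` call in the excerpt above.

```javascript
// Assumption: the error is recognized by its message text. The real
// isDeleteWhileSnapshotInProgressError helper may inspect something else.
function isSnapshotInProgressError(error) {
  const message = String(error && error.message);
  return message.includes('Cannot delete indices that are being snapshotted');
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Calls deleteFn(). On a snapshot-in-progress error it waits and tries again,
// for up to `retries` additional attempts. Any other error is rethrown at once.
async function deleteWithSnapshotRetry(deleteFn, { retries = 3, delayMs = 1000 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await deleteFn();
    } catch (error) {
      if (attempt >= retries || !isSnapshotInProgressError(error)) throw error;
      await sleep(delayMs); // simplification: the excerpt waits for the snapshot to finish
    }
  }
}
```

A caller would wrap only the delete step, for example `await deleteWithSnapshotRetry(() => client.deleteIndex('.kibana'))`, where `client.deleteIndex` is again a placeholder. Bounding the number of retries keeps a stuck snapshot from blocking startup indefinitely.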

rudolf · 2019-10-10T19:27:56Z

Thanks, this is exactly what we need.

tylersmalley · 2019-10-10T19:55:10Z

Retry logic was also added to the check for whether the migrations are up to date:

https://github.com/elastic/kibana/blob/master/src/core/server/saved_objects/migrations/core/elastic_index.ts#L152-L201

rudolf added the Team:Core, Team:Operations, and Feature:Saved Objects labels Oct 10, 2019

This was referenced Dec 4, 2019

Improve Saved Object Migrations to minimize operational impact of Kibana upgrades #52202

Closed

Moving saved object management to an ES plugin #49764

Closed

tylersmalley removed the Team:Operations label Feb 27, 2020

rudolf self-assigned this Feb 28, 2020

rudolf mentioned this issue Feb 28, 2020

Retry migration operations which fail due to snapshot in progress #58884

Merged

7 tasks

rudolf closed this as completed in #58884 Mar 2, 2020
