
Upgrade to 2.7.0-dev is broken #2908

Closed
TeddyAndrieux opened this issue Nov 2, 2020 · 1 comment
Assignees
TeddyAndrieux

Labels
complexity:medium - Something that requires one or few days to fix
kind:bug - Something isn't working
priority:high - High priority issues, should be worked on ASAP (after urgent issues), not postponed
topic:lifecycle - Issues related to upgrade or downgrade of MetalK8s

Comments

@TeddyAndrieux (Collaborator)

Component:

'salt', 'kubernetes', 'lifecycle'

What happened:

When upgrading from 2.6.x to 2.7.0-dev, the upgrade stops abruptly while upgrading the kubelet package, because the kubelet upgrade from 1.17.13 to 1.18.10 makes the static pods restart (including the salt-master pod that runs the upgrade orchestrate).

# /srv/scality/metalk8s-2.7.0-dev/upgrade.sh
> Performing Pre-Upgrade checks... done [3s]
> Backing up MetalK8s configurations... done [0s]
> Backing up CAs certificates and keys... done [0s]
> Backing up etcd data... done [0s]
> Creating backup archive '/var/lib/metalk8s/backup_20201102_095035.tar.gz'... done [0s]
> Upgrading bootstrap... done [130s]
> Setting cluster version to 2.7.0-dev... done [6s]
> Launching the pre-upgrade... done [19s]
> Launching the upgrade... fail [265s]

Failure while running step 'Launching the upgrade'

Command: launch_upgrade

Output:

<< BEGIN >>
[...]
[INFO    ] Fetching file from saltenv 'metalk8s-2.7.0-dev', ** done ** 'metalk8s/orchestrate/register_etcd.sls'
time="2020-11-02T09:57:36Z" level=fatal msg="execing command in container failed: command terminated with exit code 137"
<< END >>

This script will now exit
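
For reference, exit code 137 is 128 + 9 (SIGKILL): the container in which the upgrade command was being exec'd was killed, which is consistent with kubelet restarting its static pods (salt-master included) while being upgraded. A quick way to confirm this on the bootstrap node is sketched below (paths and pod names are assumptions, adjust to your deployment):

# 137 = 128 + SIGKILL(9): the exec'd container was killed, the command itself did not fail
# List salt-master containers on the bootstrap node, including exited ones:
crictl ps -a --name salt-master
# Check the restart count of the salt-master static pod:
kubectl --kubeconfig /etc/kubernetes/admin.conf -n kube-system get pods -o wide | grep salt-master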

What was expected:

Working upgrade process

Steps to reproduce

Upgrade from 2.6.x to 2.7.0

Resolution proposal (optional):

TBD

The assumption that a kubelet upgrade (or even a simple restart) will not restart the static pods may be wrong (TBC). In that case we will need to change the upgrade process a bit:

  • either rewrite everything using a different approach (beacons + reactors, an operator, ... whatever),
  • or move some logic from the Salt upgrade orchestrate to the upgrade script, so that kubelet (+ salt-master) can be restarted during the upgrade process (a rough sketch of this option is given after this list).
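
A minimal sketch of the second option, assuming upgrade.sh keeps driving the whole process and only triggers bounded Salt invocations itself (the function, state and orchestrate names below are illustrative assumptions, not the actual MetalK8s implementation):

#!/bin/bash
# Illustrative sketch only: orchestrate/state names are hypothetical.
set -e
SALTENV="metalk8s-2.7.0-dev"

salt_orchestrate() {
    # Run a single orchestrate from inside the salt-master container; if
    # salt-master gets restarted, only this bounded invocation fails and the
    # script can retry it, instead of losing one big end-to-end orchestrate.
    crictl exec -i "$(crictl ps -q --name salt-master)" \
        salt-run state.orchestrate "$1" saltenv="$SALTENV"
}

# Restart-sensitive steps are driven from the script, one at a time:
salt_orchestrate metalk8s.orchestrate.etcd        # hypothetical: upgrade the etcd cluster
salt_orchestrate metalk8s.orchestrate.apiserver   # hypothetical: upgrade all kube-apiservers

# Upgrade the bootstrap node engines (kubelet + containerd) locally, outside of
# any orchestrate, so that their restart cannot kill a running orchestrate:
salt-call --local state.sls metalk8s.kubernetes.kubelet saltenv="$SALTENV"   # hypothetical state

# Wait for the salt-master static pod to come back after the kubelet restart:
until crictl ps -q --name salt-master | grep -q .; do sleep 5; done

# Only then hand over to the per-node upgrade orchestrate for the remaining nodes:
salt_orchestrate metalk8s.orchestrate.upgrade

With this split, the salt-master restart caused by the local kubelet upgrade happens between two script steps rather than in the middle of an orchestrate, so the script can simply wait for salt-master to come back and continue.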
TeddyAndrieux added the kind:bug, topic:lifecycle, complexity:medium and priority:high labels on Nov 2, 2020
TeddyAndrieux self-assigned this on Nov 6, 2020
TeddyAndrieux added a commit that referenced this issue Nov 12, 2020
Since we use a script and we rely on salt-master running in a static pod on the bootstrap node, we need to move some logic out of the Salt orchestrate into the script, so that a salt-master restart can be handled properly (and does not brutally interrupt a Salt orchestrate execution).
- The etcd cluster upgrade is now part of the upgrade script
- The upgrade of all APIServers is now part of the upgrade script
- Upgrade the bootstrap engines (kubelet + containerd) locally
- Then call the orchestrate to upgrade all nodes one by one

Fixes: #2908
TeddyAndrieux added a commit that referenced this issue Nov 12, 2020
TeddyAndrieux added a commit that referenced this issue Nov 12, 2020
TeddyAndrieux added a commit that referenced this issue Nov 12, 2020
TeddyAndrieux added a commit that referenced this issue Nov 13, 2020
TeddyAndrieux added a commit that referenced this issue Nov 13, 2020
Same message as above, except that this commit also adds a warning in the upgrade orchestrate, to make clear that this orchestrate is only one part of the upgrade process.

Fixes: #2908
TeddyAndrieux added a commit that referenced this issue Nov 17, 2020
bert-e added a commit that referenced this issue Nov 17, 2020
@TeddyAndrieux (Collaborator, Author)

Fixed by #2928
