
Upgrade to 2.7.0-dev is broken #2908

Closed
TeddyAndrieux opened this issue Nov 2, 2020 · 1 comment
Assignees
TeddyAndrieux

Labels
complexity:medium - Something that requires one or few days to fix
kind:bug - Something isn't working
priority:high - High priority issues, should be worked on ASAP (after urgent issues), not postponed
topic:lifecycle - Issues related to upgrade or downgrade of MetalK8s

Comments

@TeddyAndrieux (Collaborator)

Component:

'salt', 'kubernetes', 'lifecycle'

What happened:

When upgrading from 2.6.x to 2.7.0-dev, the upgrade stops abruptly while upgrading the kubelet package, because the kubelet upgrade from 1.17.13 to 1.18.10 makes the static pods restart (including the salt-master pod that runs the upgrade orchestrate).

# /srv/scality/metalk8s-2.7.0-dev/upgrade.sh
> Performing Pre-Upgrade checks... done [3s]
> Backing up MetalK8s configurations... done [0s]
> Backing up CAs certificates and keys... done [0s]
> Backing up etcd data... done [0s]
> Creating backup archive '/var/lib/metalk8s/backup_20201102_095035.tar.gz'... done [0s]
> Upgrading bootstrap... done [130s]
> Setting cluster version to 2.7.0-dev... done [6s]
> Launching the pre-upgrade... done [19s]
> Launching the upgrade... fail [265s]

Failure while running step 'Launching the upgrade'

Command: launch_upgrade

Output:

<< BEGIN >>
[...]
[INFO    ] Fetching file from saltenv 'metalk8s-2.7.0-dev', ** done ** 'metalk8s/orchestrate/register_etcd.sls'
time="2020-11-02T09:57:36Z" level=fatal msg="execing command in container failed: command terminated with exit code 137"
<< END >>

This script will now exit
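
For reference, exit code 137 is 128 + 9 (SIGKILL): the container in which the upgrade command was being exec'd was killed, which is consistent with kubelet restarting its static pods (salt-master included) while being upgraded. A quick way to confirm this on the bootstrap node is sketched below (paths and pod names are assumptions, adjust to your deployment):

# 137 = 128 + SIGKILL(9): the exec'd container was killed, the command itself did not fail
# List salt-master containers on the bootstrap node, including exited ones:
crictl ps -a --name salt-master
# Check the restart count of the salt-master static pod:
kubectl --kubeconfig /etc/kubernetes/admin.conf -n kube-system get pods -o wide | grep salt-master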

What was expected:

Working upgrade process

Steps to reproduce

Upgrade from 2.6.x to 2.7.0

Resolution proposal (optional):

TBD

The assumption that a kubelet upgrade (or even a simple restart) will not restart the static pods may be wrong (TBC). In that case we will need to change the upgrade process a bit:

  • either rewrite everything using a different approach (beacons + reactors, an operator, ... whatever),
  • or move some logic from the Salt upgrade orchestrate to the upgrade script, so that kubelet (+ salt-master) can be restarted during the upgrade process (a rough sketch of this option is given after this list).
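
A minimal sketch of the second option, assuming upgrade.sh keeps driving the whole process and only triggers bounded Salt invocations itself (the function, state and orchestrate names below are illustrative assumptions, not the actual MetalK8s implementation):

#!/bin/bash
# Illustrative sketch only: orchestrate/state names are hypothetical.
set -e
SALTENV="metalk8s-2.7.0-dev"

salt_orchestrate() {
    # Run a single orchestrate from inside the salt-master container; if
    # salt-master gets restarted, only this bounded invocation fails and the
    # script can retry it, instead of losing one big end-to-end orchestrate.
    crictl exec -i "$(crictl ps -q --name salt-master)" \
        salt-run state.orchestrate "$1" saltenv="$SALTENV"
}

# Restart-sensitive steps are driven from the script, one at a time:
salt_orchestrate metalk8s.orchestrate.etcd        # hypothetical: upgrade the etcd cluster
salt_orchestrate metalk8s.orchestrate.apiserver   # hypothetical: upgrade all kube-apiservers

# Upgrade the bootstrap node engines (kubelet + containerd) locally, outside of
# any orchestrate, so that their restart cannot kill a running orchestrate:
salt-call --local state.sls metalk8s.kubernetes.kubelet saltenv="$SALTENV"   # hypothetical state

# Wait for the salt-master static pod to come back after the kubelet restart:
until crictl ps -q --name salt-master | grep -q .; do sleep 5; done

# Only then hand over to the per-node upgrade orchestrate for the remaining nodes:
salt_orchestrate metalk8s.orchestrate.upgrade

With this split, the salt-master restart caused by the local kubelet upgrade happens between two script steps rather than in the middle of an orchestrate, so the script can simply wait for salt-master to come back and continue.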
TeddyAndrieux added the kind:bug, topic:lifecycle, complexity:medium and priority:high labels on Nov 2, 2020
TeddyAndrieux self-assigned this on Nov 6, 2020
TeddyAndrieux added a commit that referenced this issue Nov 12, 2020
Since we use a script and we rely on salt-master running in a static pod on the bootstrap node, we need to move some logic out of the Salt orchestrate into the script, so that a salt-master restart can be handled properly (and does not brutally interrupt a Salt orchestrate execution).
- The etcd cluster upgrade is now part of the upgrade script
- The upgrade of all APIServers is now part of the upgrade script
- Upgrade the bootstrap engines (kubelet + containerd) locally
- Then call the orchestrate to upgrade all nodes one by one

Fixes: #2908
TeddyAndrieux added a commit that referenced this issue Nov 12, 2020
TeddyAndrieux added a commit that referenced this issue Nov 12, 2020
TeddyAndrieux added a commit that referenced this issue Nov 12, 2020
TeddyAndrieux added a commit that referenced this issue Nov 13, 2020
TeddyAndrieux added a commit that referenced this issue Nov 13, 2020
Same message as above, except that this commit also adds a warning in the upgrade orchestrate, to make clear that this orchestrate is only one part of the upgrade process.

Fixes: #2908
TeddyAndrieux added a commit that referenced this issue Nov 17, 2020
bert-e added a commit that referenced this issue Nov 17, 2020
@TeddyAndrieux (Collaborator, Author)

Fixed by #2928
