
Move logic from orchestrate to upgrade script #2928

Conversation

TeddyAndrieux
Collaborator

Component:

'lifecycle'

Context:

#2908

Summary:

Since we use a script and rely on salt-master running in a static pod
on the bootstrap node, we need to move some logic out of the Salt
orchestrate and into the script, so that the salt-master restart can be
handled properly (and does not brutally interrupt a running Salt
orchestrate execution).

  • Etcd cluster upgrade is now part of the upgrade script
  • The upgrade of all APIServers is now part of the upgrade script
  • Upgrade bootstrap engines (kubelet + containerd) locally
  • Then call the orchestrate to upgrade all nodes one by one

Fixes: #2908
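
A minimal sketch of the resulting upgrade script flow, for illustration only
(the orchestrate targets below are assumptions based on this description, not
necessarily the real names used in scripts/upgrade.sh.in; only the
metalk8s.kubernetes.kubelet.standalone state appears in the diff quoted later
in this thread):

# Hypothetical outline of the new flow; SALT_CALL, SALT_MASTER_CALL and
# SALTENV are assumed to be defined elsewhere in the script.
run_upgrade() {
    # 1. Upgrade the etcd cluster from the script (no longer in the main orchestrate)
    "${SALT_MASTER_CALL[@]}" salt-run state.orchestrate \
        metalk8s.orchestrate.etcd saltenv="$SALTENV"

    # 2. Upgrade all APIServers
    "${SALT_MASTER_CALL[@]}" salt-run state.orchestrate \
        metalk8s.orchestrate.apiserver saltenv="$SALTENV"

    # 3. Upgrade the bootstrap node engines (kubelet + containerd) locally,
    #    outside of any orchestration, so that restarting the salt-master
    #    static pod cannot interrupt a running orchestrate
    "${SALT_CALL}" --local --retcode-passthrough state.sls \
        metalk8s.kubernetes.kubelet.standalone saltenv="$SALTENV"

    # 4. Finally, run the orchestrate to upgrade all remaining nodes one by one
    "${SALT_MASTER_CALL[@]}" salt-run state.orchestrate \
        metalk8s.orchestrate.upgrade saltenv="$SALTENV"
}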

@TeddyAndrieux TeddyAndrieux requested a review from a team November 13, 2020 14:18
@bert-e
Contributor

bert-e commented Nov 13, 2020

Hello teddyandrieux,

My role is to assist you with the merge of this
pull request. Please type @bert-e help to get information
on this process, or consult the user documentation.

Status report is not available.

@bert-e
Contributor

bert-e commented Nov 13, 2020

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • one peer

Peer approvals must include at least 1 approval from the following list:

salt/metalk8s/orchestrate/upgrade/init.sls (resolved)
"${SALT_CALL}" --local --retcode-passthrough state.sls sync_mods="all" \
metalk8s.kubernetes.kubelet.standalone saltenv="$SALTENV" \
pillar="{'metalk8s': {'endpoints': {'salt-master': $saltmaster_endpoint, \
'repositories': $repo_endpoint}}}" && sleep 20
Contributor

Instead of a plain sleep, look for a way to probe/sleep/repeat with an upper bound on the max time for the operation to succeed (which may well be beyond 20s), to ensure the state we'd like to be in is actually reached?
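
One possible shape for such a bounded probe loop (a sketch only: the
`salt_master_is_ready` probe is a hypothetical placeholder, and the bounds
are arbitrary):

# Probe/sleep/repeat with an upper bound, instead of a fixed sleep.
wait_for() {
    local -r probe="$1" max_wait="${2:-120}" interval="${3:-5}"
    local waited=0
    until "$probe"; do
        if (( waited >= max_wait )); then
            echo "Timed out after ${max_wait}s waiting for: $probe" >&2
            return 1
        fi
        sleep "$interval"
        (( waited += interval ))
    done
}

# Example usage, with a placeholder probe function:
# wait_for salt_master_is_ready 120 5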

Collaborator Author

It's hard to know what to probe, because we want kubelet to be upgraded and the salt-master pod to be running, but at the end of the state that is already the case; kubelet just takes a bit of time before deleting the "old" static pods and creating the new ones, so...
I do not see any good "probe" for this.

Contributor

Relates to the static_pod_managed modifications we were discussing with @alexandre-allard-scality a few weeks ago? Making sure the observed hash from the K8s API (kubernetes.io/config.hash) is the same as what we can compute from the manifest...

Collaborator Author

@TeddyAndrieux TeddyAndrieux Nov 17, 2020

Here we are talking about a restart that does not change the manifest file at all; it's only a kubelet + containerd upgrade, so...

Contributor

Right, good point. Can we look at some timestamp, check e.g. that kubelet restarted before the salt-master container?
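
A rough sketch of that kind of timestamp comparison, purely illustrative
(assumes GNU date, jq, and a single salt-master container found via crictl):

# When did kubelet last become active?
kubelet_started=$(date -d "$(systemctl show -p ActiveEnterTimestamp --value kubelet)" +%s)

# When was the salt-master container started?
cid=$(crictl ps -q --name salt-master | head -n 1)
container_started=$(date -d "$(crictl inspect "$cid" | jq -r '.status.startedAt')" +%s)

if (( container_started > kubelet_started )); then
    echo "salt-master container was (re)started after kubelet came back up"
fi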

Collaborator Author

A kubelet restart does not necessarily imply a restart of the salt-master container 😃, so to me it's hard to build reliable logic here.
We could investigate further what causes the restart in this upgrade and check for that (likely a label or annotation changed by kubelet in containerd's memory), but that would mean checking this for every kubelet upgrade... to me it's not worth it; we should keep a sleep.

scripts/upgrade.sh.in (outdated, resolved)
scripts/upgrade.sh.in (outdated, resolved)
scripts/upgrade.sh.in (outdated, resolved)
@bert-e
Contributor

bert-e commented Nov 13, 2020

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • one peer

Peer approvals must include at least 1 approval from the following list:

The following reviewers are expecting changes from the author, or must review again:

@TeddyAndrieux TeddyAndrieux force-pushed the bugfix/GH-2908-move-logic-from-orchestrate-to-upgrade-script branch from f600133 to 45f2d0d Compare November 13, 2020 16:54
gdemonet
gdemonet previously approved these changes Nov 17, 2020
Contributor

@gdemonet gdemonet left a comment

Unless we want to change the sleep to a more involved check, this LGTM 👍

salt/metalk8s/orchestrate/upgrade/init.sls (outdated, resolved)
Instead of relying on the `orchestrate.dest_version` pillar key in the
orchestrate.apiserver Salt states, use the `metalk8s.cluster_version` key
from the pillar (which is the version stored in the kube-system namespace).

Since we use a script and rely on salt-master running in a static pod
on the bootstrap node, we need to move some logic out of the Salt
orchestrate and into the script, so that the salt-master restart can be
handled properly (and does not brutally interrupt a running Salt
orchestrate execution).
- Etcd cluster upgrade is now part of the upgrade script
- The upgrade of all APIServers is now part of the upgrade script
- Upgrade bootstrap engines (kubelet + containerd) locally
- Then call the orchestrate to upgrade all nodes one by one
This commit also adds a warning in the upgrade orchestrate, so that we
know this orchestrate is only a part of the upgrade process.

Fixes: #2908

No longer provide the `orchestrate.dest_version` pillar key to
`upgrade.precheck`, as it does not use it, and update `require_in` so that
we no longer run `Deploy Kubernetes service config objects` if one node
upgrade fails.
@TeddyAndrieux TeddyAndrieux force-pushed the bugfix/GH-2908-move-logic-from-orchestrate-to-upgrade-script branch from 45f2d0d to fa5c8e1 Compare November 17, 2020 09:39
@TeddyAndrieux
Collaborator Author

/approve

@bert-e
Contributor

bert-e commented Nov 17, 2020

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • one peer

Peer approvals must include at least 1 approval from the following list:

The following reviewers are expecting changes from the author, or must review again:

The following options are set: approve

Contributor

@gdemonet gdemonet left a comment

Let's go with a simple sleep for now, and I'll open a ticket for us to find a better (more robust) solution.

@TeddyAndrieux TeddyAndrieux dismissed NicolasT’s stale review November 17, 2020 14:07

All fixed or answered

@bert-e
Contributor

bert-e commented Nov 17, 2020

In the queue

The changeset has received all authorizations and has been added to the
relevant queue(s). The queue(s) will be merged in the target development
branch(es) as soon as builds have passed.

The changeset will be merged in:

  • ✔️ development/2.7

The following branches will NOT be impacted:

  • development/1.0
  • development/1.1
  • development/1.2
  • development/1.3
  • development/2.0
  • development/2.1
  • development/2.2
  • development/2.3
  • development/2.4
  • development/2.5
  • development/2.6

There is no action required on your side. You will be notified here once
the changeset has been merged. In the unlikely event that the changeset
fails permanently on the queue, a member of the admin team will
contact you to help resolve the matter.

IMPORTANT

Please do not attempt to modify this pull request.

  • Any commit you add on the source branch will trigger a new cycle after the
    current queue is merged.
  • Any commit you add on one of the integration branches will be lost.

If you need this pull request to be removed from the queue, please contact a
member of the admin team now.

The following options are set: approve

@bert-e
Contributor

bert-e commented Nov 17, 2020

I have successfully merged the changeset of this pull request
into the targeted development branches:

  • ✔️ development/2.7

The following branches have NOT changed:

  • development/1.0
  • development/1.1
  • development/1.2
  • development/1.3
  • development/2.0
  • development/2.1
  • development/2.2
  • development/2.3
  • development/2.4
  • development/2.5
  • development/2.6

Please check the status of the associated issue GH-2908.

Goodbye teddyandrieux.

@bert-e bert-e merged commit d9eb77f into development/2.7 Nov 17, 2020
@bert-e bert-e deleted the bugfix/GH-2908-move-logic-from-orchestrate-to-upgrade-script branch November 17, 2020 15:22