
Move logic from orchestrate to upgrade script #2928

Conversation

TeddyAndrieux
Collaborator

Component:

'lifecycle'

Context:

#2908

Summary:

Since we use a script and rely on salt-master running in a static pod
on the bootstrap node, we need to move some logic out of the Salt
orchestrate and into the script, so that the salt-master restart can be
handled properly (and does not brutally interrupt a running Salt
orchestrate execution).

  • Etcd cluster upgrade is now part of the upgrade script
  • The upgrade of all APIServers is now part of the upgrade script
  • Upgrade bootstrap engines (kubelet + containerd) locally
  • Then call the orchestrate to upgrade all nodes one by one

Fixes: #2908
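
A minimal sketch of the resulting upgrade script flow, for illustration only
(the orchestrate targets below are assumptions based on this description, not
necessarily the real names used in scripts/upgrade.sh.in; only the
metalk8s.kubernetes.kubelet.standalone state appears in the diff quoted later
in this thread):

# Hypothetical outline of the new flow; SALT_CALL, SALT_MASTER_CALL and
# SALTENV are assumed to be defined elsewhere in the script.
run_upgrade() {
    # 1. Upgrade the etcd cluster from the script (no longer in the main orchestrate)
    "${SALT_MASTER_CALL[@]}" salt-run state.orchestrate \
        metalk8s.orchestrate.etcd saltenv="$SALTENV"

    # 2. Upgrade all APIServers
    "${SALT_MASTER_CALL[@]}" salt-run state.orchestrate \
        metalk8s.orchestrate.apiserver saltenv="$SALTENV"

    # 3. Upgrade the bootstrap node engines (kubelet + containerd) locally,
    #    outside of any orchestration, so that restarting the salt-master
    #    static pod cannot interrupt a running orchestrate
    "${SALT_CALL}" --local --retcode-passthrough state.sls \
        metalk8s.kubernetes.kubelet.standalone saltenv="$SALTENV"

    # 4. Finally, run the orchestrate to upgrade all remaining nodes one by one
    "${SALT_MASTER_CALL[@]}" salt-run state.orchestrate \
        metalk8s.orchestrate.upgrade saltenv="$SALTENV"
}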

@TeddyAndrieux TeddyAndrieux requested a review from a team November 13, 2020 14:18
@bert-e
Contributor

bert-e commented Nov 13, 2020

Hello teddyandrieux,

My role is to assist you with the merge of this
pull request. Please type @bert-e help to get information
on this process, or consult the user documentation.

Status report is not available.

@bert-e
Contributor

bert-e commented Nov 13, 2020

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • one peer

Peer approvals must include at least 1 approval from the following list:

salt/metalk8s/orchestrate/upgrade/init.sls (resolved)
"${SALT_CALL}" --local --retcode-passthrough state.sls sync_mods="all" \
metalk8s.kubernetes.kubelet.standalone saltenv="$SALTENV" \
pillar="{'metalk8s': {'endpoints': {'salt-master': $saltmaster_endpoint, \
'repositories': $repo_endpoint}}}" && sleep 20
Contributor

Instead of a plain sleep, look for a way to probe/sleep/repeat with an upper bound on the max time for the operation to succeed (which may well be beyond 20s), to ensure the state we'd like to be in is actually reached?
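
One possible shape for such a bounded probe loop (a sketch only: the
`salt_master_is_ready` probe is a hypothetical placeholder, and the bounds
are arbitrary):

# Probe/sleep/repeat with an upper bound, instead of a fixed sleep.
wait_for() {
    local -r probe="$1" max_wait="${2:-120}" interval="${3:-5}"
    local waited=0
    until "$probe"; do
        if (( waited >= max_wait )); then
            echo "Timed out after ${max_wait}s waiting for: $probe" >&2
            return 1
        fi
        sleep "$interval"
        (( waited += interval ))
    done
}

# Example usage, with a placeholder probe function:
# wait_for salt_master_is_ready 120 5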

Collaborator Author

It's hard to know what to probe, because we want kubelet to be upgraded and the salt-master pod to be running, but at the end of the state that is already the case; kubelet just takes a bit of time before deleting the "old" static pods and creating the new ones, so...
I do not see any good "probe" for this.

Contributor

Relates to the static_pod_managed modifications we were discussing with @alexandre-allard-scality a few weeks ago? Making sure the observed hash from the K8s API (kubernetes.io/config.hash) is the same as what we can compute from the manifest...

Collaborator Author

@TeddyAndrieux TeddyAndrieux Nov 17, 2020

Here we are talking about a restart that does not change the manifest file at all; it's only a kubelet + containerd upgrade, so...

Contributor

Right, good point. Can we look at some timestamp, check e.g. that kubelet restarted before the salt-master container?
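
A rough sketch of that kind of timestamp comparison, purely illustrative
(assumes GNU date, jq, and a single salt-master container found via crictl):

# When did kubelet last become active?
kubelet_started=$(date -d "$(systemctl show -p ActiveEnterTimestamp --value kubelet)" +%s)

# When was the salt-master container started?
cid=$(crictl ps -q --name salt-master | head -n 1)
container_started=$(date -d "$(crictl inspect "$cid" | jq -r '.status.startedAt')" +%s)

if (( container_started > kubelet_started )); then
    echo "salt-master container was (re)started after kubelet came back up"
fi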

Collaborator Author

A kubelet restart does not necessarily imply a restart of the salt-master container 😃, so to me it's hard to build reliable logic here.
We could investigate further what causes the restart in this upgrade and check for that (likely a label or annotation changed by kubelet in containerd's memory), but that would mean checking this for every kubelet upgrade... to me it's not worth it; we should keep a sleep.

scripts/upgrade.sh.in (outdated, resolved)
scripts/upgrade.sh.in (outdated, resolved)
scripts/upgrade.sh.in (outdated, resolved)
@bert-e
Contributor

bert-e commented Nov 13, 2020

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • one peer

Peer approvals must include at least 1 approval from the following list:

The following reviewers are expecting changes from the author, or must review again:

@TeddyAndrieux TeddyAndrieux force-pushed the bugfix/GH-2908-move-logic-from-orchestrate-to-upgrade-script branch from f600133 to 45f2d0d Compare November 13, 2020 16:54
gdemonet
gdemonet previously approved these changes Nov 17, 2020
Contributor

@gdemonet gdemonet left a comment

Unless we want to change the sleep to a more involved check, this LGTM 👍

salt/metalk8s/orchestrate/upgrade/init.sls (outdated, resolved)
Instead of relying on the `orchestrate.dest_version` pillar key in the
orchestrate.apiserver Salt states, use the `metalk8s.cluster_version` key
from the pillar (which is the version stored in the kube-system namespace).

Since we use a script and rely on salt-master running in a static pod
on the bootstrap node, we need to move some logic out of the Salt
orchestrate and into the script, so that the salt-master restart can be
handled properly (and does not brutally interrupt a running Salt
orchestrate execution).
- Etcd cluster upgrade is now part of the upgrade script
- The upgrade of all APIServers is now part of the upgrade script
- Upgrade bootstrap engines (kubelet + containerd) locally
- Then call the orchestrate to upgrade all nodes one by one
This commit also adds a warning in the upgrade orchestrate, so that we
know this orchestrate is only a part of the upgrade process.

Fixes: #2908

No longer provide the `orchestrate.dest_version` pillar key to
`upgrade.precheck`, as it does not use it, and update `require_in` so that
we no longer run `Deploy Kubernetes service config objects` if one node
upgrade fails.
@TeddyAndrieux TeddyAndrieux force-pushed the bugfix/GH-2908-move-logic-from-orchestrate-to-upgrade-script branch from 45f2d0d to fa5c8e1 Compare November 17, 2020 09:39
@TeddyAndrieux
Collaborator Author

/approve

@bert-e
Contributor

bert-e commented Nov 17, 2020

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • one peer

Peer approvals must include at least 1 approval from the following list:

The following reviewers are expecting changes from the author, or must review again:

The following options are set: approve

Contributor

@gdemonet gdemonet left a comment

Let's go with a simple sleep for now, and I'll open a ticket for us to find a better (more robust) solution.

@TeddyAndrieux TeddyAndrieux dismissed NicolasT’s stale review November 17, 2020 14:07

All fixed or answered

@bert-e
Contributor

bert-e commented Nov 17, 2020

In the queue

The changeset has received all authorizations and has been added to the
relevant queue(s). The queue(s) will be merged in the target development
branch(es) as soon as builds have passed.

The changeset will be merged in:

  • ✔️ development/2.7

The following branches will NOT be impacted:

  • development/1.0
  • development/1.1
  • development/1.2
  • development/1.3
  • development/2.0
  • development/2.1
  • development/2.2
  • development/2.3
  • development/2.4
  • development/2.5
  • development/2.6

There is no action required on your side. You will be notified here once
the changeset has been merged. In the unlikely event that the changeset
fails permanently on the queue, a member of the admin team will
contact you to help resolve the matter.

IMPORTANT

Please do not attempt to modify this pull request.

  • Any commit you add on the source branch will trigger a new cycle after the
    current queue is merged.
  • Any commit you add on one of the integration branches will be lost.

If you need this pull request to be removed from the queue, please contact a
member of the admin team now.

The following options are set: approve

@bert-e
Contributor

bert-e commented Nov 17, 2020

I have successfully merged the changeset of this pull request
into the targeted development branches:

  • ✔️ development/2.7

The following branches have NOT changed:

  • development/1.0
  • development/1.1
  • development/1.2
  • development/1.3
  • development/2.0
  • development/2.1
  • development/2.2
  • development/2.3
  • development/2.4
  • development/2.5
  • development/2.6

Please check the status of the associated issue GH-2908.

Goodbye teddyandrieux.

@bert-e bert-e merged commit d9eb77f into development/2.7 Nov 17, 2020
@bert-e bert-e deleted the bugfix/GH-2908-move-logic-from-orchestrate-to-upgrade-script branch November 17, 2020 15:22