Move logic from orchestrate to upgrade script #2928
Conversation
Hello teddyandrieux,
My role is to assist you with the merge of this pull request. Status report is not available.

Waiting for approval
The following approvals are needed before I can proceed with the merge:
Peer approvals must include at least 1 approval from the following list:
"${SALT_CALL}" --local --retcode-passthrough state.sls sync_mods="all" \ | ||
metalk8s.kubernetes.kubelet.standalone saltenv="$SALTENV" \ | ||
pillar="{'metalk8s': {'endpoints': {'salt-master': $saltmaster_endpoint, \ | ||
'repositories': $repo_endpoint}}}" && sleep 20 |
Instead of a plain `sleep`, look for a way to probe/sleep/repeat with an upper bound on the max time for the operation to succeed (which may well be beyond 20s), to ensure the state we'd like to be in is actually reached?
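Something along these lines, maybe (just a sketch of the probe/sleep/repeat idea; the `crictl` probe and the `salt-master` container name are assumptions, not the project's actual check):

```bash
# Hypothetical probe: poll until the salt-master container is running,
# with an upper bound on the total wait instead of a fixed sleep.
wait_for_salt_master() {
    local -r max_retries=30 delay=5
    for ((attempt = 1; attempt <= max_retries; attempt++)); do
        # crictl prints a container ID only if a matching container is running
        if crictl ps --name salt-master --state Running --quiet | grep -q .; then
            return 0
        fi
        echo "salt-master not running yet ($attempt/$max_retries), retrying in ${delay}s"
        sleep "$delay"
    done
    echo "salt-master still not running after $((max_retries * delay))s" >&2
    return 1
}
```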
It's hard to know, because we want kubelet to be upgraded and the salt-master pod to be running, but at the end of the state that's already the case; kubelet just takes a bit of time before deleting the "old" static pods and creating the new ones, so...
I do not see any good "probe" for this.
Relates to the `static_pod_managed` modifications we were discussing with @alexandre-allard-scality a few weeks ago? Making sure the observed hash from the K8s API (`kubernetes.io/config.hash`) is the same as what we can compute from the manifest...
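As an illustration, something like this could read the observed hash (a sketch only; the mirror pod name and namespace here are assumptions):

```bash
# kubelet exposes the hash it computed from the static pod manifest in
# the kubernetes.io/config.hash annotation of the mirror pod
kubectl get pod salt-master-bootstrap -n kube-system \
    -o jsonpath='{.metadata.annotations.kubernetes\.io/config\.hash}'
```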
Here we're talking about a restart that does not change the manifest file at all; it's only a kubelet + containerd upgrade, so...
Right, good point. Can we look at some timestamps, e.g. check that kubelet restarted before the salt-master container did?
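Roughly something like this (a sketch only; the `crictl` lookup and the `salt-master` container name are assumptions):

```bash
# Compare kubelet's last start time with the salt-master container's start time
kubelet_start=$(systemctl show kubelet --property=ActiveEnterTimestamp --value)
container_id=$(crictl ps --name salt-master --quiet | head -n1)
container_start=$(crictl inspect --output go-template \
    --template '{{.status.startedAt}}' "$container_id")
echo "kubelet started at: $kubelet_start"
echo "salt-master container started at: $container_start"
```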
A kubelet restart does not necessarily imply a restart of the salt-master container 😃 so to me it's hard to have any reliable logic here.
Maybe we could investigate what causes the restart in this upgrade and check for that (likely a label or annotation changed by kubelet in containerd's memory), but that means for every kubelet upgrade we would need to re-check this... To me it's not worth it; we should keep a sleep.
Waiting for approval
The following approvals are needed before I can proceed with the merge:
Peer approvals must include at least 1 approval from the following list:
The following reviewers are expecting changes from the author, or must review again:
Force-pushed from f600133 to 45f2d0d
Unless we want to change the `sleep` to a more involved check, this LGTM 👍
Instead of relying on a pillar key `orchestrate.dest_version` in orchestrate.apiserver salt states, use the `metalk8s.cluster_version` key from the pillar (which is the version in the kube-system namespace).
Since we use a script and we rely on salt-master running in a static pod on the bootstrap node, we need to move some logic out of the salt orchestrate into the script so that the salt-master restart can be handled properly (and does not brutally interrupt a salt orchestrate execution).
- Etcd cluster upgrade is now part of the upgrade script
- All APIServers upgrade is now part of the upgrade script
- Upgrade bootstrap engines (kubelet + containerd) locally
- Then call the orchestrate to upgrade all nodes one by one
This commit also adds a warning in the upgrade orchestrate so that we know this orchestrate is only a part of the upgrade process.
Fixes: #2908
No longer provide the `orchestrate.dest_version` pillar key to `upgrade.precheck` as it does not use it; update `require_in` so that we no longer run `Deploy Kubernetes service config objects` if one node upgrade failed
Force-pushed from 45f2d0d to fa5c8e1
/approve
Waiting for approval
The following approvals are needed before I can proceed with the merge:
Peer approvals must include at least 1 approval from the following list:
The following reviewers are expecting changes from the author, or must review again:
The following options are set: approve
Let's go with a simple `sleep` for now, and I'll open a ticket for us to find a better (more robust) solution.
In the queue
The changeset has received all authorizations and has been added to the queue. The changeset will be merged in:
The following branches will NOT be impacted:
There is no action required on your side. You will be notified here once the merge completes.
IMPORTANT: Please do not attempt to modify this pull request.
If you need this pull request to be removed from the queue, please contact a member of the admin team.
The following options are set: approve
I have successfully merged the changeset of this pull request.
The following branches have NOT changed:
Please check the status of the associated issue GH-2908.
Goodbye teddyandrieux.
Component: 'lifecycle'
Context: #2908
Summary:
Since we use a script and we rely on salt-master running in a static pod on the bootstrap node, we need to move some logic out of the salt orchestrate into the script so that the salt-master restart can be handled properly (and does not brutally interrupt a salt orchestrate execution).
Fixes: #2908