I most recently upgraded an EKS cluster in #2085, and will use notes from there and adjust them as I go and iterate on them once again.
# For reference, these are the steps I took when upgrading carbonplan from k8s 1.19 to k8s
# 1.24, Jan 24th 2023.
#
# 1. Updated the version field in this config from 1.19 to 1.20
#
# - It is not allowed to upgrade the control plane more than one minor version at a time
#
# 2. Upgraded the control plane (takes ~10 minutes)
#
# - I ran into permission errors, so I visited the AWS cloud console to
# create an access key for my user and set it up as temporary environment
# variables.
#
# export AWS_ACCESS_KEY_ID="..."
# export AWS_SECRET_ACCESS_KEY="..."
#
# eksctl upgrade cluster --config-file eksctl-cluster-config.yaml --approve
#
# 3. Deleted all non-core nodegroups
#
# - I had to add a --drain=false flag due to an error likely related to a
# very old EKS cluster.
#
# - I used --include="nb-*,dask-*" because I saw that the core node pool
# was named "core-a", and the other nodes started with "nb-" or "dask-".
#
# eksctl delete nodegroup --config-file=eksctl-cluster-config.yaml --include "nb-*,dask-*" --approve --drain=false
#
# 4. Updated the version field in this config from 1.20 to 1.22
#
# - It is allowed to have a nodegroup +-2 minors away from the control plane version
#
# 5. Created a new core nodepool (core-b)
#
# - I ran into "Unauthorized" errors and resolved them by first using the
# deployer to acquire credentials to modify a ConfigMap named "aws-auth"
# in the k8s namespace kube-system.
#
# deployer use-cluster-credentials carbonplan
#
# kubectl edit cm -n kube-system aws-auth
#
# eksctl create nodegroup --config-file=eksctl-cluster-config.yaml --include "core-b" --install-nvidia-plugin=false
#
# 6. Deleted the old core nodepool (core-a)
#
# - I first updated the eksctl config file to include a "core-a" entry,
# because I didn't really add a "core-b" previously, I just renamed the
# "core-a" to "core-b".
#
# eksctl delete nodegroup --config-file=eksctl-cluster-config.yaml --include "core-a" --approve
#
# 7. Upgraded add-ons (takes ~3*5s)
#
# eksctl utils update-kube-proxy --cluster=carbonplanhub --approve
# eksctl utils update-aws-node --cluster=carbonplanhub --approve
# kubectl patch daemonset -n kube-system aws-node --patch='{"spec":{"template":{"spec":{"$setElementOrder/containers":[{"name":"aws-node"}],"containers":[{"name":"aws-node","securityContext":{"allowPrivilegeEscalation":null,"runAsNonRoot":null}}]}}}}'
# eksctl utils update-coredns --cluster=carbonplanhub --approve
#
# - I diagnosed two separate errors following this:
#
# kubectl get pod -n kube-system
# kubectl describe pod -n kube-system aws-node-7rcsw
#
# Warning Failed 9s (x7 over 69s) kubelet Error: container has runAsNonRoot and image will run as root
#
# - the aws-node daemonset's pods failed to start because the container
# securityContext was too restrictive: it required a non-root user, while
# the image runs as root.
#
# aws-node issue: https://github.com/weaveworks/eksctl/issues/6048.
#
# Resolved by removing `runAsNonRoot: true` and
# `allowPrivilegeEscalation: false`. Using --output-patch=true led me
# to a `kubectl patch` command to use.
#
# kubectl edit ds -n kube-system aws-node --output-patch=true
#
# - the kube-proxy daemonset's pods failed to pull the image because it was
# not found.
#
# This didn't need to be resolved midway through the upgrades, and was an
# issue that went away in k8s 1.23.
#
# 8. Updated the version field in this config from 1.22 to 1.21
#
# 9. Upgraded the control plane, as in step 2.
#
# A. Upgraded add-ons, as in step 7.
#
# B. Updated the version field in this config from 1.21 to 1.22
#
# C. Upgraded the control plane, as in step 2.
#
# D. Upgraded add-ons, as in step 7.
#
# E. I refreshed the ekscluster config's .jsonnet file based on
# template.jsonnet, which has been updated to declare an addon related to ebs
# storage. In practice, I realize this was probably not used by the
# subsequent commands. It feels good to have it in the ekscluster config
# though, to reflect adding it manually.
#
# addons: [
# {
# // aws-ebs-csi-driver ensures that our PVCs are bound to PVs that
# // couple to AWS EBS based storage, without it expect to see pods
# // mounting a PVC failing to schedule and PVC resources that are
# // unbound.
# //
# // Related docs: https://docs.aws.amazon.com/eks/latest/userguide/managing-ebs-csi.html
# //
# name: 'aws-ebs-csi-driver',
# wellKnownPolicies: {
# ebsCSIController: true,
# },
# },
# ],
#
# eksctl create iamserviceaccount \
# --name=ebs-csi-controller-sa \
# --namespace=kube-system \
# --cluster=carbonplanhub \
# --attach-policy-arn=arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy \
# --approve \
# --role-only \
# --role-name=AmazonEKS_EBS_CSI_DriverRole
#
# eksctl create addon --name=aws-ebs-csi-driver --cluster=carbonplanhub --service-account-role-arn=arn:aws:iam::631969445205:role/AmazonEKS_EBS_CSI_DriverRole --force
#
# F. Updated the version field in this config from 1.22 to 1.23
#
# G. Upgraded the control plane, as in step 2.
#
# H. Upgraded add-ons, as in step 7.
#
# I. Updated the version field in this config from 1.23 to 1.24
#
# J. Upgraded the control plane, as in step 2.
#
# K. Upgraded add-ons, as in step 7.
#
# L. I created a new core node pool and deleted the old, as in steps 5-6.
#
# eksctl create nodegroup --config-file=eksctl-cluster-config.yaml --include "core-a" --install-nvidia-plugin=false
# eksctl delete nodegroup --config-file=eksctl-cluster-config.yaml --include "core-b" --approve
#
# M. I recreated all other nodegroups.
#
# eksctl create nodegroup --config-file=eksctl-cluster-config.yaml --include "nb-*,dask-*" --install-nvidia-plugin=false
#
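The repeated "bump the version field, upgrade the control plane, upgrade add-ons" cycle in the notes above could be planned with a small helper. This is a minimal sketch, not something that was run as part of the upgrade: `upgrade_path` and `plan_upgrade` are hypothetical names, the `CLUSTER` and `CONFIG` defaults are taken from the commands in the notes, and the script only prints commands instead of executing them. It leaves out the nodegroup steps and the `kubectl patch` workaround for aws-node, which still need to be handled by hand.

```shell
#!/bin/sh
# Sketch only: prints the commands for a stepwise EKS control plane upgrade
# instead of running them. CLUSTER and CONFIG default to the names used in
# the notes and can be overridden through the environment.
CLUSTER="${CLUSTER:-carbonplanhub}"
CONFIG="${CONFIG:-eksctl-cluster-config.yaml}"

# upgrade_path FROM TO: list every minor version after FROM up to and
# including TO, one per line, because the control plane can only move one
# minor version per upgrade. (Hypothetical helper, assumes the same major.)
upgrade_path() (
  major="${1%%.*}"
  minor="${1#*.}"
  target="${2#*.}"
  while [ "$minor" -lt "$target" ]; do
    minor=$((minor + 1))
    echo "${major}.${minor}"
  done
)

# plan_upgrade FROM TO: for each minor version, print a reminder to edit the
# config by hand, followed by the control plane and add-on commands from
# steps 2 and 7 of the notes.
plan_upgrade() (
  for version in $(upgrade_path "$1" "$2"); do
    echo "# set the version field in ${CONFIG} to ${version}"
    echo "eksctl upgrade cluster --config-file ${CONFIG} --approve"
    echo "eksctl utils update-kube-proxy --cluster=${CLUSTER} --approve"
    echo "eksctl utils update-aws-node --cluster=${CLUSTER} --approve"
    echo "eksctl utils update-coredns --cluster=${CLUSTER} --approve"
  done
)

plan_upgrade 1.19 1.24
```

Printing the plan rather than running it keeps the manual checkpoints from the notes in place, such as inspecting `kubectl get pod -n kube-system` after each add-on upgrade before moving to the next minor version.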
The oldest k8s version we use is now in openscapes, with k8s 1.21. Let's get it upgraded so that the oldest version becomes 1.22.
This issue was branched off from #2057.