Add upgrade instructions readme #543
stasmachship committed Dec 3, 2024
1 parent 0b2db10 commit cb501bc
Showing 16 changed files with 188 additions and 222 deletions.
108 changes: 108 additions & 0 deletions CLUSTER_UPGRADE.md
@@ -0,0 +1,108 @@
# Upgrade MachShip Kubernetes Clusters


![MachShip](https://machship.com/wp-content/uploads/2021/05/[email protected])

![Kubernetes Logo](https://raw.githubusercontent.com/kubernetes-sigs/kubespray/master/docs/img/kubernetes-logo.png)

## Upgrade instructions
#### 1. Log in to the upgrade runner host
Upgrading the cluster nodes requires direct SSH access to the Kubernetes nodes, so the process needs to run from a VM on the same network.


```bash
ssh <your_user_name>@10.0.1.47
```

#### 2. Create a new Python environment
Create an isolated Python virtual environment for Kubespray to run in
```bash
python3.10 -m venv ./kubespray-venv
cd kubespray-venv
# Activate the environment so the pip installs below stay inside it
source bin/activate
```

#### 3. Clone the kubespray repository
Ask your manager for access to the kubespray-fork repository
```bash
git clone https://github.com/machship/kubespray-fork.git kubespray
cd kubespray
```

#### 4. Install Python dependencies
```bash
pip install wheel
pip install -r requirements.txt
```
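
As an optional sanity check, you can confirm that Ansible is now available inside the virtual environment (the exact version reported depends on requirements.txt):
```bash
# Should print the Ansible version pinned by requirements.txt
ansible --version
```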

#### 5. Update the inventory files for the cluster to be upgraded
```bash
cd inventory/<cluster-name>
```

##### Update the hosts.yaml with your username and SSH key
If you need to generate an SSH key on this host, run the following command (do not use a passphrase)
```bash
ssh-keygen -t rsa
```
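
If your public key is not yet authorised on the cluster nodes, you may also need to copy it to each node listed in hosts.yaml (the IP below is the first example node; repeat for every node):
```bash
# Copy your public key to a node so Ansible can SSH in without a password
ssh-copy-id -i ~/.ssh/id_rsa.pub <your_user_name>@10.0.12.50
```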

Example hosts.yaml to update
```yaml
all:
  hosts:
    mel-dev-node1:
      ansible_host: 10.0.12.50
      ip: 10.0.12.50
      access_ip: 10.0.12.50
      ansible_user: <your_user_name>
      ansible_ssh_private_key_file: '~/.ssh/<your_rsa_key>'
    mel-dev-node2:
      ansible_host: 10.0.12.51
      ip: 10.0.12.51
      access_ip: 10.0.12.51
      ansible_user: <your_user_name>
      ansible_ssh_private_key_file: '~/.ssh/<your_rsa_key>'
    mel-dev-node3:
      ansible_host: 10.0.12.52
      ip: 10.0.12.52
      access_ip: 10.0.12.52
      ansible_user: <your_user_name>
      ansible_ssh_private_key_file: '~/.ssh/<your_rsa_key>'
  children:
    kube_control_plane:
      hosts:
        mel-dev-node1:
        mel-dev-node2:
        mel-dev-node3:
    kube_node:
      hosts:
        mel-dev-node1:
        mel-dev-node2:
        mel-dev-node3:
    etcd:
      hosts:
        mel-dev-node1:
        mel-dev-node2:
        mel-dev-node3:
    k8s_cluster:
      children:
        kube_control_plane:
        kube_node:
```
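
Before running the upgrade, you can optionally verify that Ansible can reach every node with the inventory above (a minimal check run from this inventory directory; it assumes your user can SSH to all nodes):
```bash
# Every node should respond with "pong"
ansible -i hosts.yaml all -m ping
```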

#### 6. Update the cluster version in the k8s-cluster.yml file
Look for kube_version: v1.xx.x and update it as necessary
```bash
cd group_vars/k8s_cluster
vi k8s-cluster.yml
```
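
To confirm the value you are about to change, you can print the current setting first (the version shown below is only an example; your file may differ):
```bash
# Print the currently configured Kubernetes version
grep '^kube_version' k8s-cluster.yml
# kube_version: v1.29.5
```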

#### 7. Run the upgrade from the root directory of the repository
Use the --check flag to run the command in dry-run mode to test the upgrade first
```bash
cd ../../../..
ansible-playbook -i inventory/<cluster-name>/hosts.yaml --diff -b -K upgrade-cluster.yml --check
```
You can also use the --limit flag to run the upgrade only on particular nodes, e.g.:
```bash
ansible-playbook -i inventory/<cluster-name>/hosts.yaml --diff -b -K upgrade-cluster.yml --limit <node-1>,<node-2>
```
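After a real run (without --check) completes, you can confirm the result from any machine with kubectl access to the cluster (a minimal check; it assumes kubectl is already configured for this cluster):
```bash
# Each node should report the upgraded kubelet version in the VERSION column
kubectl get nodes -o wide
```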
1 change: 0 additions & 1 deletion inventory/local/group_vars

This file was deleted.

10 changes: 0 additions & 10 deletions inventory/local/hosts.ini

This file was deleted.

8 changes: 5 additions & 3 deletions inventory/sample/group_vars/all/all.yml
@@ -42,7 +42,9 @@ loadbalancer_apiserver_healthcheck_port: 8081

## There are some changes specific to the cloud providers
## for instance we need to encapsulate packets with some network plugins
## If set the possible values only 'external' after K8s v1.31.
## If set the possible values are either 'gce', 'aws', 'azure', 'openstack', 'vsphere', 'oci', or 'external'
## When openstack is used make sure to source in the openstack credentials
## like you would do when using openstack-client before starting the playbook.
# cloud_provider:

## When cloud_provider is set to 'external', you can set the cloud controller to deploy
@@ -73,8 +75,8 @@ loadbalancer_apiserver_healthcheck_port: 8081
# skip_http_proxy_on_os_packages: false

## Since workers are included in the no_proxy variable by default, docker engine will be restarted on all nodes (all
## pods will restart) when adding or removing workers. To override this behaviour by only including control plane nodes
## in the no_proxy variable, set below to true:
## pods will restart) when adding or removing workers. To override this behaviour by only including master nodes in the
## no_proxy variable, set below to true:
no_proxy_exclude_workers: false

## Certificate Management
15 changes: 1 addition & 14 deletions inventory/sample/group_vars/all/containerd.yml
@@ -24,21 +24,8 @@
# containerd_grpc_max_recv_message_size: 16777216
# containerd_grpc_max_send_message_size: 16777216

# Containerd debug socket location: unix or tcp format
# containerd_debug_address: ""

# Containerd log level
# containerd_debug_level: "info"

# Containerd logs format, supported values: text, json
# containerd_debug_format: ""

# Containerd debug socket UID
# containerd_debug_uid: 0

# Containerd debug socket GID
# containerd_debug_gid: 0

# containerd_metrics_address: ""

# containerd_metrics_grpc_histogram: false
@@ -51,7 +38,7 @@
# capabilities: ["pull", "resolve"]
# skip_verify: false

# containerd_max_container_log_line_size: 16384
# containerd_max_container_log_line_size: -1

# containerd_registry_auth:
# - registry: 10.0.0.2:5000
27 changes: 0 additions & 27 deletions inventory/sample/group_vars/all/oci.yml
@@ -1,30 +1,3 @@
## When External Oracle Cloud Infrastructure is used, set these variables
## External OCI Cloud Controller Manager
## https://github.com/oracle/oci-cloud-controller-manager/blob/v1.29.0/manifests/provider-config-example.yaml
# external_oracle_auth_region: ""
# external_oracle_auth_tenancy: ""
# external_oracle_auth_user: ""
# external_oracle_auth_key: ""
# external_oracle_auth_passphrase: ""
# external_oracle_auth_fingerprint: ""
# external_oracle_auth_use_instance_principals: false

# external_oracle_compartment: ""
# external_oracle_vcn: ""
# external_oracle_load_balancer_subnet1: ""
# external_oracle_load_balancer_subnet2: ""
# external_oracle_load_balancer_security_list_management_mode: All
# external_oracle_load_balancer_security_lists: {}

# external_oracle_ratelimiter_qps_read: 20.0
# external_oracle_ratelimiter_bucket_read: 5
# external_oracle_ratelimiter_qps_write: 20.0
# external_oracle_ratelimiter_bucket_write: 5

# external_oracle_cloud_controller_image_repo: ghcr.io/oracle/cloud-provider-oci
# external_oracle_cloud_controller_image_tag: "v1.29.0"


## When Oracle Cloud Infrastructure is used, set these variables
# oci_private_key:
# oci_region_id:
4 changes: 2 additions & 2 deletions inventory/sample/group_vars/all/offline.yml
@@ -18,7 +18,7 @@
# quay_image_repo: "{{ registry_host }}"

## Kubernetes components
# kubeadm_download_url: "{{ files_repo }}/dl.k8s.io/release/{{ kube_version }}/bin/linux/{{ image_arch }}/kubeadm"
# kubeadm_download_url: "{{ files_repo }}/dl.k8s.io/release/{{ kubeadm_version }}/bin/linux/{{ image_arch }}/kubeadm"
# kubectl_download_url: "{{ files_repo }}/dl.k8s.io/release/{{ kube_version }}/bin/linux/{{ image_arch }}/kubectl"
# kubelet_download_url: "{{ files_repo }}/dl.k8s.io/release/{{ kube_version }}/bin/linux/{{ image_arch }}/kubelet"

@@ -82,7 +82,7 @@
# krew_download_url: "{{ files_repo }}/github.com/kubernetes-sigs/krew/releases/download/{{ krew_version }}/krew-{{ host_os }}_{{ image_arch }}.tar.gz"

## CentOS/Redhat/AlmaLinux
### For EL8, baseos and appstream must be available,
### For EL7, base and extras repo must be available, for EL8, baseos and appstream
### By default we enable those repo automatically
# rhel_enable_repos: false
### Docker / Containerd
3 changes: 3 additions & 0 deletions inventory/sample/group_vars/all/openstack.yml
@@ -41,6 +41,9 @@
# external_openstack_application_credential_id:
# external_openstack_application_credential_secret:

## The tag of the external OpenStack Cloud Controller image
# external_openstack_cloud_controller_image_tag: "v1.28.2"

## Tags for the Cinder CSI images
## registry.k8s.io/sig-storage/csi-attacher
# cinder_csi_attacher_image_tag: "v4.4.2"
12 changes: 6 additions & 6 deletions inventory/sample/group_vars/all/vsphere.yml
@@ -11,14 +11,14 @@
# external_vsphere_version: "6.7u3"

## Tags for the external vSphere Cloud Provider images
## registry.k8s.io/cloud-pv-vsphere/cloud-provider-vsphere
# external_vsphere_cloud_controller_image_tag: "v1.31.0"
## registry.k8s.io/csi-vsphere/syncer
# vsphere_syncer_image_tag: "v3.3.1"
## gcr.io/cloud-provider-vsphere/cpi/release/manager
# external_vsphere_cloud_controller_image_tag: "latest"
## gcr.io/cloud-provider-vsphere/csi/release/syncer
# vsphere_syncer_image_tag: "v2.5.1"
## registry.k8s.io/sig-storage/csi-attacher
# vsphere_csi_attacher_image_tag: "v3.4.0"
## registry.k8s.io/csi-vsphere/driver
# vsphere_csi_controller: "v3.3.1"
## gcr.io/cloud-provider-vsphere/csi/release/driver
# vsphere_csi_controller: "v2.5.1"
## registry.k8s.io/sig-storage/livenessprobe
# vsphere_csi_liveness_probe_image_tag: "v2.6.0"
## registry.k8s.io/sig-storage/csi-provisioner
5 changes: 1 addition & 4 deletions inventory/sample/group_vars/etcd.yml
@@ -32,7 +32,4 @@
# etcd_experimental_enable_distributed_tracing: false
# etcd_experimental_distributed_tracing_sample_rate: 100
# etcd_experimental_distributed_tracing_address: "localhost:4317"
# etcd_experimental_distributed_tracing_service_name: etcd

## The interval for etcd watch progress notify events
# etcd_experimental_watch_progress_notify_interval: 5s
# etcd_experimental_distributed_tracing_service_name: etcd
8 changes: 0 additions & 8 deletions inventory/sample/group_vars/k8s_cluster/addons.yml
@@ -96,18 +96,10 @@ rbd_provisioner_enabled: false
# rbd_provisioner_storage_class: rbd
# rbd_provisioner_reclaim_policy: Delete

# Gateway API CRDs
gateway_api_enabled: false
# gateway_api_experimental_channel: false

# Nginx ingress controller deployment
ingress_nginx_enabled: false
# ingress_nginx_host_network: false
# ingress_nginx_service_type: LoadBalancer
# ingress_nginx_service_annotations:
# example.io/loadbalancerIPs: 1.2.3.4
# ingress_nginx_service_nodeport_http: 30080
# ingress_nginx_service_nodeport_https: 30081
ingress_publish_status_address: ""
# ingress_nginx_nodeselector:
# kubernetes.io/os: "linux"
36 changes: 13 additions & 23 deletions inventory/sample/group_vars/k8s_cluster/k8s-cluster.yml
@@ -17,7 +17,7 @@ kube_token_dir: "{{ kube_config_dir }}/tokens"
kube_api_anonymous_auth: true

## Change this to use another Kubernetes version, e.g. a current beta release
kube_version: v1.31.1
kube_version: v1.29.5

# Where the binaries will be downloaded.
# Note: ensure that you've enough disk space (about 1G)
@@ -140,7 +140,11 @@ kube_proxy_nodeport_addresses: >-
{%- endif -%}
# If non-empty, will use this string as identification instead of the actual hostname
# kube_override_hostname: {{ inventory_hostname }}
# kube_override_hostname: >-
# {%- if cloud_provider is defined and cloud_provider in ['aws'] -%}
# {%- else -%}
# {{ inventory_hostname }}
# {%- endif -%}

## Encrypting Secret Data at Rest
kube_encrypt_secret_data: false
@@ -258,7 +262,7 @@ default_kubelet_config_dir: "{{ kube_config_dir }}/dynamic_kubelet_dir"
# kubelet_runtime_cgroups_cgroupfs: "/system.slice/{{ container_manager }}.service"
# kubelet_kubelet_cgroups_cgroupfs: "/system.slice/kubelet.service"

# Whether to run kubelet and container-engine daemons in a dedicated cgroup.
# Optionally reserve this space for kube daemons.
# kube_reserved: false
## Uncomment to override default values
## The following two items need to be set when kube_reserved is true
@@ -268,7 +272,7 @@ default_kubelet_config_dir: "{{ kube_config_dir }}/dynamic_kubelet_dir"
# kube_cpu_reserved: 100m
# kube_ephemeral_storage_reserved: 2Gi
# kube_pid_reserved: "1000"
# Reservation for control plane hosts
# Reservation for master hosts
# kube_master_memory_reserved: 512Mi
# kube_master_cpu_reserved: 200m
# kube_master_ephemeral_storage_reserved: 2Gi
@@ -362,25 +366,11 @@ auto_renew_certificates: false
# First Monday of each month
# auto_renew_certificates_systemd_calendar: "Mon *-*-1,2,3,4,5,6,7 03:{{ groups['kube_control_plane'].index(inventory_hostname) }}0:00"

kubeadm_patches_dir: "{{ kube_config_dir }}/patches"
kubeadm_patches: []
# See https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/control-plane-flags/#patches
# Correspondance with this link
# patchtype = type
# target = target
# suffix -> managed automatically
# extension -> always "yaml"
# kubeadm_patches:
# - target: kube-apiserver|kube-controller-manager|kube-scheduler|etcd|kubeletconfiguration
# type: strategic(default)|json|merge
# patch:
# metadata:
# annotations:
# example.com/test: "true"
# labels:
# example.com/prod_level: "{{ prod_level }}"
# - ...
# Patches are applied in the order they are specified.
# kubeadm patches path
kubeadm_patches:
enabled: false
source_dir: "{{ inventory_dir }}/patches"
dest_dir: "{{ kube_config_dir }}/patches"

# Set to true to remove the role binding to anonymous users created by kubeadm
remove_anonymous_access: false