Procedure to deploy registry HA #3400

Merged
merged 6 commits into from
Jun 2, 2021
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -53,6 +53,9 @@
   `systemd` cgroupDriver for Kubelet and containerd
   (PR[#3377](https://github.com/scality/metalk8s/pull/3377))

+- Allow manual deployment of a second registry container
+  (PR[#3400](https://github.com/scality/metalk8s/pull/3400))
+
 ### Breaking changes

 - [#2199](https://github.com/scality/metalk8s/issues/2199) - Prometheus label
1 change: 1 addition & 0 deletions docs/operation/index.rst
@@ -20,6 +20,7 @@ do not have a working MetalK8s_ setup.
    solutions
    changing_node_hostname
    metalk8s-utils
+   registry_ha
    listening_processes
    troubleshooting/index
    sosreport
80 changes: 80 additions & 0 deletions docs/operation/registry_ha.rst
@@ -0,0 +1,80 @@
Registry HA
===========

To be able to run fully offline, MetalK8s comes with its own registry serving
all necessary images used by its containers.
This registry container sits on the Bootstrap node.

With a highly available registry, container images are served by multiple
nodes, which means the Bootstrap node can be lost without impacting the
cluster.
It allows pods to be scheduled, even if the needed images are not cached
locally.

.. note::

   This procedure only covers registry HA. Bootstrap HA is not supported
   yet, so the registry is only one part of the Bootstrap functionality.
   See this ticket for more information:
   https://github.com/scality/metalk8s/issues/2002

Prepare the node
----------------

To configure a node to host a registry, a ``repository`` pod must be scheduled
on it.
This node must be part of the MetalK8s cluster and no specific roles or
taints are needed.

All ISOs listed in the ``archives`` section of
``/etc/metalk8s/bootstrap.yaml`` and ``/etc/metalk8s/solutions.yaml``
must be copied from the Bootstrap node to the target node at exactly the same
location.
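
The same-location requirement can be checked on the target node before any
Salt state is run. The following is a minimal sketch, assuming the archive
paths have already been read from the two configuration files; the function
name ``missing_archives`` is hypothetical and not part of MetalK8s, and the
example path is only an illustration.

```python
import os
from typing import Iterable, List


def missing_archives(archive_paths: Iterable[str]) -> List[str]:
    """Return the archive paths that do not exist on this node."""
    return [path for path in archive_paths if not os.path.exists(path)]


# Hypothetical list, as it might appear under ``archives`` on the Bootstrap node.
expected = ["/archives/metalk8s.iso"]

# Any path printed here still has to be copied from the Bootstrap node.
print(missing_archives(expected))
```

An empty list means every archive is present at the expected path.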

Deploy the registry
-------------------

Connect to the node where you want to deploy the registry and run the
following Salt states:

- Prepare all the MetalK8s ISOs

  .. parsed-literal::

     root@node-1 $ salt-call state.sls \\
         metalk8s.archives.mounted \\
         saltenv=metalk8s-|version|

- If you have any solutions, prepare the solutions ISOs

  .. parsed-literal::

     root@node-1 $ salt-call state.sls \\
         metalk8s.solutions.available \\
         saltenv=metalk8s-|version|

- Deploy the registry container

  .. parsed-literal::

     root@node-1 $ salt-call state.sls \\
         metalk8s.repo.installed \\
         saltenv=metalk8s-|version|


Reconfigure the container engines
---------------------------------

containerd must be reconfigured so that the newly deployed registry is added
to its registry endpoints. Nodes can then keep pulling images even if the
registry on the Bootstrap node is down.

From the Bootstrap node, run (replace ``<bootstrap_node_name>`` with the
actual Bootstrap node name):

.. parsed-literal::

   root@bootstrap $ kubectl exec -n kube-system -c salt-master \\
       --kubeconfig=/etc/kubernetes/admin.conf \\
       salt-master-<bootstrap_node_name> -- salt '*' state.sls \\
       metalk8s.container-engine saltenv=metalk8s-|version|
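
When this step is scripted, the two placeholders in the command (the
Bootstrap node name and the MetalK8s version) have to be filled in. A minimal
sketch of building the argument list follows; the helper name
``container_engine_command`` and the example values ``bootstrap`` and
``2.10.0`` are assumptions made for illustration only.

```python
import shlex
from typing import List


def container_engine_command(bootstrap_node_name: str, version: str) -> List[str]:
    """Build the kubectl command that re-applies the container engine state."""
    return [
        "kubectl", "exec", "-n", "kube-system", "-c", "salt-master",
        "--kubeconfig=/etc/kubernetes/admin.conf",
        f"salt-master-{bootstrap_node_name}", "--",
        "salt", "*", "state.sls", "metalk8s.container-engine",
        f"saltenv=metalk8s-{version}",
    ]


# Hypothetical node name and version; replace with the real values.
cmd = container_engine_command("bootstrap", "2.10.0")

# shlex.join quotes the '*' target so the line can be pasted into a shell.
print(shlex.join(cmd))
```

Keeping the command as a list avoids shell quoting issues with the ``'*'``
target if it is later passed to ``subprocess.run``.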
36 changes: 31 additions & 5 deletions eve/main.yml
@@ -269,9 +269,10 @@ models:
         ARCHIVE_DIRECTORY: "%(prop:builddir)s/build"
         ARCHIVE: "metalk8s.iso"
         DEST: ''
+        NODE: bootstrap
       command: >
         scp -F ssh_config "$ARCHIVE_DIRECTORY/$ARCHIVE"
-        bootstrap:"$DEST"
+        $NODE:"$DEST"
       workdir: *terraform_workdir
       haltOnFailure: true
   - ShellCommand: &create_mountpoint
@@ -456,7 +457,7 @@ models:
       name: Run fast tests on Bastion
       env: &_env_bastion_fast_tests
         <<: *_env_bastion_tests
-        PYTEST_FILTERS: "post and ci and not multinode and not slow"
+        PYTEST_FILTERS: "post and ci and not multinode and not slow and not registry_ha"
   - SetPropertyFromCommand: &set_bootstrap_cp_ip_ssh
       name: Set the bootstrap node control plane IP as a property
       property: bootstrap_control_plane_ip
@@ -647,7 +648,7 @@ models:
       doStepIf: "%(prop:install_solution:-false)s"
       name: Copy Solution archive to bootstrap
       <<: *copy_iso_bootstrap_ssh
-      env:
+      env: &_env_copy_solution_archive_bootstrap_ssh
         <<: *_env_copy_iso_bootstrap_ssh
         ARCHIVE: "%(prop:solution_archive:-example-solution-1.0.0.iso)s"
   - ShellCommand: &import_solution
@@ -2267,10 +2268,12 @@ stages:
           workdir: *terraform_workdir
           haltOnFailure: true
       - ShellCommand: *set_bootstrap_minion_id_ssh
-      - ShellCommand:
+      - ShellCommand: &create_archive_directory
           name: Create /archives directory
+          env:
+            NODE: "bootstrap"
           command: >
-            ssh -F ssh_config bootstrap '
+            ssh -F ssh_config $NODE '
             sudo mkdir /archives && sudo chown $USER: /archives
             '
           workdir: *terraform_workdir
@@ -2317,6 +2320,23 @@ stages:
       - ShellCommand: *wait_for_solution_operator
       # }}}
       - ShellCommand: *wait_pods_stable_ssh
+      - ShellCommand:
+          <<: *create_archive_directory
+          env:
+            NODE: node-1
+      - ShellCommand:
+          <<: *copy_iso_bootstrap_ssh
+          name: Copy archive to node-1
+          env:
+            <<: *_env_copy_iso_bootstrap_ssh
+            NODE: node-1
+            DEST: /archives/metalk8s.iso
+      - ShellCommand:
+          <<: *copy_solution_archive_bootstrap_ssh
+          name: Copy Solution archive to node-1
+          env:
+            <<: *_env_copy_solution_archive_bootstrap_ssh
+            NODE: node-1
       - ShellCommand: &multi_node_fast_tests
           <<: *bastion_tests
           name: Run fast tests on Bastion
@@ -2408,6 +2428,12 @@ stages:
             PYTEST_FILTERS: "install and ci and multinodes"
       - ShellCommand: *provision_volumes_on_node1
       - ShellCommand: *wait_pods_stable_ssh
+      - ShellCommand:
+          <<: *copy_iso_bootstrap_ssh
+          name: Copy archive to node-1
+          env:
+            <<: *_env_copy_iso_bootstrap_ssh
+            NODE: node-1
       - ShellCommand: *multi_node_fast_tests
       - ShellCommand: *multi_node_slow_tests
       - SetPropertyFromCommand:
24 changes: 13 additions & 11 deletions salt/_modules/metalk8s_kubernetes_utils.py
@@ -180,17 +180,19 @@ def get_service_endpoints(service, namespace, kubeconfig):
         raise CommandExecutionError(error_tpl.format(service, namespace)) from exc

     try:
-        # Extract hostname, ip and node_name
-        result = {
-            k: v
-            for k, v in endpoint["subsets"][0]["addresses"][0].items()
-            if k in ["hostname", "ip", "node_name"]
-        }
-
-        # Add ports info to result dict
-        result["ports"] = {
-            port["name"]: port["port"] for port in endpoint["subsets"][0]["ports"]
-        }
+        result = []
+
+        for address in endpoint["subsets"][0]["addresses"]:
+            # Extract hostname, ip and node_name
+            res_ep = {
+                k: v for k, v in address.items() if k in ["hostname", "ip", "node_name"]
+            }
+
+            # Add ports info to result dict
+            res_ep["ports"] = {
+                port["name"]: port["port"] for port in endpoint["subsets"][0]["ports"]
+            }
+            result.append(res_ep)
     except (AttributeError, IndexError, KeyError, TypeError) as exc:
         raise CommandExecutionError(error_tpl.format(service, namespace)) from exc

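
The change to `get_service_endpoints` makes the function return one entry per address instead of only the first one. A standalone sketch of that extraction is shown below, assuming the Endpoints object has already been fetched as a plain dict; the function name `extract_endpoints`, the sample payload, and its IP addresses are hypothetical and only mirror the shape used in the diff.

```python
from typing import Any, Dict, List

# Keys kept from each address entry, as in the diff above.
WANTED_KEYS = ("hostname", "ip", "node_name")


def extract_endpoints(endpoint: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one dict per address in the first subset of an Endpoints object."""
    subset = endpoint["subsets"][0]
    ports = {port["name"]: port["port"] for port in subset["ports"]}

    result = []
    for address in subset["addresses"]:
        res_ep = {k: v for k, v in address.items() if k in WANTED_KEYS}
        res_ep["ports"] = dict(ports)  # copy so entries do not share one dict
        result.append(res_ep)
    return result


# Hypothetical Endpoints payload with two registry pods.
sample = {
    "subsets": [
        {
            "addresses": [
                {"ip": "10.0.0.1", "node_name": "bootstrap", "target_ref": {}},
                {"ip": "10.0.0.2", "node_name": "node-1", "target_ref": {}},
            ],
            "ports": [{"name": "http", "port": 8080}],
        }
    ]
}

for ep in extract_endpoints(sample):
    print(ep)
```

As in the diff, only the first subset is read, so addresses listed in further subsets would not be returned.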
17 changes: 13 additions & 4 deletions salt/_pillar/metalk8s_endpoints.py
@@ -30,19 +30,28 @@ def ext_pillar(minion_id, pillar, kubeconfig): # pylint: disable=unused-argumen

     else:
         endpoints = {}
+        errors = []

         for namespace, services in services.items():
             for service in services:
+                service_endpoints = []
                 try:
                     service_endpoints = __salt__[
                         "metalk8s_kubernetes.get_service_endpoints"
                     ](service, namespace, kubeconfig)
                 except CommandExecutionError as exc:
-                    service_endpoints = __utils__["pillar_utils.errors_to_dict"](
-                        str(exc)
-                    )
+                    errors.append(str(exc))
+
+                # NOTE: This is needed for downgrade as this pillar
+                #       is used for downgrade
+                # To be removed in 2.11
+                if len(service_endpoints) == 1:
+                    service_endpoints = service_endpoints[0]
+
                 endpoints.update({service: service_endpoints})
-            __utils__["pillar_utils.promote_errors"](endpoints, service)
+
+        if errors:
+            endpoints.update(__utils__["pillar_utils.errors_to_dict"](errors))

         result = {"metalk8s": {"endpoints": endpoints}}

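
The downgrade note in this hunk means the pillar value can take two shapes: a single dict when only one endpoint exists (the shape older versions expect) and a list otherwise. A minimal sketch of that rule, with the hypothetical helper name `to_pillar_value` and made-up IP addresses:

```python
from typing import Any, List, Union


def to_pillar_value(service_endpoints: List[Any]) -> Union[Any, List[Any]]:
    """Unwrap a one-element list so older consumers still see a single dict."""
    if len(service_endpoints) == 1:
        return service_endpoints[0]
    return service_endpoints


single = [{"ip": "10.0.0.1", "ports": {"http": 8080}}]
double = single + [{"ip": "10.0.0.2", "ports": {"http": 8080}}]

print(to_pillar_value(single))  # a dict
print(to_pillar_value(double))  # a list of two dicts
print(to_pillar_value([]))      # an empty list, e.g. when the lookup failed
```

Consumers of `metalk8s.endpoints.repositories` therefore have to accept both a dict and a list until the compatibility branch is removed.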
12 changes: 9 additions & 3 deletions salt/metalk8s/container-engine/containerd/installed.sls
@@ -5,8 +5,14 @@
 {%- from "metalk8s/map.jinja" import networks with context %}
 {%- from "metalk8s/map.jinja" import proxies with context %}

-{%- set registry_ip = metalk8s.endpoints['repositories'].ip %}
-{%- set registry_port = metalk8s.endpoints['repositories'].ports.http %}
+{%- set registry_eps = [] %}
+{%- set pillar_endpoints = metalk8s.endpoints.repositories %}
+{%- if not pillar_endpoints | is_list %}
+  {%- set pillar_endpoints = [pillar_endpoints] %}
+{%- endif %}
+{%- for ep in pillar_endpoints %}
+  {%- do registry_eps.append('"http://' ~ ep.ip ~ ":" ~ ep.ports.http ~ '"') %}
+{%- endfor %}

 include:
   - metalk8s.repo
@@ -102,7 +108,7 @@ Configure registry IP in containerd conf:
         version = 2

         [plugins."io.containerd.grpc.v1.cri".registry.mirrors."{{ repo.registry_endpoint }}"]
-          endpoint = ["http://{{ registry_ip }}:{{ registry_port }}"]
+          endpoint = [{{ registry_eps | join(",") }}]

         [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
           runtime_type = "io.containerd.runc.v2"
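
The Jinja logic in this file first normalizes the pillar value to a list, then renders every endpoint into the containerd mirror configuration. The same steps can be sketched in plain Python, assuming the pillar value is either one dict or a list of dicts; the helper names and IP addresses below are hypothetical.

```python
from typing import Any, Dict, List, Union

Endpoint = Dict[str, Any]


def registry_urls(pillar_endpoints: Union[Endpoint, List[Endpoint]]) -> List[str]:
    """Normalize the pillar value to a list and build one URL per endpoint."""
    if not isinstance(pillar_endpoints, list):
        pillar_endpoints = [pillar_endpoints]
    return [
        "http://{}:{}".format(ep["ip"], ep["ports"]["http"])
        for ep in pillar_endpoints
    ]


def render_endpoint_line(urls: List[str]) -> str:
    """Render the TOML `endpoint` line the way the template joins the entries."""
    return "endpoint = [{}]".format(",".join('"{}"'.format(url) for url in urls))


# Old-style pillar value: a single dict.
print(render_endpoint_line(registry_urls({"ip": "10.0.0.1", "ports": {"http": 8080}})))
# prints: endpoint = ["http://10.0.0.1:8080"]

# New-style pillar value: a list of endpoints.
two = [
    {"ip": "10.0.0.1", "ports": {"http": 8080}},
    {"ip": "10.0.0.2", "ports": {"http": 8080}},
]
print(render_endpoint_line(registry_urls(two)))
# prints: endpoint = ["http://10.0.0.1:8080","http://10.0.0.2:8080"]
```

With several mirror endpoints configured, containerd can fall back to the next entry when one registry is unreachable, which is what the HA procedure relies on.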
3 changes: 1 addition & 2 deletions salt/metalk8s/orchestrate/apiserver.sls
@@ -16,8 +16,7 @@ Check pillar on {{ node }}:
     - tgt: {{ node }}
     - kwarg:
         keys:
-          - metalk8s.endpoints.repositories.ip
-          - metalk8s.endpoints.repositories.ports.http
+          - metalk8s.endpoints.repositories
         # We cannot raise when using `salt.function` as we need to return
         # `False` to have a failed state
         # https://github.com/saltstack/salt/issues/55503
4 changes: 2 additions & 2 deletions salt/metalk8s/orchestrate/bootstrap/init.sls
@@ -57,12 +57,12 @@ Get metalk8s:control_plane_ip grain:
                     'api': 4507,
                 },
             },
-            'repositories': {
+            'repositories': [{
                 'ip': bootstrap_grains['control_plane_ip'],
                 'ports': {
                     'http': 8080,
                 },
-            },
+            }],
         },
     },
 }
15 changes: 6 additions & 9 deletions salt/metalk8s/orchestrate/deploy_node.sls
@@ -145,9 +145,8 @@ Check pillar before salt-minion configuration:
     - tgt: {{ node_name }}
     - kwarg:
         keys:
-          - metalk8s.endpoints.salt-master.ip
-          - metalk8s.endpoints.repositories.ip
-          - metalk8s.endpoints.repositories.ports.http
+          - metalk8s.endpoints.salt-master
+          - metalk8s.endpoints.repositories
         # We cannot raise when using `salt.function` as we need to return
         # `False` to have a failed state
         # https://github.com/saltstack/salt/issues/55503
@@ -195,9 +194,8 @@ Check pillar before etcd deployment:
     - tgt: {{ node_name }}
     - kwarg:
         keys:
-          - metalk8s.endpoints.salt-master.ip
-          - metalk8s.endpoints.repositories.ip
-          - metalk8s.endpoints.repositories.ports.http
+          - metalk8s.endpoints.salt-master
+          - metalk8s.endpoints.repositories
         # We cannot raise when using `salt.function` as we need to return
         # `False` to have a failed state
         # https://github.com/saltstack/salt/issues/55503
@@ -250,9 +248,8 @@ Check pillar before highstate:
     - tgt: {{ node_name }}
     - kwarg:
         keys:
-          - metalk8s.endpoints.salt-master.ip
-          - metalk8s.endpoints.repositories.ip
-          - metalk8s.endpoints.repositories.ports.http
+          - metalk8s.endpoints.salt-master
+          - metalk8s.endpoints.repositories
         # We cannot raise when using `salt.function` as we need to return
         # `False` to have a failed state
         # https://github.com/saltstack/salt/issues/55503
3 changes: 1 addition & 2 deletions salt/metalk8s/orchestrate/etcd.sls
@@ -24,8 +24,7 @@ Check pillar on {{ node }}:
     - tgt: {{ node }}
     - kwarg:
         keys:
-          - metalk8s.endpoints.repositories.ip
-          - metalk8s.endpoints.repositories.ports.http
+          - metalk8s.endpoints.repositories
         # We cannot raise when using `salt.function` as we need to return
         # `False` to have a failed state
         # https://github.com/saltstack/salt/issues/55503
3 changes: 1 addition & 2 deletions salt/metalk8s/orchestrate/upgrade/init.sls
@@ -35,8 +35,7 @@ Check pillar on {{ node }} before installing apiserver-proxy:
     - tgt: {{ node }}
     - kwarg:
         keys:
-          - metalk8s.endpoints.repositories.ip
-          - metalk8s.endpoints.repositories.ports.http
+          - metalk8s.endpoints.repositories
         # We cannot raise when using `salt.function` as we need to return
         # `False` to have a failed state
         # https://github.com/saltstack/salt/issues/55503