Running container-based workloads on large compute clusters generally requires every node to pull a copy of the container image from the container registry. However, many container images are very large, especially for deep learning or HPC development. Pulling many copies of the same large image can therefore saturate the connection to the registry, especially when the registry is only reachable over the outbound Internet connection. Even if the registry is local and the network connection is not the bottleneck, many simultaneous pulls can place heavy load on the registry server itself.
In order to reduce this load, DeepOps includes a playbook to deploy a caching HTTP proxy based on rpardini/docker-registry-proxy. This proxy can be configured to cache container pulls from specific container registries, and caches containers on a per-layer basis. Following the first pull from an upstream container registry, subsequent pulls will only fetch from the proxy, reducing the number of pulls that need to hit the upstream registry.
Note that in order to successfully proxy HTTPS container registries, the caching proxy deployed by this playbook implements a "person-in-the-middle" HTTPS proxy. This requires the proxy to use its own Certificate Authority (CA) to generate certificates which masquerade as the upstream registry. The cluster nodes must then have the proxy's CA certificate added to their trusted store.
Because using this proxy requires that the nodes be configured to explicitly trust the proxy CA certificate, we believe this is a reasonable solution for a caching proxy. However, those using this feature should ensure this mechanism fits their security policy, and may choose to implement additional logging or auditing around the use of this proxy.
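The client playbook described below is intended to handle this trust configuration for you, but as an illustration, on a Debian or Ubuntu node it amounts to something like the following sketch. The download path assumes the upstream docker-registry-proxy behavior of serving its CA certificate on its listening port; adjust for your distribution and setup.

```sh
# Illustrative sketch only; the client playbook performs the equivalent steps.
# Fetch the proxy's CA certificate (path assumes docker-registry-proxy defaults)
# and add it to the system trust store.
curl -o /usr/local/share/ca-certificates/docker-registry-proxy.crt \
    http://<proxy-hostname>:3128/ca.crt
update-ca-certificates
```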
The full list of variables used by the caching proxy role can be found in `roles/nginx-docker-registry-cache/defaults/main.yml`.
The following are the most common configuration variables you may want to adjust:
| Variable | Default value | Description |
|---|---|---|
| `nginx_docker_cache_image` | `"rpardini/docker-registry-proxy:0.6.1"` | Container image used to deploy the proxy |
| `nginx_docker_cache_registry_string` | `"quay.io k8s.gcr.io gcr.io nvcr.io"` | Space-separated list of registries to proxy |
| `nginx_docker_cache_manifests` | `"false"` | Flag to determine whether to cache image manifests |
| `nginx_docker_cache_manifest_default_time` | `"1h"` | If manifests are cached, time to cache them |
| `nginx_docker_cache_hostgroup` | `"cache"` | Ansible inventory host group where the proxy is deployed |
| `nginx_docker_cache_dockerd_clients` | `true` | Flag to determine whether dockerd should be configured to use the proxy |
| `nginx_docker_cache_ca` | not configured by default | File paths for the CA certificate and key, if you supply these yourself |
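As an example, overriding these defaults might look like the following in an Ansible group variables file. The file location, the added registry, and the cache time are illustrative; adjust them to your environment.

```yaml
# Example group_vars entry; location and values are illustrative
# Proxy an additional registry on top of the defaults
nginx_docker_cache_registry_string: "quay.io k8s.gcr.io gcr.io nvcr.io ghcr.io"

# Cache image manifests for 30 minutes to further reduce upstream requests
nginx_docker_cache_manifests: "true"
nginx_docker_cache_manifest_default_time: "30m"
```

Note that caching manifests means updated tags may not be picked up by clients until the cached entry expires.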
By default, the proxy will generate a CA certificate and key on its first run, and make the certificate available for clients to download. This is usually the fastest way to get up and running, but means that if you fully re-deploy the proxy server, you may need to re-download the CA certificate on the clients.
If you choose, you can instead provide a pre-generated CA certificate and key and specify that these be used. A sample script for generating the key and certificate can be found in `scripts/nginx-docker-cache/gen-ca.sh`.
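If you would rather generate the CA by hand, a minimal sketch using openssl might look like the following; the subject name and validity period are arbitrary examples, and the provided `gen-ca.sh` script remains the supported path.

```sh
# Generate a private key for the CA (illustrative; see
# scripts/nginx-docker-cache/gen-ca.sh for the supported script)
openssl genrsa -out ca.key 4096

# Create a self-signed CA certificate, valid here for 10 years
openssl req -x509 -new -nodes -key ca.key -sha256 -days 3650 \
    -subj "/CN=docker-registry-proxy CA" -out ca.crt
```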
To specify the CA certificate and key which you wish to use, set the following variable:
```yaml
nginx_docker_cache_ca:
  crt: "/path/to/ca.crt"
  key: "/path/to/ca.key"
```
This set of files will then be used for both the server and the clients.
To deploy the proxy server with the default configuration, add the host(s) where you wish to run the proxy to the `cache` hostgroup in your inventory.
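For example, an inventory entry might look like the following; the hostname and inventory path are illustrative.

```ini
# config/inventory (illustrative path and hostname)
[cache]
proxy-01
```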
Then run:
```sh
ansible-playbook -l cache playbooks/container/nginx-docker-registry-cache-server.yml
```
To configure client nodes using Docker for container pulls, ensure `nginx_docker_cache_dockerd_clients` is set to `true`, then run:

```sh
ansible-playbook -l <nodes> playbooks/container/nginx-docker-registry-cache-client.yml
```
To configure client nodes using Enroot for container pulls, add the following line to the `enroot_config` variable:

```
https_proxy=http://<proxy-hostname>:3128/
```

where `<proxy-hostname>` is the name of the host where you're running the proxy.
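As a sketch, assuming `enroot_config` holds a block of enroot configuration lines (as the instruction above implies), this might be set in a group variables file like so; the hostname is a placeholder.

```yaml
# Example group_vars entry; "proxy-01" is a placeholder hostname
enroot_config: |
  https_proxy=http://proxy-01:3128/
```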
Then run:
```sh
ansible-playbook -l <nodes> playbooks/container/nginx-docker-registry-cache-client.yml
ansible-playbook -l <nodes> playbooks/container/pyxis.yml
```