Drop "interface" mode in linkerd-cni (#242)
* Drop "interface" mode in linkerd-cni "interface" mode was ill-conceived, in that it wasn't providing full network capabilities as supposed, and so pods provisioned with linkerd-cni in this mode weren't having their network properly set up. This change removes that mode, leaving only the "chained" mode, where linkerd-cni waits for another CNI plugin to drop its config in the target directory, over which linkerd-cni will append its config. It's important to emphasize that no config will be generated until another CNI plugin adds its own, as to not trigger pod scheduling while there is no full CNI network config available. Also added proper logging timestamps. ## Tests This was successfully tested in k3d, AKS, EKS and GKE (with Calico). GKE with its default CNI still has issues as pointed in the Followup below. The following log is from having installed linkerd-cni in an node that already a CNI plugin running: ``` $ k -n linkerd-cni logs -f linkerd-cni-hznp2 install-cni [2023-05-17 08:50:46] Wrote linkerd CNI binaries to /host/home/kubernetes/bin [2023-05-17 08:50:46] Installing CNI configuration for /host/etc/cni/net.d/10-calico.conflist [2023-05-17 08:50:46] Using CNI config template from CNI_NETWORK_CONFIG environment variable. "k8s_api_root": "https://__KUBERNETES_SERVICE_HOST__:__KUBERNETES_SERVICE_PORT__", "k8s_api_root": "https://10.60.0.1:__KUBERNETES_SERVICE_PORT__", [2023-05-17 08:50:46] CNI config: { "name": "linkerd-cni", "type": "linkerd-cni", "log_level": "info", "policy": { "type": "k8s", "k8s_api_root": "https://10.60.0.1:443", "k8s_auth_token": "__SERVICEACCOUNT_TOKEN__" }, "kubernetes": { "kubeconfig": "/etc/cni/net.d/ZZZ-linkerd-cni-kubeconfig" }, "linkerd": { "incoming-proxy-port": 4143, "outgoing-proxy-port": 4140, "proxy-uid": 2102, "ports-to-redirect": [], "inbound-ports-to-ignore": ["4191","4190"], "simulate": false, "use-wait-flag": false } } [2023-05-17 08:50:47] Created CNI config /host/etc/cni/net.d/10-calico.conflist Setting up watches. Watches established. ``` After adding an extra node, the linkerd-cni DaemonSet starts and we can see it waits until another CNI plugin drops its config: ``` $ k -n linkerd-cni logs -f linkerd-cni-tvv6r [2023-05-17 08:58:12] Wrote linkerd CNI binaries to /host/home/kubernetes/bin [2023-05-17 08:58:12] No active CNI configuration files found Setting up watches. Watches established. [2023-05-17 08:58:22] Detected change in /host/etc/cni/net.d/: CREATE 10-calico.conflist [2023-05-17 08:58:22] New file [10-calico.conflist] detected; re-installing [2023-05-17 08:58:22] Using CNI config template from CNI_NETWORK_CONFIG environment variable. 
"k8s_api_root": "https://__KUBERNETES_SERVICE_HOST__:__KUBERNETES_SERVICE_PORT__", "k8s_api_root": "https://10.60.0.1:__KUBERNETES_SERVICE_PORT__", [2023-05-17 08:58:22] CNI config: { "name": "linkerd-cni", "type": "linkerd-cni", "log_level": "info", "policy": { "type": "k8s", "k8s_api_root": "https://10.60.0.1:443", "k8s_auth_token": "__SERVICEACCOUNT_TOKEN__" }, "kubernetes": { "kubeconfig": "/etc/cni/net.d/ZZZ-linkerd-cni-kubeconfig" }, "linkerd": { "incoming-proxy-port": 4143, "outgoing-proxy-port": 4140, "proxy-uid": 2102, "ports-to-redirect": [], "inbound-ports-to-ignore": ["4191","4190"], "simulate": false, "use-wait-flag": false } } [2023-05-17 08:58:22] Created CNI config /host/etc/cni/net.d/10-calico.conflist [2023-05-17 08:58:22] Detected change in /host/etc/cni/net.d/: DELETE 10-calico.conflist [2023-05-17 08:58:22] Detected change in /host/etc/cni/net.d/: CREATE 10-calico.conflist [2023-05-17 08:58:22] Ignoring event: CREATE /host/etc/cni/net.d/10-calico.conflist; no real changes detected ``` ## Followup This doesn't fix another class of problem where pods start to get scheduled after the first CNI plugin config is available, but before the linkerd-cni DaemonSet gets a chance to append its config. This results in the Pod not getting the iptables rules set in time. The injected linkerd-network-validator will catch that and fail the pod. To acquire proper network config, the pod needs to be externally bounced (manually or through an operator). This happens in GKE with its default CNI plugin when the node pool is scaled.