
Linkerd Init Container Can Run Before Other Injected Init Containers #4758

Closed
theeternalrat opened this issue Jul 14, 2020 · 10 comments

@theeternalrat

theeternalrat commented Jul 14, 2020

Bug Report

What is the issue?

Init containers cannot access network resources because the proxy has not started yet.

How can it be reproduced?

Create a pod with an init container that runs a simple network command, such as wget google.com
Example:

initContainers:
        - name: init-ping
          image: busybox
          command: ['sh', '-c', "wget google.com"]

The pod will never initialize, as the init containers cannot complete successfully without network availability. I've tried both the Linkerd init container and the CNI plugin; neither allows init containers to make network requests.
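The stuck state is visible in the pod status (a sketch; the pod name is a placeholder), since the init phase never completes:

k get pod <pod-name>
# the STATUS column stays in an Init:... state (e.g. Init:Error or Init:CrashLoopBackOff as the init container keeps failing)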

Logs, error output, etc

Here is the log from the wget command:

Connecting to google.com (216.58.217.46:80)
wget: can't connect to remote host (216.58.217.46): Connection refused

The connection is instantly refused because there is no network route through the proxy, as the proxy container has yet to start.

linkerd check output

kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API

kubernetes-version
------------------
√ is running the minimum Kubernetes API version
√ is running the minimum kubectl version

linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ heartbeat ServiceAccount exist
√ control plane replica sets are ready
√ no unschedulable pods
√ controller pod is running
√ can initialize the client
√ can query the control plane API

linkerd-config
--------------
√ control plane Namespace exists
√ control plane ClusterRoles exist
√ control plane ClusterRoleBindings exist
√ control plane ServiceAccounts exist
√ control plane CustomResourceDefinitions exist
√ control plane MutatingWebhookConfigurations exist
√ control plane ValidatingWebhookConfigurations exist
√ control plane PodSecurityPolicies exist

linkerd-cni-plugin
------------------
√ cni plugin ConfigMap exists
√ cni plugin PodSecurityPolicy exists
√ cni plugin ClusterRole exists
√ cni plugin ClusterRoleBinding exists
√ cni plugin Role exists
√ cni plugin RoleBinding exists
√ cni plugin ServiceAccount exists
√ cni plugin DaemonSet exists
√ cni plugin pod is running on all nodes

linkerd-identity
----------------
√ certificate config is valid
√ trust anchors are using supported crypto algorithm
√ trust anchors are within their validity period
√ trust anchors are valid for at least 60 days
√ issuer cert is using supported crypto algorithm
√ issuer cert is within its validity period
√ issuer cert is valid for at least 60 days
√ issuer cert is issued by the trust anchor

linkerd-api
-----------
√ control plane pods are ready
√ control plane self-check
√ [kubernetes] control plane can talk to Kubernetes
√ [prometheus] control plane can talk to Prometheus
√ tap api service is running

linkerd-version
---------------
√ can determine the latest version
√ cli is up-to-date

control-plane-version
---------------------
√ control plane is up-to-date
√ control plane and cli versions match

linkerd-addons
--------------
√ 'linkerd-config-addons' config map exists

linkerd-grafana
---------------
√ grafana add-on service account exists
√ grafana add-on config map exists
√ grafana pod is running

Status check results are √

Environment

  • Kubernetes Version:
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.2", GitCommit:"52c56ce7a8272c798dbc29846288d7cd9fbae032", GitTreeState:"clean", BuildDate:"2020-04-16T11:56:40Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15+", GitVersion:"v1.15.11-eks-af3caf", GitCommit:"af3caf6136cd355f467083651cc1010a499f59b1", GitTreeState:"clean", BuildDate:"2020-03-27T21:51:36Z", GoVersion:"go1.12.17", Compiler:"gc", Platform:"linux/amd64"}
  • Cluster Environment: AWS EKS with non-managed worker nodes
  • Host OS: Amazon Linux 2
  • Linkerd version:
Client version: stable-2.8.1
Server version: stable-2.8.1

Possible solution

I am not familiar enough with the Linkerd codebase to propose a solution.

Additional context

I have only been able to find #1760 that references anything similar to my issue. Perhaps I am simply missing some configuration, as I cannot believe so many people run Linkerd without init containers that require network resources.

@ihcsim
Contributor

ihcsim commented Jul 14, 2020

By default, the Linkerd proxy-init container is injected as the last init container on the list, so it shouldn't interfere with the networking of other init containers that come before it. I did a test with the following YAML:

cat <<EOF | k apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      annotations:
        linkerd.io/inject: enabled
      labels:
        app: nginx
    spec:
      initContainers:
      - name: wget
        image: busybox
        command: ['sh', '-c', "wget google.com"]
      containers:
      - name: nginx
        image: nginx
        ports:
        - name: http
          containerPort: 80
EOF

It seems to work for me:

$ k logs nginx-78dc8645f6-7mjtl wget                                                   
Connecting to google.com (216.58.193.78:80)
Connecting to www.google.com (172.217.14.228:80)
saving to 'index.html'
index.html           100% |********************************| 12452  0:00:00 ETA
'index.html' saved

Can you check your proxy-init container log to see if there are any errors there?
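Something like this should show it (assuming the injected init container is named linkerd-init; the pod name is a placeholder):

k logs <pod-name> -c linkerd-init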

@cpretzer
Contributor

Hi @theeternalrat, can you provide detail about how the Linkerd proxy was injected into your pod? Did you use the linkerd.io/inject: enabled annotation, or did you use the linkerd inject command?

I tested this by adding the busybox initContainer to the vote-bot deployment in https://run.linkerd.io/emojivoto.yml and the command ran successfully.

Can you confirm that the busybox command returns successfully when the pod is not injected with the linkerd proxy?
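For reference, the two injection paths look roughly like this (a sketch; the deployment name is a placeholder):

# annotation on the workload's pod template
  template:
    metadata:
      annotations:
        linkerd.io/inject: enabled

# or CLI injection
kubectl get deploy <name> -o yaml | linkerd inject - | kubectl apply -f -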

@theeternalrat
Author

@cpretzer I can confirm the init container is successful without the linkerd.io/inject: enabled annotation. I am using the annotation, not the CLI command.

k logs orgchart-68d79ccf59-2ljkp init-ping
Connecting to google.com (172.217.14.206:80)
Connecting to www.google.com (172.217.14.228:80)
saving to 'index.html'
index.html           100% |********************************| 12864  0:00:00 ETA
'index.html' saved

@ihcsim I do not see any errors in the init container:

2020/07/15 16:22:43 Tracing this script execution as [1594830163]
2020/07/15 16:22:43 State of iptables rules before run:
2020/07/15 16:22:43 > iptables -t nat -vnL
2020/07/15 16:22:43 < Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination

2020/07/15 16:22:43 > iptables -t nat -F PROXY_INIT_REDIRECT
2020/07/15 16:22:43 < iptables: No chain/target/match by that name.

2020/07/15 16:22:43 > iptables -t nat -X PROXY_INIT_REDIRECT
2020/07/15 16:22:43 < iptables: No chain/target/match by that name.

2020/07/15 16:22:43 Will ignore port(s) [4190 4191] on chain PROXY_INIT_REDIRECT
2020/07/15 16:22:43 Will redirect all INPUT ports to proxy
2020/07/15 16:22:43 > iptables -t nat -F PROXY_INIT_OUTPUT
2020/07/15 16:22:43 < iptables: No chain/target/match by that name.

2020/07/15 16:22:43 > iptables -t nat -X PROXY_INIT_OUTPUT
2020/07/15 16:22:43 < iptables: No chain/target/match by that name.

2020/07/15 16:22:43 Ignoring uid 2102
2020/07/15 16:22:43 Redirecting all OUTPUT to 4140
2020/07/15 16:22:43 Executing commands:
2020/07/15 16:22:43 > iptables -t nat -N PROXY_INIT_REDIRECT -m comment --comment proxy-init/redirect-common-chain/1594830163
2020/07/15 16:22:43 <
2020/07/15 16:22:43 > iptables -t nat -A PROXY_INIT_REDIRECT -p tcp --match multiport --dports 4190,4191 -j RETURN -m comment --comment proxy-init/ignore-port-4190,4191/1594830163
2020/07/15 16:22:43 <
2020/07/15 16:22:43 > iptables -t nat -A PROXY_INIT_REDIRECT -p tcp -j REDIRECT --to-port 4143 -m comment --comment proxy-init/redirect-all-incoming-to-proxy-port/1594830163
2020/07/15 16:22:43 <
2020/07/15 16:22:43 > iptables -t nat -A PREROUTING -j PROXY_INIT_REDIRECT -m comment --comment proxy-init/install-proxy-init-prerouting/1594830163
2020/07/15 16:22:43 <
2020/07/15 16:22:43 > iptables -t nat -N PROXY_INIT_OUTPUT -m comment --comment proxy-init/redirect-common-chain/1594830163
2020/07/15 16:22:43 <
2020/07/15 16:22:43 > iptables -t nat -A PROXY_INIT_OUTPUT -m owner --uid-owner 2102 -o lo ! -d 127.0.0.1/32 -j PROXY_INIT_REDIRECT -m comment --comment proxy-init/redirect-non-loopback-local-traffic/1594830163
2020/07/15 16:22:43 <
2020/07/15 16:22:43 > iptables -t nat -A PROXY_INIT_OUTPUT -m owner --uid-owner 2102 -j RETURN -m comment --comment proxy-init/ignore-proxy-user-id/1594830163
2020/07/15 16:22:43 <
2020/07/15 16:22:43 > iptables -t nat -A PROXY_INIT_OUTPUT -o lo -j RETURN -m comment --comment proxy-init/ignore-loopback/1594830163
2020/07/15 16:22:43 <
2020/07/15 16:22:43 > iptables -t nat -A PROXY_INIT_OUTPUT -p tcp -j REDIRECT --to-port 4140 -m comment --comment proxy-init/redirect-all-outgoing-to-proxy-port/1594830163
2020/07/15 16:22:43 <
2020/07/15 16:22:43 > iptables -t nat -A OUTPUT -j PROXY_INIT_OUTPUT -m comment --comment proxy-init/install-proxy-init-output/1594830163
2020/07/15 16:22:43 <
2020/07/15 16:22:43 > iptables -t nat -vnL
2020/07/15 16:22:43 < Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 PROXY_INIT_REDIRECT  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* proxy-init/install-proxy-init-prerouting/1594830163 */

Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 PROXY_INIT_OUTPUT  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* proxy-init/install-proxy-init-output/1594830163 */

Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain PROXY_INIT_OUTPUT (1 references)
 pkts bytes target     prot opt in     out     source               destination
    0     0 PROXY_INIT_REDIRECT  all  --  *      lo      0.0.0.0/0           !127.0.0.1            owner UID match 2102 /* proxy-init/redirect-non-loopback-local-traffic/1594830163 */
    0     0 RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0            owner UID match 2102 /* proxy-init/ignore-proxy-user-id/1594830163 */
    0     0 RETURN     all  --  *      lo      0.0.0.0/0            0.0.0.0/0            /* proxy-init/ignore-loopback/1594830163 */
    0     0 REDIRECT   tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            /* proxy-init/redirect-all-outgoing-to-proxy-port/1594830163 */ redir ports 4140

Chain PROXY_INIT_REDIRECT (2 references)
 pkts bytes target     prot opt in     out     source               destination
    0     0 RETURN     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            multiport dports 4190,4191 /* proxy-init/ignore-port-4190,4191/1594830163 */
    0     0 REDIRECT   tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            /* proxy-init/redirect-all-incoming-to-proxy-port/1594830163 */ redir ports 4143

@theeternalrat
Author

theeternalrat commented Jul 15, 2020

In today's testing, I was not able to reproduce the issue with my wget command from the original post. However, I can still see the issue when using the vault-agent-init container. Perhaps this is because it is injected after the Linkerd init container, as Vault also uses a mutating webhook:

k logs orgchart-698f649fb8-4kg9p vault-agent-init
==> Vault server started! Log data will stream in below:

==> Vault agent configuration:

                     Cgo: disabled
               Log Level: info
                 Version: Vault v1.4.2

2020-07-15T16:27:53.726Z [INFO]  sink.file: creating file sink
2020-07-15T16:27:53.726Z [INFO]  sink.file: file sink configured: path=/home/vault/.vault-token mode=-rw-r-----
2020-07-15T16:27:53.726Z [INFO]  template.server: starting template server
2020/07/15 16:27:53.726928 [INFO] (runner) creating new runner (dry: false, once: false)
2020-07-15T16:27:53.726Z [INFO]  auth.handler: starting auth handler
2020-07-15T16:27:53.727Z [INFO]  auth.handler: authenticating
2020/07/15 16:27:53.727435 [INFO] (runner) creating watcher
2020-07-15T16:27:53.727Z [INFO]  sink.server: starting sink server
2020-07-15T16:27:53.821Z [ERROR] auth.handler: error authenticating: error="Put https://xxx/login: dial tcp xxx:443: connect: connection refused" backoff=2.560555955
2020-07-15T16:27:56.382Z [INFO]  auth.handler: authenticating
2020-07-15T16:27:56.476Z [ERROR] auth.handler: error authenticating: error="Put https://xxx/login: dial tcp xxx:443: connect: connection refused" backoff=1.8304010960000001
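For what it's worth, the final init container ordering produced by the two webhooks can be checked with something like this (the pod name is a placeholder):

k get pod <pod-name> -o jsonpath='{.spec.initContainers[*].name}'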

@cpretzer
Contributor

cpretzer commented Jul 15, 2020

@theeternalrat Thanks for clarifying. So, yeah, this looks to be contention between the two webhooks over which init container gets injected last.

Long-term, we think this will be resolved with the Kubernetes Sidecar Resource.

In the short term, you might be able to use the config.linkerd.io/skip-outbound-ports annotation for the port that vault-agent-init talks to. This will cause the iptables rules to skip redirecting that traffic through the Linkerd proxy and send the requests directly to Vault.

Let me know if this helps.
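For reference, a rough sketch of how that might look on the workload's pod template (443 is assumed here to be the Vault endpoint port):

  template:
    metadata:
      annotations:
        linkerd.io/inject: enabled
        config.linkerd.io/skip-outbound-ports: "443"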

@theeternalrat theeternalrat changed the title Init Container Have No Network Availability Init Containers Have No Network Availability Jul 15, 2020
@theeternalrat
Author

@cpretzer The containers all start properly when applying config.linkerd.io/skip-outbound-ports: "443" to the Deployment's pod template annotations. In my quick googling, I didn't find any way to apply the annotation to just the vault-init container rather than the pod at large. I also looked for a way to control the order of the mutating webhooks, to no avail.

This is an OK fix for me, as our applications are not using port 443 at the moment and Vault is served over HTTPS.

@theeternalrat theeternalrat changed the title Init Containers Have No Network Availability Linkerd Init Container Can Run Before Other Injected Init Containers Jul 15, 2020
@theeternalrat
Author

Thank you for your help @cpretzer and @ihcsim.

@saul-data

I just experienced the same thing with an init container that checks whether a database is ready before the main container starts.

It caused Skaffold deploy to fail:

  template:
    metadata:
      annotations:
        linkerd.io/inject: enabled
      labels:
        app: go-app-user-auth-service
    spec:
      initContainers:
        - name: check-db-ready
          image: gushcloud/postgresql-client:latest
          command: ['sh', '-c', 'until pg_isready -h timescaledb-service -p 5432; do echo waiting for database; sleep 2; done;']

[screenshot of the failing Skaffold deploy]

@kleimkuhler
Contributor

@saul-data I think what's happening is that the command is still running when the proxy's init container starts and rewrites the iptables rules. This causes it to fail, as you see. I have an idea that may work, but I haven't tested it. Hopefully you can give it a try?

Take the command and put it in the init container's preStop hook. I'm thinking it would look something like:

    spec:
      initContainers:
        - name: check-db-ready
          image: gushcloud/postgresql-client:latest
          lifecycle:
            preStop:
              exec:
                command: ['sh', '-c', 'until pg_isready -h timescaledb-service -p 5432; do echo waiting for database; sleep 2; done;']

If this works as I am expecting, check-db-ready's preStop hook will prevent other init containers from starting until it completes. That way, we know that check-db-ready has completed before the proxy's init container is created and the iptables rules are rewritten.

@kleimkuhler
Contributor

@saul-data Never mind the previous comment. The behavior I was trying to get is already the default behavior of init containers: "Each init container must complete successfully before the next one starts."

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 17, 2021