
Pods with SG for pods are slow in ContainerCreating when a new node is deployed #1252

Closed
tomerpeer63 opened this issue Feb 1, 2022 · 18 comments
Assignees
Labels
blocked Unable to make progress due to some dependency bug Something isn't working roadmap We're thinking about next steps

Comments

@tomerpeer63

tomerpeer63 commented Feb 1, 2022

Version

Karpenter Version: v0.5.4
Kubernetes: v1.21

Expected Behavior

When scaling new pods that use Security Groups for Pods, and Karpenter creates a new node, the pods should be attached to the new node and deploy successfully.

Actual Behavior

The new pods get attached to the new node, but they are stuck in ContainerCreating even after the node is ready for use. Only when the node is ready and I delete/recreate the pods manually are the pods able to run. This happens only with pods that use Security Groups for Pods.
If a node is already ready when new pods need to be deployed, this doesn't happen. The problem is reproduced only when a node is coming up and a pod is attached to it.
I think this happens because Karpenter doesn't honor the pod's scheduling restrictions. Karpenter binds a pod that requires the vpc.amazonaws.com/pod-eni resource to a node that doesn't have the vpc.amazonaws.com/pod-eni resource. However, the VPCResourceController will ignore such a pod bind event, since the node is not yet managed by it (https://github.com/aws/amazon-vpc-resource-controller-k8s/blob/v1.1.0/controllers/core/pod_controller.go#L100).
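For context, a node advertises vpc.amazonaws.com/pod-eni as an extended resource only after the vpc-resource-controller attaches a trunk ENI. A minimal sketch of what a webhook-mutated pod looks like (the pod name and image are illustrative, not taken from this cluster):

```yaml
# Sketch: a pod matching a SecurityGroupPolicy is mutated by the
# vpc-resource-controller webhook to request the pod-eni extended resource,
# which only trunk-enabled (managed) nodes can satisfy.
apiVersion: v1
kind: Pod
metadata:
  name: krakend-example            # illustrative name
spec:
  containers:
    - name: app
      image: example/app:latest    # illustrative image
      resources:
        limits:
          vpc.amazonaws.com/pod-eni: "1"   # injected by the webhook
        requests:
          vpc.amazonaws.com/pod-eni: "1"
```

Binding the pod before the node reports this resource in its allocatable is exactly the event the controller ignores.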

Steps to Reproduce the Problem

Use Karpenter with a deployment that uses SG for pods (VPC CNI), and deploy a new provisioner with new pods.
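For a minimal reproduction, the deployment's pods need to match a SecurityGroupPolicy; a sketch with a placeholder selector and security group ID:

```yaml
apiVersion: vpcresources.k8s.aws/v1beta1
kind: SecurityGroupPolicy
metadata:
  name: example-sgp            # illustrative name
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: krakend             # must match the deployment's pod labels
  securityGroups:
    groupIds:
      - sg-0123456789abcdef0   # placeholder security group ID
```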

Resource Specs and Logs

2022-01-23T09:40:50, pod default/krakend-deployment-7bd59cd948-zzzrl was created; vpc-resource-controller's webhook modified it to have the scheduling restriction vpc.amazonaws.com/pod-eni: 1
2022-01-23T09:40:53, node ip-xxx.eu-central-1.compute.internal was created by Karpenter ahead of time (before the EC2 instance had been initialized)
2022-01-23T09:40:53, Karpenter binds pod default/krakend-deployment-7bd59cd948-zzzrl to node ip-xxx.eu-central-1.compute.internal (even though the node hadn't advertised the vpc.amazonaws.com/pod-eni resource yet, so the vpc-resource-controller ignores this pod update since the node isn't managed by it yet)
2022-01-23T09:43:44, node ip-xxx.eu-central-1.compute.internal had a trunk ENI attached and was patched by the vpc-resource-controller to have vpc.amazonaws.com/pod-eni resources
2022-01-23T10:04:39, pod default/krakend-deployment-7bd59cd948-zzzrl was deleted by the replication controller (deletionTimestamp set, but the pod object isn't deleted)
2022-01-23T10:04:40, pod default/krakend-deployment-7bd59cd948-zzzrl was modified to have the branch-ENI annotation after a branch ENI was attached to the node

@tomerpeer63 tomerpeer63 added the bug Something isn't working label Feb 1, 2022
@ellistarn ellistarn added the burning Time sensitive issues label Feb 1, 2022
@felix-zhe-huang felix-zhe-huang self-assigned this Feb 1, 2022
@felix-zhe-huang
Contributor

So in my environment it took about 2 minutes for the pod to get to Running. The node was ready after 70 seconds, so the pod was stuck in the ContainerCreating state for about 1 minute and 50 seconds. Most of that time was just waiting for the new ENI to be created.
[Screenshot: pod event timeline, 2022-02-01 3:31 PM]

@felix-zhe-huang
Contributor

felix-zhe-huang commented Feb 1, 2022

IIUC, the k8s scheduler and karpenter are not aware of the pod security group concept. However, it should not cause incorrect scheduling and pod binding decisions. It is a matter of time for the node and the extra ENI to be ready for serving the new pods. The question is how long the delay is and why.

Can you help confirm that your pods can recover after the ENI is created? This will allow us to focus on the latency issue instead of the correctness issue. I use this command to check when the ENI is online:

```
kubectl get nodes -o wide -l vpc.amazonaws.com/has-trunk-attached=true
```

@ellistarn
Contributor

Further, it's undesirable to flood the system with errors in the success case. At worst, those events can clog up etcd and slow the cluster down at scale. At best, it may trigger false alarms.

@ellistarn ellistarn changed the title Pods with SG for pods are stuck in ContainerCreating when a new node is deployed Pods with SG for pods are slow in ContainerCreating when a new node is deployed Feb 1, 2022
@felix-zhe-huang
Contributor

Apparently my example above is a lucky outlier that doesn't show how long the pods can stay stuck in the ContainerCreating state. The VPC resource controller resyncs all pods every 30 minutes, and pods bound before the ENI is up and running are recovered at the resync event. So in the worst-case scenario, pods will get stuck for 30 minutes.

Ideally the VPC resource controller would perform a resync when the node becomes ready. We will need the VPC resource controller team's help to implement that logic.

@tomerpeer63
Author

@felix-zhe-huang I did experience what you said: sometimes the pods would just unstick themselves and run after an irregular delay of up to 30 minutes. Now it makes sense. I can confirm that whenever a node gets an ENI, pods are able to reach the Running state, but only if the ENI was attached before the pod was bound (or, apparently, after a resync happens).
Is there a way to work around or bypass this issue?

@ellistarn ellistarn added the blocked Unable to make progress due to some dependency label Feb 2, 2022
@ellistarn ellistarn removed the burning Time sensitive issues label Feb 2, 2022
@github-actions
Contributor

This issue is stale because it has been open 25 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the stale label Feb 28, 2022
@yanshwork

Please keep this issue open.

@ellistarn ellistarn added roadmap We're thinking about next steps and removed stale labels Feb 28, 2022
@rtripat rtripat assigned tzneal and unassigned felix-zhe-huang Mar 3, 2022
@dewjam dewjam assigned dewjam and unassigned tzneal Mar 16, 2022
@dewjam
Contributor

dewjam commented Apr 1, 2022

Hello @tomerpeer63 ,
I found a workaround which is worth testing if you're open to it.

Add the vpc.amazonaws.com/has-trunk-attached: "false" label to your Karpenter Provisioner spec. As long as you have some instance types which support ENI trunking in "requirements", you should see that pods using Security Groups start much more quickly. We are actively working towards a more robust solution, but hopefully this will unblock you.

(A list of instance types which support ENI trunking is available here: https://github.com/aws/amazon-vpc-resource-controller-k8s/blob/master/pkg/aws/vpc/limits.go)

Here's an example Provisioner manifest:

```yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  labels:
    vpc.amazonaws.com/has-trunk-attached: "false"
  requirements:
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["spot"]
    - key: node.kubernetes.io/instance-type
      operator: In
      values: ["m5.2xlarge", "m5.xlarge"]
  limits:
    resources:
      cpu: 1000
  provider:
    subnetSelector:
      karpenter.sh/discovery: karpenter-demo
    securityGroupSelector:
      karpenter.sh/discovery: karpenter-demo
    tags:
      karpenter.sh/discovery: karpenter-demo
  ttlSecondsAfterEmpty: 30
```

@ellistarn
Contributor

@dewjam can we add this to Troubleshooting?

@dewjam
Contributor

dewjam commented Apr 4, 2022

Sure thing. Will work on that today.

@dewjam
Contributor

dewjam commented Apr 19, 2022

Added to troubleshooting guide: https://karpenter.sh/v0.8.2/troubleshooting/#pods-using-security-groups-for-pods-stuck-in-containercreating-state-for-up-to-30-minutes-before-transitioning-to-running

@tomerpeer63
Author

tomerpeer63 commented Apr 20, 2022

Thanks. Looks like the workaround is working.

@dewjam
Contributor

dewjam commented Apr 26, 2022

Great, glad to hear the workaround is working @tomerpeer63 !

Quick update on our efforts towards a permanent fix. We were working towards implementing a fix in the aws-vpc-resource-controller (aws/amazon-vpc-resource-controller-k8s#103), but have shifted our efforts toward removing the "early binding" capability Karpenter uses to assign pods to nodes. Without early binding, SGP would work as expected.

I'm in the process of validating this assumption by testing in some experimental code. I'll keep you in the loop.

@tzneal
Contributor

tzneal commented Jun 2, 2022

Closed with #1856

@tzneal tzneal closed this as completed Jun 2, 2022
@armenr

armenr commented Jun 27, 2022

@tzneal - Does this mean that we should remove vpc.amazonaws.com/has-trunk-attached: "false" from the provisioner spec, and instead use the following with the 0.12.0 helm chart?

```yaml
controller:
  env:
    AWS_ENABLE_POD_ENI: true
```

@tzneal
Contributor

tzneal commented Jun 27, 2022

Yes, this should work.

@armenr

armenr commented Jun 27, 2022

Going to wait for the next release before upgrading back to v0.12.x, but good to know about the provisioners!

@AaronFriel

Should this be re-opened as pre-binding was reverted?
