Failed to provision new node #1180

Closed
alfianabdi opened this issue Jan 19, 2022 · 6 comments

@alfianabdi

Version

Karpenter: v0.5.3

Kubernetes: v1.21.2-eks-0389ca3

Expected Behavior

A new node is provisioned when there are pending pods.

Actual Behavior

No new node is provisioned. The controller logs these errors:

2022-01-19T02:44:37.678Z        DEBUG   controller.provisioning Excluding instance type m5a.2xlarge because there are not enough resources for daemons {"commit": "5047f3c", "provisioner": "linux-amd64"}
2022-01-19T02:44:37.680Z        ERROR   controller.provisioning Failed to find instance type option(s) for [default/inflate-6b88c9fb68-wlzxv default/inflate-6b88c9fb68-gbnmk] {"commit": "5047f3c", "provisioner": "linux-amd64"}

Steps to Reproduce the Problem

Create the Provisioner and Deployment shown under Resource Specs and Logs, then scale the Deployment up so that pods are pending.

Resource Specs and Logs

Provisioner

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: linux-amd64
spec:
  kubeletConfiguration: {}
  limits: {}
  provider:
    apiVersion: extensions.karpenter.sh/v1alpha1
    instanceProfile: <Redacted>
    kind: AWS
    launchTemplate: Linux-Amd64-WithRole
    securityGroupSelector:
      karpenter.sh/cluster/DevEKS: owned
      kubernetes.io/cluster/DevEKS: '*'
    subnetSelector:
      karpenter.sh/cluster/DevEKS: shared
  requirements:
  - key: node.kubernetes.io/instance-type
    operator: In
    values:
    - m5.2xlarge
    - m5a.2xlarge
  - key: topology.kubernetes.io/zone
    operator: In
    values:
    - ap-southeast-1a
    - ap-southeast-1b
  - key: kubernetes.io/arch
    operator: In
    values:
    - amd64
  - key: karpenter.sh/capacity-type
    operator: In
    values:
    - on-demand
  ttlSecondsAfterEmpty: 30
  ttlSecondsUntilExpired: 2592000

Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 0
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      terminationGracePeriodSeconds: 0
      containers:
        - name: inflate
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.2
          resources:
            requests:
              cpu: 1

Logs

2022-01-19T02:44:37.678Z        DEBUG   controller.provisioning Excluding instance type m5.2xlarge because there are not enough resources for daemons  {"commit": "5047f3c", "provisioner": "linux-amd64"}
2022-01-19T02:44:37.678Z        DEBUG   controller.provisioning Excluding instance type m5a.2xlarge because there are not enough resources for daemons {"commit": "5047f3c", "provisioner": "linux-amd64"}
2022-01-19T02:44:37.680Z        ERROR   controller.provisioning Failed to find instance type option(s) for [default/inflate-6b88c9fb68-wlzxv default/inflate-6b88c9fb68-gbnmk] {"commit": "5047f3c", "provisioner": "linux-amd64"}
2022-01-19T02:44:37.680Z        INFO    controller.provisioning Waiting for unschedulable pods  {"commit": "5047f3c", "provisioner": "linux-amd64"}
@alfianabdi added the bug label on Jan 19, 2022
@prateekgogia
Contributor

From the error logs, it looks like Karpenter cannot fit all of the DaemonSet pods running in the cluster onto the allowed 2xlarge instance types.
Can you please provide the output of kubectl get daemonsets -A?
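
The plain table from kubectl get daemonsets -A does not show resource requests, so a per-DaemonSet summary can help when checking whether the daemons fit on a node. The following is a minimal sketch, not part of Karpenter: it assumes the standard DaemonSet JSON layout (items[].spec.template.spec.containers[].resources.requests) as returned by kubectl get daemonsets -A -o json. The sample document, the DaemonSet names, and the request values are hypothetical, and the quantity parser covers only the common suffixes.

```python
import json
from decimal import Decimal

# Multipliers for common Kubernetes quantity suffixes (not exhaustive).
SUFFIXES = {
    "Ki": Decimal(2) ** 10, "Mi": Decimal(2) ** 20,
    "Gi": Decimal(2) ** 30, "Ti": Decimal(2) ** 40,
    "m": Decimal("0.001"), "k": Decimal(10) ** 3,
    "M": Decimal(10) ** 6, "G": Decimal(10) ** 9, "T": Decimal(10) ** 12,
}


def parse_quantity(text):
    """Convert a quantity such as '100m' or '128Mi' to a Decimal in base units."""
    text = str(text)
    # Try two-character suffixes (Ki, Mi, ...) before one-character ones.
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if text.endswith(suffix):
            return Decimal(text[: -len(suffix)]) * SUFFIXES[suffix]
    return Decimal(text)


def daemonset_requests(doc):
    """Map 'namespace/name' to the summed container requests of each DaemonSet."""
    result = {}
    for item in doc.get("items", []):
        meta = item["metadata"]
        totals = {}
        for container in item["spec"]["template"]["spec"].get("containers", []):
            requests = container.get("resources", {}).get("requests", {})
            for resource, quantity in requests.items():
                totals[resource] = totals.get(resource, Decimal(0)) + parse_quantity(quantity)
        result[f'{meta["namespace"]}/{meta["name"]}'] = totals
    return result


# Hypothetical sample standing in for `kubectl get daemonsets -A -o json` output.
SAMPLE = json.loads("""
{"items": [
  {"metadata": {"namespace": "demo", "name": "net-agent"},
   "spec": {"template": {"spec": {"containers": [
     {"name": "agent", "resources": {"requests": {"cpu": "10m"}}}]}}}},
  {"metadata": {"namespace": "demo", "name": "log-agent"},
   "spec": {"template": {"spec": {"containers": [
     {"name": "shipper", "resources": {"requests":
       {"cpu": "100m", "memory": "100Mi", "ephemeral-storage": "1Gi"}}},
     {"name": "sidecar", "resources": {}}]}}}}
]}
""")

if __name__ == "__main__":
    for name, totals in daemonset_requests(SAMPLE).items():
        print(name, {resource: str(amount) for resource, amount in totals.items()})
```

To run this against a real cluster, the sample could be replaced with json.load(sys.stdin) and the script fed from kubectl get daemonsets -A -o json. Note that the sketch ignores node selectors, so DaemonSets that never land on the target node type (for example the Windows-only ones) would still appear in the summary.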

@alfianabdi
Author

alfianabdi commented Jan 19, 2022

Hi @prateekgogia

NAMESPACE       NAME                       DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                                     AGE
calico-system   calico-node                21        21        21      21           21          kubernetes.io/os=linux                            194d
ingress-nginx   ingress-nginx-controller   8         8         7       8            7           kubernetes.io/arch=amd64,kubernetes.io/os=linux   309d
jbase           filebeat                   8         8         7       8            7           <none>                                            625d
kube-system     aws-node                   21        21        21      21           21          <none>                                            332d
kube-system     csi-smb-node-win           0         0         0       0            0           kubernetes.io/os=windows                          61d
kube-system     ebs-csi-node               21        21        21      21           21          kubernetes.io/os=linux                            332d
kube-system     efs-csi-node               21        21        21      21           21          beta.kubernetes.io/os=linux                       293d
kube-system     kube-proxy                 21        21        21      21           21          <none>                                            685d
kube-system     node-local-dns             21        21        21      21           21          <none>                                            156d
logging         fluent-bit-windows-1a      0         0         0       0            0           kubernetes.io/os=windows                          134d
logging         fluent-bit-windows-1b      0         0         0       0            0           kubernetes.io/os=windows                          134d
logging         infra-1a-fluentbit         10        10        10      10           10          <none>                                            149d
logging         infra-1b-fluentbit         11        11        11      11           11          <none>                                            149d
monitoring      prometheus-node-exporter   17        17        17      17           17          kubernetes.io/os=linux                            289d

All DaemonSet pods fit on the node when the node is launched by another method.

@Moser-ss

I have the same issue. At the moment my cluster has 2 nodes, one m5.xlarge and one m5.2xlarge, and Karpenter keeps telling me that it cannot add a new node.
I cannot understand this behavior: if my DaemonSets can run on an m5.xlarge, Karpenter should be able to launch several m5.xlarge nodes.

2022-01-19T17:25:12.184Z	DEBUG	controller.provisioning	Excluding instance type r3.8xlarge because there are not enough resources for daemons	{"commit": "7e79a67", "provisioner": "default"}
2022-01-19T17:25:12.184Z	DEBUG	controller.provisioning	Excluding instance type m5a.24xlarge because there are not enough resources for daemons	{"commit": "7e79a67", "provisioner": "default"}
2022-01-19T17:25:12.184Z	DEBUG	controller.provisioning	Excluding instance type c5a.4xlarge because there are not enough resources for daemons	{"commit": "7e79a67", "provisioner": "default"}
2022-01-19T17:25:12.185Z	DEBUG	controller.provisioning	Excluding instance type m5dn.16xlarge because there are not enough resources for daemons	{"commit": "7e79a67", "provisioner": "default"}
2022-01-19T17:25:12.185Z	DEBUG	controller.provisioning	Excluding instance type m5.24xlarge because there are not enough resources for daemons	{"commit": "7e79a67", "provisioner": "default"}

My Provisioner object

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  ttlSecondsAfterEmpty: 30
  requirements:
    - key: "topology.kubernetes.io/zone" 
      operator: In
      values: ["eu-west-1a"]
    - key: "karpenter.sh/capacity-type" 
      operator: In
      values: ["spot", "on-demand"]
  limits:
    resources:
      cpu: 50
  provider: 
    tags:
      cluster: eks-ci-1
    instanceProfile: KarpenterNodeInstanceProfile-eks-ci-1

@felix-zhe-huang
Contributor

This is a known issue. Currently, DaemonSets are not correctly accounted for when provisioning nodes. A fix is coming.
Please follow issue #1084.

@alfianabdi
Author

I see.
My problem was that one of the DaemonSets has an ephemeral-storage request. Even though the requested size is smaller than the node's disk size, it looks like the controller cannot account for it.
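
One way such a symptom could arise is sketched below. This is an illustration only, under a stated assumption, and not Karpenter's actual code: if a fit check treats any resource that is missing from an instance type's known capacity as zero, then a single ephemeral-storage request excludes every instance type, however large its disk really is. The function name, the units (millicores and MiB), and all capacity figures are hypothetical.

```python
def unfit_resources(daemon_requests, pod_requests, allocatable):
    """Return the sorted names of resources whose combined request exceeds allocatable.

    Assumption for illustration: a resource absent from `allocatable`
    counts as zero, so any positive request for it fails the check.
    """
    needed = {}
    for requests in (daemon_requests, pod_requests):
        for name, amount in requests.items():
            needed[name] = needed.get(name, 0) + amount
    return sorted(name for name, amount in needed.items()
                  if amount > allocatable.get(name, 0))


# Hypothetical figures: CPU in millicores, memory and storage in MiB.
daemons = {"cpu": 500, "memory": 1024, "ephemeral-storage": 1024}
pod = {"cpu": 1000}  # mirrors the inflate container's `cpu: 1` request

# Capacity view that omits ephemeral-storage entirely.
without_storage = {"cpu": 7900, "memory": 29000}
# Same capacity view with a storage figure filled in.
with_storage = {"cpu": 7900, "memory": 29000, "ephemeral-storage": 18000}

if __name__ == "__main__":
    print(unfit_resources(daemons, pod, without_storage))
    print(unfit_resources(daemons, pod, with_storage))
```

With the first capacity view the check reports ephemeral-storage as the only blocker even though CPU and memory have ample headroom. With the second view nothing is reported, which matches the observation that the same pods fit when the node is launched another way.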

@ellistarn
Contributor

Closing in favor of #1084. We'll get you sorted out ASAP @alfianabdi
