pod_anti_affinity does not force multiple nodes scheduling #4440

Closed
ochabaiev-boku opened this issue Aug 16, 2023 · 5 comments
Labels
bug Something isn't working

Comments

ochabaiev-boku commented Aug 16, 2023

Description

Observed Behavior:
All of the StatefulSet's pods are scheduled onto a single node.

Expected Behavior:
Given #942, pod_anti_affinity should be taken into account, and the StatefulSet's three pods should be scheduled onto separate nodes.
If I'm missing something, please let me know.

ss            web-0                                 1/1     Running   0          14m     10.0.61.1     ip-10-0-41-173.eu-west-1.compute.internal           <none>           <none>
ss            web-1                                 1/1     Running   0          14m     10.0.39.32    ip-10-0-41-173.eu-west-1.compute.internal           <none>           <none>
ss            web-2                                 1/1     Running   0          14m     10.0.42.209   ip-10-0-41-173.eu-west-1.compute.internal           <none>           <none>
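
For reference, a listing in this shape comes from the wide pod view; a close equivalent (assuming the ss namespace from the repro below) is:

kubectl get pods -A -o wide

The NODE column shows all three replicas on ip-10-0-41-173.eu-west-1.compute.internal.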

Reproduction Steps (Please include YAML):
provisioner:

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  labels:
    provisioner: karpenter-default
  ttlSecondsAfterEmpty: 60 # scale down nodes after 60 seconds without workloads (excluding daemons)
  ttlSecondsUntilExpired: 604800 # expire nodes after 7 days (in seconds) = 7 * 60 * 60 * 24
  requirements:
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["spot"]
    - key: karpenter.k8s.aws/instance-family
      operator: In
      values: [t3, c5, m5, r5]
    - key: karpenter.k8s.aws/instance-size
      operator: NotIn
      values: [nano, micro, small, large]
  limits:
    resources:
      cpu: 100
  providerRef:
    name: default
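
Since the provisioner labels its nodes with provisioner: karpenter-default, a quick check of how many nodes Karpenter actually launched (a sketch, assuming kubectl access to the cluster) is:

kubectl get nodes -l provisioner=karpenter-default -o wide

In the failure case above, this should list only the single node hosting all three pods.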

workload:

resource "kubernetes_namespace" "ns" {
    depends_on = [var.module_depends_on]

    metadata {
        name = "ss"
    }
}

resource "kubernetes_stateful_set" "web" {
    metadata {
        name      = "web"
        namespace = kubernetes_namespace.ns.metadata[0].name
    }
    spec {
        replicas = 3
        selector {
            match_labels = {
                app = "nginx"
            }
        }
        template {
            metadata {
                labels = {
                    app = "nginx"
                }
            }
            spec {
# Note: topology_spread_constraint + node_selector does work, though with a skew of 1
# (see the hostname-based sketch after this resource)
#                topology_spread_constraint {
#                    max_skew           = 1
#                    topology_key       = "topology.kubernetes.io/zone"
#                    when_unsatisfiable = "DoNotSchedule"
#                    label_selector {
#                        match_labels = {
#                            app = "nginx"
#                        }
#                    }
#                }
#                node_selector = var.node_selector
                affinity {
                    node_affinity {
                        required_during_scheduling_ignored_during_execution {
                            node_selector_term {
                                match_expressions {
                                    key      = "karpenter.sh/provisioner-name"
                                    operator = "In"
                                    values   = ["default"]
                                }
                            }
                        }
                    }
                    pod_anti_affinity {
                        required_during_scheduling_ignored_during_execution {
                            label_selector {
                                match_expressions {
                                    key      = "app"
                                    operator = "In"
                                    values   = ["nginx"]
                                }
                            }
                            topology_key = "topology.kubernetes.io/hostname"
                        }
                    }
                }
                container {
                    name  = "nginx"
                    image = "registry.k8s.io/nginx-slim:0.8"

                    port {
                        name           = "web"
                        container_port = 80
                    }
                    resources {
                        requests = {
                            cpu    = "50m"
                            memory = "64Mi"
                        }
                        limits = {
                            cpu    = "50m"
                            memory = "64Mi"
                        }
                    }

                    volume_mount {
                        name       = "www"
                        mount_path = "/usr/share/nginx/html"
                    }
                }
            }
        }
        volume_claim_template {
            metadata {
                name = "www"
            }

            spec {
                access_modes = ["ReadWriteOnce"]

                resources {
                    requests = {
                        storage = "1Gi"
                    }
                }
            }
        }
        service_name = "nginx"
    }
}
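
An aside on the commented-out block above: a zonal spread with max_skew = 1 only balances across zones, so two replicas can still share a node within a zone. A hostname-keyed spread is the closest spread-based analogue to the anti-affinity rule; a hedged sketch, using the same attribute names as above and placed inside the pod template's spec block:

topology_spread_constraint {
    max_skew           = 1
    topology_key       = "kubernetes.io/hostname"
    when_unsatisfiable = "DoNotSchedule"
    label_selector {
        match_labels = {
            app = "nginx"
        }
    }
}

As the comment notes, spread constraints tolerate a skew rather than enforcing strict one-pod-per-node, which is what the pod_anti_affinity rule is for.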

resource "kubernetes_service" "nginx" {
    metadata {
        name      = "nginx"
        namespace = kubernetes_namespace.ns.metadata[0].name
        labels    = {
            app = "nginx"
        }
    }
    spec {
        port {
            name = "web"
            port = 80
        }
        selector = {
            app = "nginx"
        }
        cluster_ip = "None"
    }
}

Versions:

  • Chart Version: v0.30.0-rc.0
  • Kubernetes Version (kubectl version):
Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.4", GitCommit:"fa3d7990104d7c1f16943a67f11b154b71f6a132", GitTreeState:"clean", BuildDate:"2023-07-19T12:20:54Z", GoVersion:"go1.20.6", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v5.0.1
Server Version: version.Info{Major:"1", Minor:"24+", GitVersion:"v1.24.16-eks-2d98532", GitCommit:"af930c12e26ef9d1e8fac7e3532ff4bcc1b2b509", GitTreeState:"clean", BuildDate:"2023-07-28T16:52:47Z", GoVersion:"go1.20.6", Compiler:"gc", Platform:"linux/amd64"}
ochabaiev-boku added the bug label on Aug 16, 2023
tzneal (Contributor) commented Aug 16, 2023

Can you show the actual deployment spec from the API server as YAML? kube-scheduler is what assigns pods to nodes, so unless it has the same bug, the issue is likely in the pod spec.
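
For anyone following along, the live spec can be pulled from the API server with (assuming the web StatefulSet in the ss namespace from the repro):

kubectl -n ss get statefulset web -o yaml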

ochabaiev-boku (Author) commented Aug 16, 2023

@tzneal here's the StatefulSet's config as deployed:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  creationTimestamp: "2023-08-16T12:31:35Z"
  generation: 1
  name: web
  namespace: ss
  resourceVersion: "622769"
  uid: ba3bd7cc-ceb9-487a-a2d2-7560ac2aa3d7
spec:
  podManagementPolicy: OrderedReady
  replicas: 3
  revisionHistoryLimit: 0
  selector:
    matchLabels:
      app: nginx
  serviceName: nginx
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: nginx
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: karpenter.sh/provisioner-name
                operator: In
                values:
                - default
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - nginx
            topologyKey: topology.kubernetes.io/hostname
      automountServiceAccountToken: true
      containers:
      - image: registry.k8s.io/nginx-slim:0.8
        imagePullPolicy: IfNotPresent
        name: nginx
        ports:
        - containerPort: 80
          name: web
          protocol: TCP
        resources:
          limits:
            cpu: 50m
            memory: 64Mi
          requests:
            cpu: 50m
            memory: 64Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /usr/share/nginx/html
          mountPropagation: None
          name: www
      dnsPolicy: ClusterFirst
      enableServiceLinks: true
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      shareProcessNamespace: false
      terminationGracePeriodSeconds: 30
  updateStrategy:
    rollingUpdate:
      partition: 0
    type: RollingUpdate
  volumeClaimTemplates:
  - apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      creationTimestamp: null
      name: www
      namespace: default
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi
      volumeMode: Filesystem
    status:
      phase: Pending
status:
  availableReplicas: 3
  collisionCount: 0
  currentReplicas: 3
  currentRevision: web-596fd8747
  observedGeneration: 1
  readyReplicas: 3
  replicas: 3
  updateRevision: web-596fd8747
  updatedReplicas: 3

tzneal (Contributor) commented Aug 16, 2023

topology.kubernetes.io/hostname should be kubernetes.io/hostname
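
The well-known per-node label is kubernetes.io/hostname; the topology.kubernetes.io/ prefix is used for zone and region. Because no node carries a topology.kubernetes.io/hostname label, the anti-affinity rule effectively never excludes a node, so all replicas can pack together. Applied to the Terraform workload above, the fix is a one-line change to topology_key:

pod_anti_affinity {
    required_during_scheduling_ignored_during_execution {
        label_selector {
            match_expressions {
                key      = "app"
                operator = "In"
                values   = ["nginx"]
            }
        }
        topology_key = "kubernetes.io/hostname" # was topology.kubernetes.io/hostname
    }
}

After re-applying, kubectl get pods -A -o wide should show web-0 through web-2 on three distinct nodes.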

tzneal (Contributor) commented Aug 16, 2023

We talked on Slack; I'll close this one, but feel free to re-open if you still run into issues.

tzneal closed this as completed on Aug 16, 2023
ochabaiev-boku (Author) commented
Thank you for the fast reply!
