
Add instance label to kubernetes_worker_antiaffinity #2988

Closed
mkhpalm opened this issue Sep 2, 2021 · 2 comments
Assignees
Labels
enhancement k8s Applies to Teraslice in kubernetes cluster mode only. pkg/teraslice

Comments

@mkhpalm
Contributor

mkhpalm commented Sep 2, 2021

When running multiple Teraslice clusters in a single Kubernetes cluster, I think anti-affinity scheduling would work a bit better if it were confined to the Teraslice cluster and/or job the pods belong to.

_setAntiAffinity() {
    if (this.terasliceConfig.kubernetes_worker_antiaffinity) {
        this.resource.spec.template.spec.affinity = {
            podAntiAffinity: {
                preferredDuringSchedulingIgnoredDuringExecution: [
                    {
                        weight: 1,
                        podAffinityTerm: {
                            labelSelector: {
                                matchExpressions: [
                                    {
                                        key: 'app.kubernetes.io/name',
                                        operator: 'In',
                                        values: ['teraslice']
                                    }
                                ]
                            },
                            topologyKey: 'kubernetes.io/hostname'
                        }
                    }
                ]
            }
        };
    }
}

@godber godber added this to the Minor k8s improvements milestone Sep 9, 2021
@godber godber self-assigned this Sep 9, 2021
@godber godber added enhancement k8s Applies to Teraslice in kubernetes cluster mode only. pkg/teraslice labels Sep 9, 2021
@godber
Member

godber commented Sep 9, 2021

Ok, the pod labels look like this:

apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2021-09-09T13:01:13Z"
  generateName: ts-wkr-example-data-generator-job-0376463f-d2af-c588fc578-
  labels:
    app.kubernetes.io/component: worker
    app.kubernetes.io/instance: ts-dev1
    app.kubernetes.io/name: teraslice
    pod-template-hash: c588fc578
    teraslice.terascope.io/exId: 003ed22e-ee6f-4512-9ad7-7cf3a3e8419d
    teraslice.terascope.io/jobId: 0376463f-d2af-4e66-8257-069c89eaa5ef
    teraslice.terascope.io/jobName: example-data-generator-job
  name: ts-wkr-example-data-generator-job-0376463f-d2af-c588fc578-w44j6
  namespace: ts-dev1
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: ts-wkr-example-data-generator-job-0376463f-d2af-c588fc578
    uid: 9e615168-9bd6-4c4e-a48c-93f03acf9d52
  resourceVersion: "820485"
  uid: ac24b580-9d54-4e1d-9ae1-6bd2119761df

We're already matching app.kubernetes.io/name as you've shown; we should also match app.kubernetes.io/instance.
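A minimal sketch of what the combined selector could look like. The helper name `buildAntiAffinity` and the way the instance name is passed in are assumptions for illustration, not the actual Teraslice source; the `'ts-dev1'` value is taken from the pod labels above.

```javascript
// Hypothetical helper: builds the pod anti-affinity with both the name
// and the instance label matched. Kubernetes ANDs the matchExpressions,
// so workers only avoid workers from the same Teraslice cluster.
function buildAntiAffinity(instanceName) {
    return {
        podAntiAffinity: {
            preferredDuringSchedulingIgnoredDuringExecution: [
                {
                    weight: 1,
                    podAffinityTerm: {
                        labelSelector: {
                            matchExpressions: [
                                {
                                    key: 'app.kubernetes.io/name',
                                    operator: 'In',
                                    values: ['teraslice']
                                },
                                {
                                    key: 'app.kubernetes.io/instance',
                                    operator: 'In',
                                    values: [instanceName]
                                }
                            ]
                        },
                        topologyKey: 'kubernetes.io/hostname'
                    }
                }
            ]
        }
    };
}
```

In the real code the instance value would presumably come from the cluster's configured name, the same source that sets the `app.kubernetes.io/instance` label on the pods.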

The "why" this will work better: it should help avoid generating KubeletPlegDurationHigh alerts when short-lived jobs execute at a high rate. Our scenario is one workload that runs about 75 five-second jobs, four at a time, and every now and then one of the jobs fails and we get the KubeletPlegDurationHigh alerts. Because the 4 concurrent jobs avoid all other Teraslice workers, they all get scheduled on the "empty" node and bunch up there. If they specifically avoided only workers with the same instance, they shouldn't bunch up on one node like they do. (Actually, we see KubeletPlegDurationHigh more frequently than the jobs fail.)

On top of that, on the jobs that actually do fail, we see

execution dab86837-890f-4549-8528-e9fe4b1757d8 received shutdown before the slicer could complete, setting status to "terminated"

in the execution controller logs.

godber added a commit that referenced this issue Sep 9, 2021
This MR resolves the following issues:

* BUG - Conflict between job target and cluster kubernetes_worker_antiaffinity setting - #2938
* Add instance label to pod antiaffinity - #2988
@godber
Member

godber commented Sep 11, 2021

My implementation added a second match expression that matches the instance label. Mike was actually suggesting I remove the teraslice name match and replace it with the instance match.

This has reduced our PLEG alerts and resulted in the workers being spread more broadly. So I am going to close this as a win. But there is still a fair argument for removing the teraslice label match altogether.
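To make the trade-off concrete, here is a toy evaluation of how Kubernetes treats multiple matchExpressions in one labelSelector (ANDed together). The `matchesSelector` helper is a simplified illustration, not a Kubernetes API:

```javascript
// Simplified matchExpressions evaluation: all expressions must hold (AND).
// Only the 'In' operator is modeled here.
function matchesSelector(podLabels, matchExpressions) {
    return matchExpressions.every(({ key, operator, values }) =>
        operator === 'In' && values.includes(podLabels[key]));
}

const selector = [
    { key: 'app.kubernetes.io/name', operator: 'In', values: ['teraslice'] },
    { key: 'app.kubernetes.io/instance', operator: 'In', values: ['ts-dev1'] }
];

// A worker from the same Teraslice cluster is avoided...
matchesSelector({
    'app.kubernetes.io/name': 'teraslice',
    'app.kubernetes.io/instance': 'ts-dev1'
}, selector); // true

// ...but a worker from a different Teraslice cluster is not.
matchesSelector({
    'app.kubernetes.io/name': 'teraslice',
    'app.kubernetes.io/instance': 'ts-dev2'
}, selector); // false
```

With both expressions present, the name match is redundant whenever the instance label is unique to Teraslice clusters, which is the argument for dropping it.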

@godber godber closed this as completed Sep 11, 2021