This repository has been archived by the owner on Jan 11, 2023. It is now read-only.

add GPU ExtendedResourceToleration admission controller support #3181

Merged

Conversation

lachie83
Member

@lachie83 lachie83 commented Jun 5, 2018

Use case: users only want workloads that require GPU resources to be scheduled onto nodes that have GPUs.
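
For context, the ExtendedResourceToleration admission controller automatically adds a toleration to any pod that requests an extended resource such as nvidia.com/gpu, so only those pods can land on tainted GPU nodes. A minimal sketch of a pod that would get the toleration added, assuming nodes are tainted nvidia.com/gpu=true:NoSchedule as this PR does (the pod name and image are hypothetical):

apiVersion: v1
kind: Pod
metadata:
  name: cuda-test               # hypothetical name
spec:
  containers:
  - name: cuda-test
    image: nvidia/cuda:9.0-base # hypothetical image
    resources:
      limits:
        nvidia.com/gpu: 1       # requesting the extended resource is what triggers the auto-added toleration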

Adds the following support:

@ghost ghost assigned lachie83 Jun 5, 2018
@ghost ghost added the in progress label Jun 5, 2018
@acs-bot acs-bot added the size/M label Jun 5, 2018
@@ -137,6 +137,9 @@ write_files:
KUBELET_IMAGE={{WrapAsVariable "kubernetesHyperkubeSpec"}}
KUBELET_REGISTER_SCHEDULABLE=true
KUBELET_NODE_LABELS={{GetAgentKubernetesLabels . "',variables('labelResourceGroup'),'"}}
{{if IsNVIDIADevicePluginEnabled}}
Member

I think this needs to be wrapped with {{if IsNSeriesSKU .}} so it won't apply to non-GPU pools.

@@ -41,6 +41,9 @@
{{range $index, $agent := .AgentPoolProfiles}}
"{{.Name}}Index": {{$index}},
{{template "k8s/kubernetesagentvars.t" .}}
{{if IsNVIDIADevicePluginEnabled }}
Member

Same as above.

@codecov

codecov bot commented Jun 5, 2018

Codecov Report

Merging #3181 into master will decrease coverage by <.01%.
The diff coverage is 61.29%.

@@            Coverage Diff             @@
##           master    #3181      +/-   ##
==========================================
- Coverage   54.55%   54.55%   -0.01%     
==========================================
  Files         104      104              
  Lines       15753    15758       +5     
==========================================
+ Hits         8594     8596       +2     
- Misses       6409     6411       +2     
- Partials      750      751       +1

@@ -137,7 +137,7 @@ write_files:
KUBELET_IMAGE={{WrapAsVariable "kubernetesHyperkubeSpec"}}
KUBELET_REGISTER_SCHEDULABLE=true
KUBELET_NODE_LABELS={{GetAgentKubernetesLabels . "',variables('labelResourceGroup'),'"}}
{{if IsNVIDIADevicePluginEnabled}}
{{if IsNSeriesSKU .}}
Member

Can you keep both of these checks? We want this to be enabled per GPU node and when the NVIDIA device plugin is enabled, but not with k8s 1.8, 1.9, etc., since we are not supporting the device plugin in those versions.
So it'll look something like:

{{if IsNSeriesSKU .}}
{{if IsNVIDIADevicePluginEnabled}}
...
{{end}}
{{end}}

@@ -41,7 +41,7 @@
{{range $index, $agent := .AgentPoolProfiles}}
"{{.Name}}Index": {{$index}},
{{template "k8s/kubernetesagentvars.t" .}}
{{if IsNVIDIADevicePluginEnabled }}
{{if IsNSeriesSKU .}}
Member

Same as above.

- key: nvidia.com/gpu
effect: NoSchedule
operator: Equal
value: "true"
Member

Since you are adding the toleration, can you also replace the snippet below at /etc/docker/daemon.json in kubernetesagentcustomdata.yml, since we want it to be enabled for GPU pools only but not for CPU pools.

      }{{if IsNSeriesSKU .}}{{if IsNVIDIADevicePluginEnabled}}
      ,"default-runtime": "nvidia",
      "runtimes": {
         "nvidia": {
             "path": "/usr/bin/nvidia-container-runtime",
             "runtimeArgs": []
        }
      }{{end}}{{end}} 

Member Author

Fixed

BUGFIX - Update docker runtime on NON GPU enabled machines
Add accelerator label to GPU enabled nodes
Add nodeselector label match where accelerator=nvidia for nvidia-device-plugin
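
A minimal sketch of the selector from that last commit, assuming the nvidia-device-plugin DaemonSet pod template is the target:

# schedule the device plugin only onto nodes carrying the GPU label
nodeSelector:
  accelerator: nvidia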
@lachie83
Member Author

lachie83 commented Jun 6, 2018

@sozercan this should be gtg. Tested against a mixed GPU/CPU agent pool and confirmed it works.

@lachie83
Member Author

lachie83 commented Jun 6, 2018

Actually, it looks like DNS resolution to kube-dns is failing from a pod running on a GPU machine:

kubectl exec -it busybox-9f688c677-49568 /bin/sh
/ # nslookup www.google.com
Server:    10.0.0.10
Address 1: 10.0.0.10

^C
/ # exit
command terminated with exit code 130

@acs-bot acs-bot added size/M and removed size/S labels Jun 6, 2018
@lachie83
Member Author

lachie83 commented Jun 6, 2018

Okay, I've fixed that issue. I hadn't added the toleration that lets kube-proxy run on the GPU nodes.
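
A sketch of that kind of fix, assuming kube-proxy runs as a DaemonSet whose pod template carries tolerations (the exact manifest location in this repo is not shown here):

# added to the kube-proxy pod spec so it tolerates the GPU taint
tolerations:
- key: nvidia.com/gpu
  operator: Equal
  value: "true"
  effect: NoSchedule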

@lachie83
Member Author

lachie83 commented Jun 6, 2018

Tested that it runs correctly without the host volume mounts in the pod spec:

apiVersion: batch/v1
kind: Job
metadata:
  labels:
    app: samples-tf-mnist-demo
  name: samples-tf-mnist-demo
spec:
  template:
    metadata:
      labels:
        app: samples-tf-mnist-demo
    spec:
      containers:
      - name: samples-tf-mnist-demo
        image: microsoft/samples-tf-mnist-demo:gpu
        args: ["--max_steps", "500"]
        imagePullPolicy: IfNotPresent
        resources:
          limits:
            nvidia.com/gpu: 1
      restartPolicy: OnFailure
azureuser@k8s-master-11631163-0:~$ kubectl logs samples-tf-mnist-demo-4msvj
2018-06-06 06:25:33.862588: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-06-06 06:25:34.125730: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0ba3:00:00.0
totalMemory: 11.17GiB freeMemory: 11.10GiB
2018-06-06 06:25:34.125774: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Tesla K80, pci bus id: 0ba3:00:00.0, compute capability: 3.7)
2018-06-06 06:25:39.073875: I tensorflow/stream_executor/dso_loader.cc:139] successfully opened CUDA library libcupti.so.8.0 locally
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting /tmp/tensorflow/input_data/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting /tmp/tensorflow/input_data/train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting /tmp/tensorflow/input_data/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting /tmp/tensorflow/input_data/t10k-labels-idx1-ubyte.gz
Accuracy at step 0: 0.1292
Accuracy at step 10: 0.7552
Accuracy at step 20: 0.8282
Accuracy at step 30: 0.8653
Accuracy at step 40: 0.8649
Accuracy at step 50: 0.8876
Accuracy at step 60: 0.8943
Accuracy at step 70: 0.9031
Accuracy at step 80: 0.9086
Accuracy at step 90: 0.9108
Adding run metadata for 99
Accuracy at step 100: 0.9166
Accuracy at step 110: 0.917
Accuracy at step 120: 0.9209
Accuracy at step 130: 0.9221
Accuracy at step 140: 0.9215
Accuracy at step 150: 0.9215
Accuracy at step 160: 0.9262
Accuracy at step 170: 0.9265
Accuracy at step 180: 0.9298
Accuracy at step 190: 0.9288
Adding run metadata for 199
Accuracy at step 200: 0.9362
Accuracy at step 210: 0.9327
Accuracy at step 220: 0.9349
Accuracy at step 230: 0.9336
Accuracy at step 240: 0.9365
Accuracy at step 250: 0.9395
Accuracy at step 260: 0.9389
Accuracy at step 270: 0.9422
Accuracy at step 280: 0.9416
Accuracy at step 290: 0.944
Adding run metadata for 299
Accuracy at step 300: 0.946
Accuracy at step 310: 0.9477
Accuracy at step 320: 0.9423
Accuracy at step 330: 0.9446
Accuracy at step 340: 0.9499
Accuracy at step 350: 0.9485
Accuracy at step 360: 0.943
Accuracy at step 370: 0.9515
Accuracy at step 380: 0.952
Accuracy at step 390: 0.9507
Adding run metadata for 399
Accuracy at step 400: 0.9487
Accuracy at step 410: 0.9495
Accuracy at step 420: 0.9523
Accuracy at step 430: 0.9494
Accuracy at step 440: 0.9505
Accuracy at step 450: 0.9484
Accuracy at step 460: 0.9538
Accuracy at step 470: 0.9546
Accuracy at step 480: 0.9522
Accuracy at step 490: 0.9531
Adding run metadata for 499

@lachie83
Member Author

lachie83 commented Jun 6, 2018

Let me also review the GPU docs to make sure they are good with these changes.

Member

@sozercan sozercan left a comment

Just tested, LGTM! 👍

@sozercan
Member

sozercan commented Jun 6, 2018

/assign @jackfrancis

@ghost ghost added the in progress label Jun 6, 2018
@@ -34,14 +34,14 @@ write_files:
"log-opts": {
"max-size": "50m",
"max-file": "5"
}{{if IsNVIDIADevicePluginEnabled}}
}{{if IsNSeriesSKU .}}{{if IsNVIDIADevicePluginEnabled}}
Member

Let's figure out how to simplify these guards.

  1. are there scenarios where we don't want to install the nvidia device plugin on an N Series VM SKU node?
  2. are there scenarios where we would ever want to install the plugin on a non-N Series VM SKU node?

I infer the answer to #2 is "no" based on the way this has been implemented here. For #1, if the answer to that is also "no", then we should combine these two checks: basically, we should get rid of IsNVIDIADevicePluginEnabled, or we should have IsNVIDIADevicePluginEnabled only return true if it's on an N Series node.

Member Author

The answer to #1 is yes: we didn't want to break backwards compatibility, so clusters < 1.10 are left working the way they currently do. Once we no longer support k8s 1.9, the answer to #1 becomes no.
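
Once that happens, the two guards could collapse into one, for example (a sketch in the same template syntax; whether these helpers compose this way in this codebase is an assumption):

{{if and (IsNSeriesSKU .) (IsNVIDIADevicePluginEnabled)}}
...
{{end}}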

lachie83 added 2 commits June 6, 2018 16:25
Update file name to reflect nvidia-gpu
Convert nodeSelector to affinity
Remove gpu from the resource name to fall in line with upstream
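
The nodeSelector-to-affinity conversion mentioned in the commits above typically looks like this (a sketch, reusing the accelerator=nvidia label from the earlier commit):

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: accelerator
          operator: In
          values:
          - nvidia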
@acs-bot acs-bot added size/L and removed size/M labels Jun 7, 2018
"ratelimitbucket": k8sComponentVersions["1.11"]["ratelimitbucket"],
"gchighthreshold": k8sComponentVersions["1.11"]["gchighthreshold"],
"gclowthreshold": k8sComponentVersions["1.11"]["gclowthreshold"],
DefaultNVIDIADevicePluginAddonName: k8sComponentVersions["1.11"]["nvidia-device-plugin"],
Member

Just curious, does NDP v1.10 work in k8s 1.11? When I tested NDP v1.9 with k8s 1.10 a while back, it didn't work.

Member Author

It appears to work the same way it does in 1.10.

@zzh8829

zzh8829 commented Jun 26, 2018

Any update on the status of this PR? Hybrid GPU/CPU clusters are still broken due to #3192.

@lachie83
Member Author

@zzh8829 I'm fixing this up this morning and should have it ready to merge.

@jackfrancis
Member

/lgtm

@acs-bot acs-bot added the lgtm label Jun 27, 2018
@acs-bot

acs-bot commented Jun 27, 2018

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jackfrancis, lachie83

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
