This repository has been archived by the owner on Jan 11, 2023. It is now read-only.

add GPU ExtendedResourceToleration admission controller support #3181

Merged

Conversation

lachie83
Member

@lachie83 lachie83 commented Jun 5, 2018

Use case: users only want workloads that require GPU resources to be scheduled onto nodes that have GPUs.
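
For context, the ExtendedResourceToleration admission controller automatically adds a toleration to any pod that requests an extended resource such as nvidia.com/gpu, so only those pods can land on tainted GPU nodes. A minimal sketch of a pod that would get the toleration added, assuming nodes are tainted nvidia.com/gpu=true:NoSchedule as this PR does (the pod name and image are hypothetical):

apiVersion: v1
kind: Pod
metadata:
  name: cuda-test               # hypothetical name
spec:
  containers:
  - name: cuda-test
    image: nvidia/cuda:9.0-base # hypothetical image
    resources:
      limits:
        nvidia.com/gpu: 1       # requesting the extended resource is what triggers the auto-added toleration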

Adds the following support:

@ghost ghost assigned lachie83 Jun 5, 2018
@ghost ghost added the in progress label Jun 5, 2018
@acs-bot acs-bot added the size/M label Jun 5, 2018
@@ -137,6 +137,9 @@ write_files:
KUBELET_IMAGE={{WrapAsVariable "kubernetesHyperkubeSpec"}}
KUBELET_REGISTER_SCHEDULABLE=true
KUBELET_NODE_LABELS={{GetAgentKubernetesLabels . "',variables('labelResourceGroup'),'"}}
{{if IsNVIDIADevicePluginEnabled}}
Member

I think this needs to be wrapped with {{if IsNSeriesSKU .}} so it won't apply to non-GPU pools.

@@ -41,6 +41,9 @@
{{range $index, $agent := .AgentPoolProfiles}}
"{{.Name}}Index": {{$index}},
{{template "k8s/kubernetesagentvars.t" .}}
{{if IsNVIDIADevicePluginEnabled }}
Member

Same as above.

@codecov

codecov bot commented Jun 5, 2018

Codecov Report

Merging #3181 into master will decrease coverage by <.01%.
The diff coverage is 61.29%.

@@            Coverage Diff             @@
##           master    #3181      +/-   ##
==========================================
- Coverage   54.55%   54.55%   -0.01%     
==========================================
  Files         104      104              
  Lines       15753    15758       +5     
==========================================
+ Hits         8594     8596       +2     
- Misses       6409     6411       +2     
- Partials      750      751       +1

@@ -137,7 +137,7 @@ write_files:
KUBELET_IMAGE={{WrapAsVariable "kubernetesHyperkubeSpec"}}
KUBELET_REGISTER_SCHEDULABLE=true
KUBELET_NODE_LABELS={{GetAgentKubernetesLabels . "',variables('labelResourceGroup'),'"}}
{{if IsNVIDIADevicePluginEnabled}}
{{if IsNSeriesSKU .}}
Member

Can you keep both of these checks? We want this to be enabled per GPU node and when the NVIDIA device plugin is enabled, but not with k8s 1.8, 1.9, etc., since we are not supporting the device plugin in those versions.
So it'll look something like:

{{if IsNSeriesSKU .}}
{{if IsNVIDIADevicePluginEnabled}}
...
{{end}}
{{end}}

@@ -41,7 +41,7 @@
{{range $index, $agent := .AgentPoolProfiles}}
"{{.Name}}Index": {{$index}},
{{template "k8s/kubernetesagentvars.t" .}}
{{if IsNVIDIADevicePluginEnabled }}
{{if IsNSeriesSKU .}}
Member

Same as above.

- key: nvidia.com/gpu
effect: NoSchedule
operator: Equal
value: "true"
Member

Since you are adding the toleration, can you also replace the snippet below at /etc/docker/daemon.json in kubernetesagentcustomdata.yml, since we want it to be enabled for GPU pools only but not for CPU pools.

      }{{if IsNSeriesSKU .}}{{if IsNVIDIADevicePluginEnabled}}
      ,"default-runtime": "nvidia",
      "runtimes": {
         "nvidia": {
             "path": "/usr/bin/nvidia-container-runtime",
             "runtimeArgs": []
        }
      }{{end}}{{end}} 

Member Author

Fixed

BUGFIX - Update docker runtime on NON GPU enabled machines
Add accelerator label to GPU enabled nodes
Add nodeselector label match where accelerator=nvidia for nvidia-device-plugin
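
A minimal sketch of the selector from that last commit, assuming the nvidia-device-plugin DaemonSet pod template is the target:

# schedule the device plugin only onto nodes carrying the GPU label
nodeSelector:
  accelerator: nvidia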
@lachie83
Member Author

lachie83 commented Jun 6, 2018

@sozercan this should be gtg. Tested against a mixed GPU/CPU agent pool and confirmed it works.

@lachie83
Member Author

lachie83 commented Jun 6, 2018

Actually, it looks like DNS resolution to kube-dns is failing from a pod running on a GPU machine:

kubectl exec -it busybox-9f688c677-49568 /bin/sh
/ # nslookup www.google.com
Server:    10.0.0.10
Address 1: 10.0.0.10

^C
/ # exit
command terminated with exit code 130

@acs-bot acs-bot added size/M and removed size/S labels Jun 6, 2018
@lachie83
Member Author

lachie83 commented Jun 6, 2018

Okay, I've fixed that issue. I hadn't added the toleration that lets kube-proxy run on the GPU nodes.
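
A sketch of that kind of fix, assuming kube-proxy runs as a DaemonSet whose pod template carries tolerations (the exact manifest location in this repo is not shown here):

# added to the kube-proxy pod spec so it tolerates the GPU taint
tolerations:
- key: nvidia.com/gpu
  operator: Equal
  value: "true"
  effect: NoSchedule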

@lachie83
Member Author

lachie83 commented Jun 6, 2018

Tested that it runs correctly without the host volume mounts in the pod spec:

apiVersion: batch/v1
kind: Job
metadata:
  labels:
    app: samples-tf-mnist-demo
  name: samples-tf-mnist-demo
spec:
  template:
    metadata:
      labels:
        app: samples-tf-mnist-demo
    spec:
      containers:
      - name: samples-tf-mnist-demo
        image: microsoft/samples-tf-mnist-demo:gpu
        args: ["--max_steps", "500"]
        imagePullPolicy: IfNotPresent
        resources:
          limits:
            nvidia.com/gpu: 1
      restartPolicy: OnFailure
azureuser@k8s-master-11631163-0:~$ kubectl logs samples-tf-mnist-demo-4msvj
2018-06-06 06:25:33.862588: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-06-06 06:25:34.125730: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0ba3:00:00.0
totalMemory: 11.17GiB freeMemory: 11.10GiB
2018-06-06 06:25:34.125774: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Tesla K80, pci bus id: 0ba3:00:00.0, compute capability: 3.7)
2018-06-06 06:25:39.073875: I tensorflow/stream_executor/dso_loader.cc:139] successfully opened CUDA library libcupti.so.8.0 locally
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting /tmp/tensorflow/input_data/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting /tmp/tensorflow/input_data/train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting /tmp/tensorflow/input_data/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting /tmp/tensorflow/input_data/t10k-labels-idx1-ubyte.gz
Accuracy at step 0: 0.1292
Accuracy at step 10: 0.7552
Accuracy at step 20: 0.8282
Accuracy at step 30: 0.8653
Accuracy at step 40: 0.8649
Accuracy at step 50: 0.8876
Accuracy at step 60: 0.8943
Accuracy at step 70: 0.9031
Accuracy at step 80: 0.9086
Accuracy at step 90: 0.9108
Adding run metadata for 99
Accuracy at step 100: 0.9166
Accuracy at step 110: 0.917
Accuracy at step 120: 0.9209
Accuracy at step 130: 0.9221
Accuracy at step 140: 0.9215
Accuracy at step 150: 0.9215
Accuracy at step 160: 0.9262
Accuracy at step 170: 0.9265
Accuracy at step 180: 0.9298
Accuracy at step 190: 0.9288
Adding run metadata for 199
Accuracy at step 200: 0.9362
Accuracy at step 210: 0.9327
Accuracy at step 220: 0.9349
Accuracy at step 230: 0.9336
Accuracy at step 240: 0.9365
Accuracy at step 250: 0.9395
Accuracy at step 260: 0.9389
Accuracy at step 270: 0.9422
Accuracy at step 280: 0.9416
Accuracy at step 290: 0.944
Adding run metadata for 299
Accuracy at step 300: 0.946
Accuracy at step 310: 0.9477
Accuracy at step 320: 0.9423
Accuracy at step 330: 0.9446
Accuracy at step 340: 0.9499
Accuracy at step 350: 0.9485
Accuracy at step 360: 0.943
Accuracy at step 370: 0.9515
Accuracy at step 380: 0.952
Accuracy at step 390: 0.9507
Adding run metadata for 399
Accuracy at step 400: 0.9487
Accuracy at step 410: 0.9495
Accuracy at step 420: 0.9523
Accuracy at step 430: 0.9494
Accuracy at step 440: 0.9505
Accuracy at step 450: 0.9484
Accuracy at step 460: 0.9538
Accuracy at step 470: 0.9546
Accuracy at step 480: 0.9522
Accuracy at step 490: 0.9531
Adding run metadata for 499

@lachie83
Member Author

lachie83 commented Jun 6, 2018

Let me also review the GPU docs to make sure they are good with these changes.

Member

@sozercan sozercan left a comment

Just tested, LGTM! 👍

@sozercan
Member

sozercan commented Jun 6, 2018

/assign @jackfrancis

@ghost ghost added the in progress label Jun 6, 2018
@@ -34,14 +34,14 @@ write_files:
"log-opts": {
"max-size": "50m",
"max-file": "5"
}{{if IsNVIDIADevicePluginEnabled}}
}{{if IsNSeriesSKU .}}{{if IsNVIDIADevicePluginEnabled}}
Member

Let's figure out how to simplify these guards.

  1. are there scenarios where we don't want to install the nvidia device plugin on an N Series VM SKU node?
  2. are there scenarios where we would ever want to install the plugin on a non-N Series VM SKU node?

I infer the answer to #2 is "no" based on the way this has been implemented here. For #1, if the answer to that is also "no", then we should combine these two checks: basically, we should get rid of IsNVIDIADevicePluginEnabled, or we should have IsNVIDIADevicePluginEnabled only return true if it's on an N Series node.

Member Author

The answer to #1 is yes: we didn't want to break backwards compatibility, so clusters < 1.10 are left working the way they currently do. Once we no longer support k8s 1.9, the answer to #1 becomes no.
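
Once that happens, the two guards could collapse into one, for example (a sketch in the same template syntax; whether these helpers compose this way in this codebase is an assumption):

{{if and (IsNSeriesSKU .) (IsNVIDIADevicePluginEnabled)}}
...
{{end}}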

lachie83 added 2 commits June 6, 2018 16:25
Update file name to reflect nvidia-gpu
Convert nodeSelector to affinity
Remove gpu from the resource name to fall in line with upstream
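
The nodeSelector-to-affinity conversion mentioned in the commits above typically looks like this (a sketch, reusing the accelerator=nvidia label from the earlier commit):

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: accelerator
          operator: In
          values:
          - nvidia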
@acs-bot acs-bot added size/L and removed size/M labels Jun 7, 2018
"ratelimitbucket": k8sComponentVersions["1.11"]["ratelimitbucket"],
"gchighthreshold": k8sComponentVersions["1.11"]["gchighthreshold"],
"gclowthreshold": k8sComponentVersions["1.11"]["gclowthreshold"],
DefaultNVIDIADevicePluginAddonName: k8sComponentVersions["1.11"]["nvidia-device-plugin"],
Member

Just curious, does NDP v1.10 work in k8s 1.11? When I tested NDP v1.9 with k8s 1.10 a while back, it didn't work.

Member Author

It appears to work the same way it does in 1.10.

@zzh8829

zzh8829 commented Jun 26, 2018

Any update on the status of this PR? Hybrid GPU/CPU clusters are still broken due to #3192.

@lachie83
Member Author

@zzh8829 I'm fixing this up this morning and should have it ready to merge.

@jackfrancis
Member

/lgtm

@acs-bot acs-bot added the lgtm label Jun 27, 2018
@acs-bot

acs-bot commented Jun 27, 2018

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jackfrancis, lachie83

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
