Skip to content
This repository has been archived by the owner on Jan 11, 2023. It is now read-only.

Fix kubeletConfig for Windows agent nodes #3753

Merged
merged 6 commits into from
Aug 30, 2018

Conversation

PatrickLang
Copy link
Contributor

@PatrickLang PatrickLang commented Aug 28, 2018

What this PR does / why we need it

Windows kubelet configs were fixed, and could not be overridden in the acs-engine apimodel. This meant that optional feature gates couldn't be enabled for Windows nodes.

Which issue this PR fixes: fixes #2627, #3266

Special notes for your reviewer:

Example apimodel.json:

{
  "apiVersion": "vlabs",
  "properties": {
    "orchestratorProfile": {
      "orchestratorType": "Kubernetes",
      "orchestratorVersion": "1.10.6",
      "kubernetesConfig": {
               "apiServerConfig" : {
                 "--feature-gates": "HyperVContainer=true"
               },
               "kubeletConfig" : {
                    "--feature-gates": "HyperVContainer=true"
                }
          }
    },
    "masterProfile": {
      "count": 1,
      "dnsPrefix": "plang0607a",
      "vmSize": "Standard_D2_v3"
    },
    "agentPoolProfiles": [
	    {
		"name": "windowspool",
		"count": 2,
		"vmSize": "Standard_D2_v3",
		"availabilityProfile": "AvailabilitySet",
		"osType": "Windows",
		"osDiskSizeGB": 127,
		"extensions": [
		    {
                        "name": "winrm"
		    }
		]
	    },
	    {
		"name": "linuxpool",
		"count": 2,
		"vmSize": "Standard_D2_v2",
		"availabilityProfile": "AvailabilitySet",
		"osType": "Linux"
	    }
    ],
    "windowsProfile": {
	    "adminUsername": "",
	    "adminPassword": "",
	    "windowsPublisher": "MicrosoftWindowsServer",
	    "windowsOffer": "WindowsServerSemiAnnual",
	    "windowsSku": "Datacenter-Core-1803-with-Containers-smalldisk"
     },
    "extensionProfiles": [
      {
        "name": "winrm",
        "version": "v1"
      }
    ],
    "linuxProfile": {
      "adminUsername": "",
      "ssh": {
        "publicKeys": [
          {
            "keyData": ""
          }
        ]
      }
    },
    "servicePrincipalProfile": {
      "clientId": "",
      "secret": ""
    }
  }
}

Once the cluster is deployed - I can run Windows Server version 1709 and 1803 pods successfully. 2016 is crashing for an unrelated issue that's a new bug.

I deployed these samples

kubectl get pod -o wide
NAME                           READY     STATUS             RESTARTS   AGE       IP             NODE
iis-797554d796-mzx54           1/1       Running            0          21m       10.240.0.129   26559k8s9001
whoami-1709-7bc765bdd-hh8w2    1/1       Running            0          21m       10.240.0.108   26559k8s9000
whoami-1803-78cf76cdb-qzlhk    1/1       Running            0          21m       10.240.0.146   26559k8s9001
whoami-2016-56f6d987c6-qrc24   0/1       CrashLoopBackOff   9          21m       <none>         26559k8s9000

$ kubectl get svc
NAME              TYPE           CLUSTER-IP     EXTERNAL-IP      PORT(S)        AGE
iis-svc           LoadBalancer   10.0.199.157   13.77.163.204    80:32230/TCP   21m
kubernetes        ClusterIP      10.0.0.1       <none>           443/TCP        26m
whoami-1709-svc   LoadBalancer   10.0.115.101   52.246.250.167   80:32058/TCP   21m
whoami-1803-svc   LoadBalancer   10.0.101.19    13.66.167.155    80:31961/TCP   21m
whoami-2016-svc   LoadBalancer   10.0.118.76    13.66.170.249    80:30238/TCP   21m

$ curl http://13.66.167.155
I'm 2073d5f95141 running on windows/amd64

$ curl http://52.246.250.167
I'm b5cba1e43131 running on windows/amd64

Release note:

`KubernetesConfig.kubeletConfig` now works for Windows nodes

@acs-bot
Copy link

acs-bot commented Aug 28, 2018

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: PatrickLang
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: cecilerobertmichon

If they are not already assigned, you can assign the PR to them by writing /assign @cecilerobertmichon in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ghost ghost assigned PatrickLang Aug 28, 2018
@ghost ghost added the in progress label Aug 28, 2018
@acs-bot acs-bot added the size/M label Aug 28, 2018
@PatrickLang PatrickLang mentioned this pull request Aug 28, 2018
3 tasks
@PatrickLang
Copy link
Contributor Author

PatrickLang commented Aug 28, 2018

Hmm, looks like I broke kubectl exec. Looks like I need to fix #3747 in the same PR.

$ kubectl exec iis-kubernetes-westus2-99158-55140-545c66f9f5-s5rc4 -n default -- powershell iwr -UseBasicParsing -TimeoutSec 60 www.bing.com
2018/08/28 00:53:58 Error trying to run 'kubectl exec':error: unable to upgrade connection: Forbidden (user=system:anonymous, verb=create, resource=nodes, subresource=proxy)

2018/08/28 00:53:58 Command:kubectl exec iis-kubernetes-westus2-99158-55140-545c66f9f5-s5rc4 -n default [-- powershell iwr -UseBasicParsing -TimeoutSec 60 www.bing.com] 
2018/08/28 00:53:58 Error:exit status 1
2018/08/28 00:53:58 Out:

@codecov
Copy link

codecov bot commented Aug 28, 2018

Codecov Report

Merging #3753 into master will increase coverage by 0.11%.
The diff coverage is 90.47%.

@@            Coverage Diff            @@
##           master   #3753      +/-   ##
=========================================
+ Coverage   55.48%   55.6%   +0.11%     
=========================================
  Files         108     108              
  Lines       16140   16160      +20     
=========================================
+ Hits         8955    8985      +30     
+ Misses       6420    6408      -12     
- Partials      765     767       +2

@acs-bot acs-bot added size/L and removed size/M labels Aug 30, 2018
@PatrickLang
Copy link
Contributor Author

@CecileRobertMichon or @jackfrancis - can you review tomorrow?

@PatrickLang
Copy link
Contributor Author

PatrickLang commented Aug 30, 2018

Resulting command line before:

c:\k\kubelet.exe 
  --allow-privileged=true 
  --azure-container-registry-config=c:\k\azure.json 
  --cgroups-per-qos=false 
  --cloud-config=c:\k\azure.json 
  --cloud-provider=azure 
  --cluster-dns=$global:KubeDnsServiceIp 
  --cluster-domain=cluster.local  
  --cni-bin-dir=c:\k\azurecni\bin 
  --cni-conf-dir=c:\k\azurecni\netconf
  --enable-debugging-handlers 
  --enforce-node-allocatable="" 
  --hairpin-mode=promiscuous-bridge 
  --hostname-override=$env:computername 
  --image-pull-progress-deadline=20m 
  --kubeconfig=c:\k\config 
  --network-plugin=cni 
  --pod-infra-container-image=kubletwin/pause 
  --resolv-conf="" 
  --runtime-request-timeout=10m  
  --v=2 
  --volume-plugin-dir=$global:VolumePluginDir 

After (with extra kubelet params passed):

c:\k\kubelet.exe 
  --address=0.0.0.0 
  --allow-privileged=true 
  --anonymous-auth=false 
  --authorization-mode=Webhook 
  --azure-container-registry-config=c:\k\azure.json 
  --cadvisor-port=0 
  --cgroups-per-qos=false 
  --client-ca-file=c:\k\ca.crt 
  --cloud-config=c:\k\azure.json 
  --cloud-provider=azure 
  --cluster-dns=$global:KubeDnsServiceIp 
  --cluster-dns=10.0.0.10 
  --cluster-domain=cluster.local 
  --cni-bin-dir=c:\k\azurecni\bin 
  --cni-conf-dir=c:\k\azurecni\netconf
  --enforce-node-allocatable="" 
  --event-qps=0 
  --eviction-hard=memory.available<100Mi,nodefs.available<10%,nodefs.inodesFree<5% 
  --feature-gates=HyperVContainer=true,PodPriority=true 
  --hairpin-mode=promiscuous-bridge 
  --hostname-override=$global:AzureHostname 
  --image-gc-high-threshold=85 
  --image-gc-low-threshold=80 
  --image-pull-progress-deadline=20m 
  --keep-terminated-pod-volumes=false 
  --kubeconfig=c:\k\config 
  --max-pods=30 
  --network-plugin=cni 
  --network-plugin=cni 
  --node-labels=$global:KubeletNodeLabels 
  --node-status-update-frequency=10s 
  --non-masquerade-cidr=10.240.0.0/12 
  --pod-infra-container-image=kubletwin/pause 
  --pod-max-pids=100 
  --resolv-conf="" 
  --volume-plugin-dir=$global:VolumePluginDir 

staticWindowsKubeletConfig["--azure-container-registry-config"] = "c:\\k\\azure.json"
staticWindowsKubeletConfig["--pod-infra-container-image"] = "kubletwin/pause"
staticWindowsKubeletConfig["--kubeconfig"] = "c:\\k\\config"
staticWindowsKubeletConfig["--cloud-config"] = "c:\\k\\azure.json"
staticWindowsKubeletConfig["--cgroups-per-qos"] = "false"
staticWindowsKubeletConfig["--enforce-node-allocatable"] = "\"\""
staticWindowsKubeletConfig["--enforce-node-allocatable"] = "\"\"\"\""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's like an escape room. You escape one just to find you must escape another.

@@ -136,6 +144,11 @@ func setKubeletConfig(cs *api.ContainerService) {
}
setMissingKubeletValues(profile.KubernetesConfig, o.KubernetesConfig.KubeletConfig)

if profile.OSType == "Windows" {
// Remove Linux-specific values
delete(profile.KubernetesConfig.KubeletConfig, "--pod-manifest-path")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha. I wonder if we should do this for Linux as well. Instead of this recent change that @tariq1890 made:

https://github.com/Azure/acs-engine/pull/3744/files

@jackfrancis
Copy link
Member

rekicked unit test

golang /lgtm

@PatrickLang
Copy link
Contributor Author

glad I did that last check.

  --cluster-dns=$global:KubeDnsServiceIp 
  --cluster-dns=10.0.0.10 

I should fix that one because that will break private vnet support later

@PatrickLang
Copy link
Contributor Author

/hold

@PatrickLang
Copy link
Contributor Author

PatrickLang commented Aug 30, 2018

/hold cancel

@jackfrancis jackfrancis merged commit 2fe9caa into Azure:master Aug 30, 2018
@ghost ghost removed the in progress label Aug 30, 2018
@PatrickLang PatrickLang deleted the patricklang-2627-squashed branch August 30, 2018 23:53
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Hyper-v containers not working on kubernetes v1.10
3 participants