-
Notifications
You must be signed in to change notification settings - Fork 558
Add maxPods to Kubernetes configuration #1103
Conversation
Can one of the admins verify this patch? Say "@acs-bot test this please" to start tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add an api-model example to examples/vnet/ ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this looks good, but we need unit tests for any new functionality.
@acs-bot test this please |
@seanknox Any more feedback? |
40729d9
to
eed23f1
Compare
@@ -0,0 +1,9 @@ | |||
# Microsoft Azure Container Service Engine - Kubernetes Features |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would personally change this path to kubernetes-custom-networking. Features is too generic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the path to "kubernetes-config" because all of these settings are under orchestratorProfile.kubernetesConfig in the API model. I thought that was a good trade-off between being too generic (as in "features") vs. being too specific (as in "networking"). Please let me know if you disagree/have a proposal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a fair few snippets of information in the code that I feel should be verbalized in docs. I'm concerned the knowledge will be lost in the code. For example your arithmetic an the minmax pod setting. We should call out everything you need to consider when using Azure CNI
docs/clusterdefinition.md
Outdated
@@ -35,6 +35,7 @@ Here are the valid values for the orchestrator types: | |||
|clusterSubnet|no|The IP subnet used for allocating IP addresses for pod network interfaces. The subnet must be in the VNET address space. Default value is 10.244.0.0/16.| | |||
|dockerBridgeSubnet|no|The specific IP and subnet used for allocating IP addresses for the docker bridge network created on the kubernetes master and agents. Default value is 172.17.0.1/16. This value is used to configure the docker daemon using the [--bip flag](https://docs.docker.com/engine/userguide/networking/default_network/custom-docker0).| | |||
|enableRbac|no|Enable [Kubernetes RBAC](https://kubernetes.io/docs/admin/authorization/rbac/) (boolean - default == false) | | |||
|maxPods|no|The maximum number of pods per node. The minimum valid value, necessary for running kube-system pods, is 5. Default value is 110.| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It also looks like the default is 30 if networkPolicy == azure. Please document.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
4aed84e
to
da4c75c
Compare
LGTM. Going to run e2e |
@ofiliz can you please rebase so we can get this merged. |
b4f97d3
to
11b187a
Compare
@lachie83 Rebased, again. Can we please get this merged today? These commits (as part of earlier PRs) are more than 3 months old now, and still waiting to get merged. |
// when VNET integration is enabled. It can be overridden per pool by setting the pool's IPAdddressCount property. | ||
DefaultAgentMultiIPAddressCount = 128 | ||
// DefaultKubernetesMaxPods is the maximum number of pods to run on a node. | ||
DefaultKubernetesMaxPods = 110 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this value consisent across all acs-engine supported k8s versions? (1.5
, 1.6
, and 1.7
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
@@ -14,7 +14,7 @@ | |||
}, | |||
"agentPoolProfiles": [ | |||
{ | |||
"name": "agent", | |||
"name": "agentpool1", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing this filename breaks our E2E tests, which consume them. Is there a good semantic reason to make a change? If so, we'll have to update our E2E test configurations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll revert them to "agent". Do you need all examples in kubernetes-config directory to use "agent" as pool name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's actually the filename change that is breaking E2E configuration.
examples/custom-pod-cidr/kubernetes.json --> examples/kubernetes-config/kubernetes-clustersubnet.json
Do we mean to move the examples/custom-pod-cidr/kubernetes.json
file around?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was a previous CR feedback by @dmitsh. Instead of creating a new directory for every little KubernetesConfig knob we introduce, we are grouping them under one directory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
O.K., @dmitsh's recommendations makes sense. We'll have to fix the E2E configs to point to the new files then. You can do that easily thusly:
In the following three files:
test/acse-conf/acse-feature-validation.json
test/acse-conf/acse-regression.json
test/e2e/kubernetes-deployments.json
Replace all "cluster_definition"
values of "custom-pod-cidr/kubernetes.json"
with the new relative filepath "kubernetes-config/kubernetes-clustersubnet.json"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a new commit that updates the above files.
@@ -14,7 +14,7 @@ | |||
}, | |||
"agentPoolProfiles": [ | |||
{ | |||
"name": "agent", | |||
"name": "agentpool1", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto comment above
parts/kubernetesagentresourcesvmas.t
Outdated
@@ -260,7 +260,7 @@ | |||
"autoUpgradeMinorVersion": true, | |||
"settings": {}, | |||
"protectedSettings": { | |||
"commandToExecute": "[concat('/usr/bin/nohup /bin/bash -c \"/bin/bash /opt/azure/containers/provision.sh ',variables('tenantID'),' ',variables('subscriptionId'),' ',variables('resourceGroup'),' ',variables('location'),' ',variables('subnetName'),' ',variables('nsgName'),' ',variables('virtualNetworkName'),' ',variables('routeTableName'),' ',variables('primaryAvailabilitySetName'),' ',variables('servicePrincipalClientId'),' ',variables('servicePrincipalClientSecret'),' ',variables('clientPrivateKey'),' ',variables('targetEnvironment'),' ',variables('networkPolicy'),' ',variables('cloudProviderBackoff'),' ',variables('cloudProviderBackoffRetries'),' ',variables('cloudProviderBackoffExponent'),' ',variables('cloudProviderBackoffDuration'),' ',variables('cloudProviderBackoffJitter'),' ',variables('cloudProviderRatelimit'),' ',variables('cloudProviderRatelimitQPS'),' ',variables('cloudProviderRatelimitBucket'),' ', variables('useManagedIdentityExtension'),' ',variables('useInstanceMetadata'),' >> /var/log/azure/cluster-provision.log 2>&1 &\" &')]" | |||
"commandToExecute": "[concat('/usr/bin/nohup /bin/bash -c \"/bin/bash /opt/azure/containers/provision.sh ',variables('tenantID'),' ',variables('subscriptionId'),' ',variables('resourceGroup'),' ',variables('location'),' ',variables('subnetName'),' ',variables('nsgName'),' ',variables('virtualNetworkName'),' ',variables('routeTableName'),' ',variables('primaryAvailablitySetName'),' ',variables('servicePrincipalClientId'),' ',variables('servicePrincipalClientSecret'),' ',variables('clientPrivateKey'),' ',variables('targetEnvironment'),' ',variables('networkPolicy'),' ',variables('maxPods'),' ',variables('cloudProviderBackoff'),' ',variables('cloudProviderBackoffRetries'),' ',variables('cloudProviderBackoffExponent'),' ',variables('cloudProviderBackoffDuration'),' ',variables('cloudProviderBackoffJitter'),' ',variables('cloudProviderRatelimit'),' ',variables('cloudProviderRatelimitQPS'),' ',variables('cloudProviderRatelimitBucket'),' ', variables('useManagedIdentityExtension'),' ',variables('useInstanceMetadata'),' >> /var/log/azure/cluster-provision.log 2>&1 &\" &')]" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We lost the change from this recent commit to master:
s/primaryAvailablitySetName/primaryAvailabilitySetName/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching this. I missed it during the merge. Fixed.
parts/kubernetesmasterresources.t
Outdated
@@ -529,7 +529,7 @@ | |||
"autoUpgradeMinorVersion": true, | |||
"settings": {}, | |||
"protectedSettings": { | |||
"commandToExecute": "[concat('/usr/bin/nohup /bin/bash -c \"/bin/bash /opt/azure/containers/provision.sh ',variables('tenantID'),' ',variables('subscriptionId'),' ',variables('resourceGroup'),' ',variables('location'),' ',variables('subnetName'),' ',variables('nsgName'),' ',variables('virtualNetworkName'),' ',variables('routeTableName'),' ',variables('primaryAvailabilitySetName'),' ',variables('servicePrincipalClientId'),' ',variables('servicePrincipalClientSecret'),' ',variables('clientPrivateKey'),' ',variables('targetEnvironment'),' ',variables('networkPolicy'),' ',variables('cloudProviderBackoff'),' ',variables('cloudProviderBackoffRetries'),' ',variables('cloudProviderBackoffExponent'),' ',variables('cloudProviderBackoffDuration'),' ',variables('cloudProviderBackoffJitter'),' ',variables('cloudProviderRatelimit'),' ',variables('cloudProviderRatelimitQPS'),' ',variables('cloudProviderRatelimitBucket'),' ',variables('useManagedIdentityExtension'),' ',variables('useInstanceMetadata'),' ',variables('apiServerPrivateKey'),' ',variables('caCertificate'),' ',variables('caPrivateKey'),' ',variables('masterFqdnPrefix'),' ',variables('kubeConfigCertificate'),' ',variables('kubeConfigPrivateKey'),' ',variables('username'),' >> /var/log/azure/cluster-provision.log 2>&1\"')]" | |||
"commandToExecute": "[concat('/usr/bin/nohup /bin/bash -c \"/bin/bash /opt/azure/containers/provision.sh ',variables('tenantID'),' ',variables('subscriptionId'),' ',variables('resourceGroup'),' ',variables('location'),' ',variables('subnetName'),' ',variables('nsgName'),' ',variables('virtualNetworkName'),' ',variables('routeTableName'),' ',variables('primaryAvailablitySetName'),' ',variables('servicePrincipalClientId'),' ',variables('servicePrincipalClientSecret'),' ',variables('clientPrivateKey'),' ',variables('targetEnvironment'),' ',variables('networkPolicy'),' ',variables('maxPods'),' ',variables('cloudProviderBackoff'),' ',variables('cloudProviderBackoffRetries'),' ',variables('cloudProviderBackoffExponent'),' ',variables('cloudProviderBackoffDuration'),' ',variables('cloudProviderBackoffJitter'),' ',variables('cloudProviderRatelimit'),' ',variables('cloudProviderRatelimitQPS'),' ',variables('cloudProviderRatelimitBucket'),' ',variables('useManagedIdentityExtension'),' ',variables('useInstanceMetadata'),' ',variables('apiServerPrivateKey'),' ',variables('caCertificate'),' ',variables('caPrivateKey'),' ',variables('masterFqdnPrefix'),' ',variables('kubeConfigCertificate'),' ',variables('kubeConfigPrivateKey'),' ',variables('username'),' >> /var/log/azure/cluster-provision.log 2>&1\"')]" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/primaryAvailablitySetName/primaryAvailabilitySetName/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. Please double check.
@ofiliz comments above are blocking progress of this PR, I'm ready to merge once they're addressed. Let me know if you have any questions! |
What this PR does / why we need it:
This PR adds the feature described in issue #1102. It adds a "maxPods" field to KubernetesConfig to customize the maximum number of pods that can be scheduled on a node.
Which issue this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close that issue when PR gets merged): fixes #1102This PR adds the optional "maxPods" field to KubernetesConfig. It defaults to 110 (the current Kubernetes default) when using spoofed IP addresses, and 30 when using actual IP addresses with CNI plugin enabled.
Customers can set custom maxPods values appropriate for their scenarios.
This is general goodness whether CNI plugin is used or not. However when CNI plugin is enabled, maxPods gives finer control over the number of IP addresses allocated per node, allowing larger cluster sizes with fewer addresses per node (e.g. 4096/110=37 vs. 4096/64=64 vs. 4096/30=136)
Special notes for your reviewer:
Release note:
fixes #1102
This change is