-
Notifications
You must be signed in to change notification settings - Fork 519
chore: default to large cluster settings on Azure Stack #2832
Conversation
Codecov Report
@@ Coverage Diff @@
## master #2832 +/- ##
==========================================
+ Coverage 72.44% 72.47% +0.02%
==========================================
Files 140 140
Lines 25562 25589 +27
==========================================
+ Hits 18518 18545 +27
Misses 5976 5976
Partials 1068 1068 |
DefaultAzureStackKubernetesNodeStatusUpdateFrequency = "1m" | ||
DefaultAzureStackKubernetesCtrlMgrRouteReconciliationPeriod = "1m" | ||
DefaultAzureStackKubernetesCtrlMgrNodeMonitorGracePeriod = "5m" | ||
DefaultAzureStackKubernetesCtrlMgrPodEvictionTimeout = "1m" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Today default value is "--pod-eviction-timeout=5m0s". Lets better check why is that value ?. Before make it to 1m.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line 399 in feecc8b
DefaultKubernetesCtrlMgrPodEvictionTimeout = "5m0s" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, it aligns with the large clusters guideline
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docs is for all K8s version. New behavior is different from 1.14 which is officially supported on Azure Stack Hub.
We should check once with Jack if we need to update the large cluster guideline page. As it might be stale and may need some updates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jackfrancis is the guideline stale?
// Azure Stack configures all clusters as if they were large clusters. | ||
const ( | ||
DefaultAzureStackKubernetesCloudProviderBackoffRetries = 6 | ||
DefaultAzureStackKubernetesCloudProviderBackoffJitter = 1.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Defaults for 1.14.0 onward for cloudProviderBackoffMode is v2
| cloudProviderBackoffMode | no | Which version of the Azure cloudprovider backoff implementation to use: the options are `"v1"` or `"v2"` (Kubernetes v1.14.0 or greater only). `"v2"` is a more recent backoff implementation which better honors Azure API HTTP headers to align backoff timings with the Azure API. Defaults to `"v2"` for Kubernetes v1.14.0 and greater, and `"v1"` for earlier versions of Kubernetes. | |
Hence based on Azure Cloud provider code CloudProviderBackoffExponent and CloudProviderBackoffJitter is not needed to be set as it is no op.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rather keep both Azure and Azure Stack flows consistent.
Worst case seems to be a warning in the logs.
@@ -267,7 +267,7 @@ func (cs *ContainerService) setOrchestratorDefaults(isUpgrade, isScale bool) { | |||
} | |||
|
|||
// Enforce sane cloudprovider backoff defaults. | |||
o.KubernetesConfig.SetCloudProviderBackoffDefaults() | |||
a.SetCloudProviderBackoffDefaults() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, the eminently significant a
variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
Reason for Change:
Default AKSe settings do not play well with Azure Stack limits. This PR changes defaults for Azure Stack to reduce the load to ARM.
Requirements: