Specifying 256GB instead of 128 for etcd disk #2435
Conversation
@khenidak thoughts here? Rather than bump up this default, is it a better approach to make this a moving target based on the SKU specs of the underlying master VMs?
Yes. We should up this default value. I'd argue to go for 1023GB to get the highest I/O, especially for large clusters.
Is 1023GB going to be compatible with all VM SKU sizes? And is it equally compatible with StorageAccount and ManagedDisk?
Yes - disk size is not checked for premium/standard, only the VM SKU. Also, I made a mistake: 2TB is the highest IO (7500 IOPS); 1TB gives 5000 IOPS.
So, what about the value at https://github.com/Azure/acs-engine/blob/master/pkg/acsengine/Get-AzureConstants.py#L41? Do we need to update that script to accommodate any increased storage requirements for masters that this change would introduce?
These are not related values AFAIK. One is the data disk size and the other is the ephemeral disk size (AKA resource disk). The resource disk size has a direct relation to the VM SKU; the data disk size, however, has no relation to the VM SKU - as in, I can have a 2-core machine with a 4 TB data disk.
Got it. So the actual change is more like: before: "We sell you Kubernetes for cheap!"
While there is a difference in cost, I doubt that it is that high; storage is dirt cheap. However, to your point, maybe we should keep it at 256GB and have a section for high
What I'd like to do is deliver a simple "default threshold" beyond which we bump to 1TB. Do you have a gut feel for the node count after which we should bump a cluster up to 1TB?
Am I right to understand that the more agents the cluster has, the more etcd will be solicited?
128 is the limit of the P10 pricing tier, which allocates 500 io/s per disk. With a disk like this the master nodes have a constant 30% IOWait load. Etcd being overly needy in IOPS, we need to change tiers, and 256 is the limit of the next tier (P15), which allocates 1000 io/s. See: https://azure.microsoft.com/en-us/pricing/details/managed-disks/ Signed-off-by: Sylvain Rabot <[email protected]>
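For reference, the tier boundaries being discussed can be expressed as a small lookup: a premium managed disk is billed at (and gets the IOPS of) the smallest tier whose size cap covers the requested disk. Below is a minimal Go sketch using only the sizes and approximate IOPS figures quoted in this thread; the `tierFor` helper and its table are illustrative, not acs-engine code.

```go
package main

import "fmt"

// premiumTier pairs a managed-disk tier with its size cap and the approximate
// provisioned IOPS quoted in this thread / the Azure pricing page at the time.
type premiumTier struct {
	name   string
	sizeGB int
	iops   int
}

// Ordered by size: a disk is mapped to the smallest tier that can hold it.
var premiumTiers = []premiumTier{
	{"P10", 128, 500},
	{"P15", 256, 1000},
	{"P30", 1024, 5000},
	{"P40", 2048, 7500},
}

// tierFor returns the tier a disk of the requested size would fall into.
func tierFor(requestedGB int) (premiumTier, bool) {
	for _, t := range premiumTiers {
		if requestedGB <= t.sizeGB {
			return t, true
		}
	}
	return premiumTier{}, false
}

func main() {
	for _, size := range []int{128, 256, 1024} {
		if t, ok := tierFor(size); ok {
			fmt.Printf("%dGB etcd disk -> %s (~%d IOPS)\n", size, t.name, t.iops)
		}
	}
}
```

Under these figures, bumping the default from 128GB to 256GB roughly doubles the provisioned IOPS while staying within the premium managed-disk pricing model.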
@sylr Roughly, yes. Enough correlation that it makes sense to nudge defaults upward if users don't provide a value.
We should document this default pattern in the docs, especially if it's likely to change pricing... Also, I realized etcdDiskSize is missing from https://github.com/Azure/acs-engine/blob/master/docs/clusterdefinition.md#kubernetesconfig
pkg/acsengine/const.go
Outdated
// EtcdDiskSizeGT10Nodes = size for Kubernetes master etcd disk volumes in GB if > 10 nodes
EtcdDiskSizeGT10Nodes = "1024"
// EtcdDiskSizeGT20Nodes = size for Kubernetes master etcd disk volumes in GB if > 20 nodes
EtcdDiskSizeGT20Nodes = "2048"
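To illustrate how these constants are meant to be consumed, here is a minimal sketch of the threshold-based defaulting discussed above, assuming a 256GB base default and that an explicit user value always wins; the function name `defaultEtcdDiskSizeGB` and its signature are hypothetical, not the code in this PR.

```go
package main

import "fmt"

// Constants from the diff above, plus the 256GB base default this PR proposes.
const (
	DefaultEtcdDiskSize   = "256"  // assumed base default (this PR bumps 128 -> 256)
	EtcdDiskSizeGT10Nodes = "1024" // from the diff: > 10 nodes
	EtcdDiskSizeGT20Nodes = "2048" // from the diff: > 20 nodes
)

// defaultEtcdDiskSizeGB returns the etcd data-disk size (in GB, as a string to
// match the constants) for a cluster with the given node count, keeping any
// value the user set explicitly in the API model.
func defaultEtcdDiskSizeGB(userValue string, totalNodes int) string {
	if userValue != "" {
		return userValue // user override always wins
	}
	switch {
	case totalNodes > 20:
		return EtcdDiskSizeGT20Nodes
	case totalNodes > 10:
		return EtcdDiskSizeGT10Nodes
	default:
		return DefaultEtcdDiskSize
	}
}

func main() {
	fmt.Println(defaultEtcdDiskSizeGB("", 5))     // 256
	fmt.Println(defaultEtcdDiskSizeGB("", 25))    // 2048
	fmt.Println(defaultEtcdDiskSizeGB("128", 25)) // 128: explicit value kept
}
```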
Here I think it would make sense to change the naming to DefaultEtcdDiskSize<Insert_Number_of_Nodes>, since all of them are defaults that can be overridden by the user, if I understand correctly. The name EtcdDiskSizeGT20Nodes plus the comment makes it seem like it's static.
lgtm
Fixes #2510