
Specifying 256GB instead of 128 for etcd disk #2435

Merged · 6 commits · Apr 2, 2018

Conversation

@sylr
Contributor
commented Mar 12, 2018

128 GB is the limit of the P10 pricing tier, which allocates 500 IOPS per disk.

With disks like this, the master nodes sit at a constant 30% IOWait.

Etcd being very IOPS-hungry, we need to move up a tier; 256 GB is the limit of the next tier (P15), which allocates 1000 IOPS.

See: https://azure.microsoft.com/en-us/pricing/details/managed-disks/

Fixes #2510
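
For reference, a small Go sketch collecting the Premium managed-disk figures from the pricing page linked above (the P30/P40 rows are read off the same page; exact IOPS numbers may drift over time, so treat these as illustrative):

package main

import "fmt"

// premiumTier pairs an Azure Premium managed-disk tier with the size cap
// and provisioned IOPS quoted on the pricing page linked above.
type premiumTier struct {
	name   string
	sizeGB int
	iops   int
}

var tiers = []premiumTier{
	{"P10", 128, 500},   // current default: masters show ~30% IOWait
	{"P15", 256, 1000},  // proposed default in this PR
	{"P30", 1024, 5000}, // next sizes up, for IOPS-hungry clusters
	{"P40", 2048, 7500},
}

func main() {
	for _, t := range tiers {
		fmt.Printf("%s: up to %d GB, %d IOPS\n", t.name, t.sizeGB, t.iops)
	}
}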

@jackfrancis
Member

@khenidak thoughts here? Rather than bumping up this default, would it be a better approach to make it a moving target based on the SKU specs of the underlying master VMs?

@khenidak
Contributor

Yes, we should raise this default value. I'd argue for going to 1023 GB to get the highest I/O, especially for large clusters.

@jackfrancis
Member

Is 1023 GB going to be compatible with all VM SKU sizes? And is it equally compatible with StorageAccount or ManagedDisk?

@khenidak
Contributor

Yes; disk size is not checked for premium/standard, only the VM SKU is.

Also, I made a mistake: 2 TB is the highest IO (7,500 IOPS); 1 TB gives 5,000 IOPS.

@jackfrancis
Member

So, does the value of resourceDiskSizeInMb for a VM SKU come into play? We whitelist VM SKUs based on characteristics that are enforced by a script. See here, for example, how we determine the minimum number of cores required for a master node:

https://github.com/Azure/acs-engine/blob/master/pkg/acsengine/Get-AzureConstants.py#L41

Do we need to update that script to accommodate any increased storage requirements for masters that this change would introduce?

@khenidak
Contributor

These are not related values, AFAIK. One is the data disk size and the other is the ephemeral disk size (a.k.a. the resource disk). The resource disk size is directly tied to the VM SKU, but the data disk size is not; e.g., I can have a 2-core machine with a 4 TB data disk.

@jackfrancis
Member

Got it. So the actual change is more like:

before: "We sell you Kubernetes for cheap!"
after: "We sell you Kubernetes for not so cheap!"

@khenidak
Contributor

While there is a difference in cost, I doubt it is that high; storage is dirt cheap. However, to your point, maybe we should keep it at 256 GB and add a docs section on tuning high-performance clusters.

@jackfrancis
Member

What I'd like to do is deliver a simple "default threshold" beyond which we bump to 1TB. Do you have a gut feel for the node count after which we should bump a cluster up to 1TB?

ghost assigned jackfrancis on Mar 21, 2018
ghost added the in progress label on Mar 21, 2018
@jackfrancis
Member

@khenidak @sylr See my recent commit to this PR. What do we think about that implementation? If we like it, I'll add test coverage.
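
Roughly, the shape of it is the following; this is an illustrative sketch with stand-in types and names (Properties, TotalNodes, EtcdDiskSizeGB, and the "256" base default are assumptions here, not a verbatim excerpt of the commit):

package acsengine

// Stand-in types for illustration; the real acs-engine structs differ.
type KubernetesConfig struct{ EtcdDiskSizeGB string }
type OrchestratorProfile struct{ KubernetesConfig *KubernetesConfig }
type Properties struct {
	OrchestratorProfile *OrchestratorProfile
	nodeCount           int
}

// TotalNodes returns the total master + agent node count.
func (p *Properties) TotalNodes() int { return p.nodeCount }

const (
	DefaultEtcdDiskSize   = "256"  // assumed base default, in GB
	EtcdDiskSizeGT10Nodes = "1024" // default if > 10 nodes
	EtcdDiskSizeGT20Nodes = "2048" // default if > 20 nodes
)

// setEtcdDiskSizeDefault scales the etcd disk size with cluster size,
// but only when the user has not set etcdDiskSizeGB explicitly.
func setEtcdDiskSizeDefault(p *Properties) {
	if p.OrchestratorProfile.KubernetesConfig.EtcdDiskSizeGB != "" {
		return // a user-provided value always wins
	}
	switch nodes := p.TotalNodes(); {
	case nodes > 20:
		p.OrchestratorProfile.KubernetesConfig.EtcdDiskSizeGB = EtcdDiskSizeGT20Nodes
	case nodes > 10:
		p.OrchestratorProfile.KubernetesConfig.EtcdDiskSizeGB = EtcdDiskSizeGT10Nodes
	default:
		p.OrchestratorProfile.KubernetesConfig.EtcdDiskSizeGB = DefaultEtcdDiskSize
	}
}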

@sylr
Contributor Author
commented Mar 21, 2018

Am I right to understand that the more agents the cluster has, the more etcd will be solicited?

Sylvain Rabot and others added 2 commits on Mar 23, 2018
@jackfrancis
Member

@sylr Roughly, yes. Enough correlation that it makes sense to nudge defaults upward if users don't provide a value.

@CecileRobertMichon
Contributor

We should document this default pattern in the docs, especially since it's likely to change pricing... Also, I realized etcdDiskSize is missing from https://github.com/Azure/acs-engine/blob/master/docs/clusterdefinition.md#kubernetesconfig

const (
	// EtcdDiskSizeGT10Nodes = size for Kubernetes master etcd disk volumes in GB if > 10 nodes
	EtcdDiskSizeGT10Nodes = "1024"
	// EtcdDiskSizeGT20Nodes = size for Kubernetes master etcd disk volumes in GB if > 20 nodes
	EtcdDiskSizeGT20Nodes = "2048"
)
@CecileRobertMichon
Contributor

Here I think it would make sense to change the naming to DefaultEtcdDiskSize<Insert_Number_of_Nodes>, since all of these are defaults that can be overridden by the user, if I understand correctly. The name EtcdDiskSizeGT20Nodes plus the comment make it seem like the value is static.
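
Something like this, for instance (hypothetical names following the suggested convention; the doc comments here are mine):

const (
	// DefaultEtcdDiskSizeGT10Nodes is the default etcd disk size in GB
	// for clusters with more than 10 nodes; etcdDiskSizeGB overrides it.
	DefaultEtcdDiskSizeGT10Nodes = "1024"
	// DefaultEtcdDiskSizeGT20Nodes is the default etcd disk size in GB
	// for clusters with more than 20 nodes; etcdDiskSizeGB overrides it.
	DefaultEtcdDiskSizeGT20Nodes = "2048"
)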

@CecileRobertMichon
Contributor

lgtm
