
Specifying 256GB instead of 128 for etcd disk #2435

Merged · 6 commits · Apr 2, 2018

Conversation

@sylr
Contributor
commented Mar 12, 2018

128 GB is the limit of the P10 pricing tier, which allocates 500 IOPS per disk.

With disks like this, the master nodes sit at a constant 30% IOWait.

Etcd being very IOPS-hungry, we need to move up a tier; 256 GB is the limit of the next tier (P15), which allocates 1000 IOPS.

See: https://azure.microsoft.com/en-us/pricing/details/managed-disks/

Fixes #2510
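
For reference, a small Go sketch collecting the Premium managed-disk figures from the pricing page linked above (the P30/P40 rows are read off the same page; exact IOPS numbers may drift over time, so treat these as illustrative):

package main

import "fmt"

// premiumTier pairs an Azure Premium managed-disk tier with the size cap
// and provisioned IOPS quoted on the pricing page linked above.
type premiumTier struct {
	name   string
	sizeGB int
	iops   int
}

var tiers = []premiumTier{
	{"P10", 128, 500},   // current default: masters show ~30% IOWait
	{"P15", 256, 1000},  // proposed default in this PR
	{"P30", 1024, 5000}, // next sizes up, for IOPS-hungry clusters
	{"P40", 2048, 7500},
}

func main() {
	for _, t := range tiers {
		fmt.Printf("%s: up to %d GB, %d IOPS\n", t.name, t.sizeGB, t.iops)
	}
}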

@jackfrancis
Member

@khenidak thoughts here? Rather than bumping up this default, would it be a better approach to make it a moving target based on the SKU specs of the underlying master VMs?

@khenidak
Contributor

Yes, we should raise this default value. I'd argue for going to 1023 GB to get the highest I/O, especially for large clusters.

@jackfrancis
Member

Is 1023 GB going to be compatible with all VM SKU sizes? And is it equally compatible with StorageAccount or ManagedDisk?

@khenidak
Contributor

Yes; disk size is not checked for premium/standard, only the VM SKU is.

Also, I made a mistake: 2 TB is the highest IO (7,500 IOPS); 1 TB gives 5,000 IOPS.

@jackfrancis
Member

So, does the value of resourceDiskSizeInMb for a VM SKU come into play? We whitelist VM SKUs based on characteristics that are enforced by a script. See here, for example, how we determine the minimum number of cores required for a master node:

https://github.com/Azure/acs-engine/blob/master/pkg/acsengine/Get-AzureConstants.py#L41

Do we need to update that script to accommodate any increased storage requirements for masters that this change would introduce?

@khenidak
Contributor

These are not related values, AFAIK. One is the data disk size and the other is the ephemeral disk size (a.k.a. the resource disk). The resource disk size is directly tied to the VM SKU, but the data disk size is not; e.g., I can have a 2-core machine with a 4 TB data disk.

@jackfrancis
Member

Got it. So the actual change is more like:

before: "We sell you Kubernetes for cheap!"
after: "We sell you Kubernetes for not so cheap!"

@khenidak
Contributor

While there is a difference in cost, I doubt it is that high; storage is dirt cheap. However, to your point, maybe we should keep it at 256 GB and add a docs section on tuning high-performance clusters.

@jackfrancis
Member

What I'd like to do is deliver a simple "default threshold" beyond which we bump to 1TB. Do you have a gut feel for the node count after which we should bump a cluster up to 1TB?

ghost assigned jackfrancis on Mar 21, 2018
ghost added the in progress label on Mar 21, 2018
@jackfrancis
Member

@khenidak @sylr See my recent commit to this PR. What do we think about that implementation? If we like it, I'll add test coverage.
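
Roughly, the shape of it is the following; this is an illustrative sketch with stand-in types and names (Properties, TotalNodes, EtcdDiskSizeGB, and the "256" base default are assumptions here, not a verbatim excerpt of the commit):

package acsengine

// Stand-in types for illustration; the real acs-engine structs differ.
type KubernetesConfig struct{ EtcdDiskSizeGB string }
type OrchestratorProfile struct{ KubernetesConfig *KubernetesConfig }
type Properties struct {
	OrchestratorProfile *OrchestratorProfile
	nodeCount           int
}

// TotalNodes returns the total master + agent node count.
func (p *Properties) TotalNodes() int { return p.nodeCount }

const (
	DefaultEtcdDiskSize   = "256"  // assumed base default, in GB
	EtcdDiskSizeGT10Nodes = "1024" // default if > 10 nodes
	EtcdDiskSizeGT20Nodes = "2048" // default if > 20 nodes
)

// setEtcdDiskSizeDefault scales the etcd disk size with cluster size,
// but only when the user has not set etcdDiskSizeGB explicitly.
func setEtcdDiskSizeDefault(p *Properties) {
	if p.OrchestratorProfile.KubernetesConfig.EtcdDiskSizeGB != "" {
		return // a user-provided value always wins
	}
	switch nodes := p.TotalNodes(); {
	case nodes > 20:
		p.OrchestratorProfile.KubernetesConfig.EtcdDiskSizeGB = EtcdDiskSizeGT20Nodes
	case nodes > 10:
		p.OrchestratorProfile.KubernetesConfig.EtcdDiskSizeGB = EtcdDiskSizeGT10Nodes
	default:
		p.OrchestratorProfile.KubernetesConfig.EtcdDiskSizeGB = DefaultEtcdDiskSize
	}
}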

@sylr
Contributor Author
commented Mar 21, 2018

Am I right to understand that the more agents the cluster has, the more etcd will be solicited?

Sylvain Rabot and others added 2 commits on Mar 23, 2018
@jackfrancis
Member

@sylr Roughly, yes. Enough correlation that it makes sense to nudge defaults upward if users don't provide a value.

@CecileRobertMichon
Contributor

We should document this default pattern in the docs, especially since it's likely to change pricing... Also, I realized etcdDiskSize is missing from https://github.com/Azure/acs-engine/blob/master/docs/clusterdefinition.md#kubernetesconfig

const (
	// EtcdDiskSizeGT10Nodes = size for Kubernetes master etcd disk volumes in GB if > 10 nodes
	EtcdDiskSizeGT10Nodes = "1024"
	// EtcdDiskSizeGT20Nodes = size for Kubernetes master etcd disk volumes in GB if > 20 nodes
	EtcdDiskSizeGT20Nodes = "2048"
)
@CecileRobertMichon
Contributor

Here I think it would make sense to change the naming to DefaultEtcdDiskSize<Insert_Number_of_Nodes>, since all of these are defaults that can be overridden by the user, if I understand correctly. The name EtcdDiskSizeGT20Nodes plus the comment make it seem like the value is static.
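
Something like this, for instance (hypothetical names following the suggested convention; the doc comments here are mine):

const (
	// DefaultEtcdDiskSizeGT10Nodes is the default etcd disk size in GB
	// for clusters with more than 10 nodes; etcdDiskSizeGB overrides it.
	DefaultEtcdDiskSizeGT10Nodes = "1024"
	// DefaultEtcdDiskSizeGT20Nodes is the default etcd disk size in GB
	// for clusters with more than 20 nodes; etcdDiskSizeGB overrides it.
	DefaultEtcdDiskSizeGT20Nodes = "2048"
)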

@CecileRobertMichon
Contributor

lgtm
