This repository has been archived by the owner on Sep 4, 2021. It is now read-only.

kube-aws: Support Multi-AZ workers on AWS #439

Merged: 1 commit, May 18, 2016

Conversation

@mumoshu (Contributor) commented on Apr 27, 2016:

One step forward toward achieving high availability throughout the cluster.
This change allows you to specify multiple subnets in cluster.yaml so that the workers' ASG spreads instances over those subnets. Giving each subnet a different availability zone results in H/A for the workers.
Beware that this change by itself does nothing for H/A of the masters.

Possibly relates to #147, #100

How I have tested this

I have tested this by running ./build && rm -rf cluster.yaml userdata/ credentials/ && bin/kube-aws init *several options* && bin/kube-aws render && bin/kube-aws up, both with and without the newly added subnets section in cluster.yaml.

For example, given a cluster.yaml:

*snip*

subnets:
- instanceCIDR: "10.0.0.0/24"
  availabilityZone: ap-northeast-1a
- instanceCIDR: "10.0.1.0/24"
  availabilityZone: ap-northeast-1c

Modifying the worker ASG's desired capacity to 2 results in 2 worker nodes:

$ kubectl --kubeconfig=kubeconfig get nodes
NAME                                            STATUS                     AGE
ip-10-0-0-125.ap-northeast-1.compute.internal   Ready                      10m
ip-10-0-0-50.ap-northeast-1.compute.internal    Ready,SchedulingDisabled   17m
ip-10-0-1-51.ap-northeast-1.compute.internal    Ready                      17m

The workers spread across availability zones (this time, ap-northeast-1a and ap-northeast-1c).


@rafamonteiro commented:

Just gave it a try and it worked perfectly 👍

@mumoshu mentioned this pull request on May 3, 2016.
i,
)
}
if i == 0 && !instancesNet.Contains(controllerIPAddr) {
@aaronlevy (Contributor) commented on May 3, 2016:

If we have the convention that the first subnet must include the controllerIP, we should document it (maybe in the config).

@mumoshu (Contributor, Author) replied on May 3, 2016:

Thanks for your comment!
I agree that we'd better document it for now.

Let me also suggest that we remove this convention once we introduce an auto-scaling group to manage the controller(s) and an ELB pointing to it.

The "convention" exists because we currently have a singleton controller (which controllerIP points to) and it is placed in the first subnet "for now" (choosing one of the subnets for the singleton controller doesn't really make sense, but I have proceeded with this convention for now).

If we had an auto-scaling group managing the controller(s), we would configure the group to spread EC2 instances evenly over multiple subnets (as I have done for the workers in this PR).
Then we wouldn't need to choose the first/second/last subnet to place the singleton controller in (because AWS auto-scaling does the job for us), nor to specify/validate controllerIP (because, with multiple controllers, workers would reach the master via the DNS name of a load balancer, or a Route 53 record pointing to it).
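For illustration, spreading an ASG over multiple subnets in a CloudFormation template comes down to listing several subnets under VPCZoneIdentifier. A minimal, hypothetical sketch, not kube-aws's actual stack template; all resource names here are made up:

    WorkerAutoScaleGroup:
      Type: "AWS::AutoScaling::AutoScalingGroup"
      Properties:
        MinSize: "1"
        MaxSize: "3"
        DesiredCapacity: "2"
        LaunchConfigurationName: { "Ref": "WorkerLaunchConfiguration" }
        # One subnet per availability zone; the ASG balances instances across them.
        VPCZoneIdentifier:
          - { "Ref": "SubnetA" }   # e.g. ap-northeast-1a
          - { "Ref": "SubnetC" }   # e.g. ap-northeast-1c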

Does this make sense to you? @aaronlevy @colhom

@aaronlevy (Contributor) replied on May 3, 2016:

Yup, longer-term plan sounds good to me.

My concern with documenting this in the config.yaml is that it's not immediately clear what is required of the networking configs (e.g. that the first subnet must contain the controllerIP).

However, I just checked the existing config.yaml, and it looks like we're not being explicit about any of the networking expectations - so this is probably something I can raise as a separate issue.

@mumoshu (Contributor, Author) replied:

I have the same feeling about your concern.
Also, at this point, a separate issue sounds O.K. to me, so we can keep proceeding incrementally.

@aaronlevy (Contributor) commented:

In general the code looks good. I'm not familiar enough with the upstream "ubernetes-lite" proposal to say how well this would work with it (deferring to @colhom).

@colhom (Contributor) commented on May 3, 2016:

@mumoshu I've read through the kube-up segment that deals with this, and AFAICT the process is the same there as you have it here. I'm going to kick off a conformance test tonight for this.

federation-lite proposal

@@ -108,6 +118,12 @@ type Cluster struct {
RecordSetTTL int `yaml:"recordSetTTL"`
HostedZone string `yaml:"hostedZone"`
StackTags map[string]string `yaml:"stackTags"`
Subnets []Subnet `yaml:"subnets"`
A Contributor commented:

This field (and its subfields) needs to be added to pkg/config/templates/cluster.yaml.

@mumoshu (Contributor, Author) replied:

I have added documentation in commit 7b58131.
Would you mind reviewing it? 🙇

@colhom (Contributor) commented on May 4, 2016:

@mumoshu this is a breaking change, as instanceCIDR is no longer supported. That information is now expressed as the first element of the subnets array. I like this format more, but I hesitate to merge breaking changes.

If we were to go the route of breaking the config API, I would suggest that we add ControllerIP as a field of the Subnet struct. For now, valid() could enforce that only one Subnet.ControllerIP is set, and later, when multi-controller support lands, we can remove that restriction.

\cc @aaronlevy
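A minimal sketch of what that suggestion might look like (hypothetical; this is not what the PR ultimately implements, and the controllerIP field is assumed, not taken from the diff):

    // Sketch of colhom's suggestion; assumes "fmt" is imported as in config.go.
    type Subnet struct {
        AvailabilityZone string `yaml:"availabilityZone"`
        InstanceCIDR     string `yaml:"instanceCIDR"`
        // Hypothetical field: at most one subnet may set this for now.
        ControllerIP     string `yaml:"controllerIP"`
    }

    // valid() would enforce the single-controller restriction until
    // multi-controller support lands.
    func validControllerIP(subnets []Subnet) error {
        count := 0
        for _, s := range subnets {
            if s.ControllerIP != "" {
                count++
            }
        }
        if count > 1 {
            return fmt.Errorf("at most one subnet may set controllerIP; got %d", count)
        }
        return nil
    }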

@mumoshu (Contributor, Author) commented on May 4, 2016:

@colhom Thanks for your comments!
First of all, I don't want to make a breaking change if I can possibly avoid it, either.

Excuse me if I'm missing the point, but I've tried to make this a non-breaking change by making the subnets settings completely optional (see here and there).

What I intended is that, even after this change, instanceCIDR is still supported: if you specify instanceCIDR at the top level of cluster.yaml, you "can" still reference it from templates if you like.

@mumoshu (Contributor, Author) commented again on May 4, 2016:

@colhom Let me add something to my previous comment.

It's perhaps not about the config API, but I did make the singleton-controller part of the template refer not to the top-level instanceCIDR but to the one in the first subnet here, believing it's clearer (I just thought that referring to the top-level instanceCIDR when you have two subnets wouldn't make sense to users).

I guess I can make the singleton-controller part of the template refer to the top-level instanceCIDR instead (along with the corresponding changes to config.go).
Then, I guess, both the config and the template would be backward-compatible.
Would that make sense to you?

🙇

mumoshu added a commit to mumoshu/coreos-kubernetes that referenced this pull request May 4, 2016
@@ -74,6 +75,15 @@ func ClusterFromBytes(data []byte) (*Cluster, error) {
// as it will with RecordSets
c.HostedZone = WithTrailingDot(c.HostedZone)

if len(c.Subnets) == 0 {
c.Subnets = []Subnet{
A Contributor commented:

I definitely appreciate the effort to keep backwards compatibility here. Given that this is clearly documented, we can later think about deprecating instanceCIDR and availabilityZone.
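For reference, the truncated hunk above presumably continues by synthesizing a single-subnet configuration from the legacy top-level fields, along these lines (a sketch reconstructing the elided code from field names seen elsewhere in this diff, not the verbatim source):

    // Backwards compatibility: if no subnets were declared, fall back to a
    // single subnet built from the legacy top-level fields.
    if len(c.Subnets) == 0 {
        c.Subnets = []Subnet{
            {
                AvailabilityZone: c.AvailabilityZone,
                InstanceCIDR:     c.InstanceCIDR,
            },
        }
    }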

@colhom (Contributor) commented on May 5, 2016:

@mumoshu I agree with your approach to keep backwards compatibility. I'd like to see a few more things:

  • If len(config.Subnets) > 0, ensure in config.Valid() that instanceCIDR and availabilityZone are empty. Otherwise, error out.
  • Add the "this is only for single-AZ mode" warning to availabilityZone in templates/cluster.yaml as well.
  • To figure out which subnet the controller goes in, loop through all the subnets and pick the one whose instanceCIDR contains the controllerIP. Assuming all subnets are non-overlapping, the answer should be either a single subnet or an error (no subnet contains the controllerIP).

)

for i, subnet := range cfg.Subnets {
if subnet.AvailabilityZone == "" {
A Contributor commented:

We should also be checking that none of the subnets overlap with each other.
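A pairwise check along these lines would do (a minimal sketch, assuming the Subnet struct from this diff; the helper name is illustrative, and "fmt" and "net" are imported as in config.go):

    // Two CIDR blocks overlap iff either network contains the other's base
    // address, since CIDR ranges are either nested or disjoint.
    func validateNonOverlappingSubnets(subnets []Subnet) error {
        nets := make([]*net.IPNet, len(subnets))
        for i, s := range subnets {
            _, n, err := net.ParseCIDR(s.InstanceCIDR)
            if err != nil {
                return fmt.Errorf("invalid instanceCIDR %q: %v", s.InstanceCIDR, err)
            }
            nets[i] = n
        }
        for i := 0; i < len(nets); i++ {
            for j := i + 1; j < len(nets); j++ {
                if nets[i].Contains(nets[j].IP) || nets[j].Contains(nets[i].IP) {
                    return fmt.Errorf("subnets %v and %v overlap", nets[i], nets[j])
                }
            }
        }
        return nil
    }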

mumoshu pushed a commit to mumoshu/coreos-kubernetes that referenced this pull request May 6, 2016
@mumoshu force-pushed the multi-az-workers branch from 8205845 to d5e7161 on May 6, 2016.
mumoshu pushed a commit to mumoshu/coreos-kubernetes that referenced this pull request May 6, 2016
mumoshu added a commit to mumoshu/coreos-kubernetes that referenced this pull request May 6, 2016
…subnets to automatically find and choose an appropriate subnet for the specified controllerIP

ref 3rd comment in coreos#439 (comment)
mumoshu pushed a commit to mumoshu/coreos-kubernetes that referenced this pull request May 6, 2016
@mumoshu (Contributor, Author) commented on May 6, 2016:

@colhom Thanks! I have added some commits according to your very agreeable comments.
I'd appreciate it if you could look into those too.

cfg.ControllerIP,
)

if len(cfg.Subnets) == 0 {
A Contributor commented:

Isn't this case unreachable? Here, you ensure that subnets always has at least one element.

The same Contributor followed up:

Oops sorry, scratch that ;)

@colhom (Contributor) asked:

Question: if we move this block of logic before c.valid(), can't we do away with this conditional validation behavior and validate only the cfg.Subnets array, since the backwards-compatibility measure would already have been taken?

@mumoshu (Contributor, Author) replied on May 8, 2016:

@colhom Thanks for your question!

I guess doing that (overwriting InstanceCIDR with a default value and deriving implicit Subnets from the top-level InstanceCIDR and AvailabilityZone) before validation would prevent us from performing the validation "did the user specify both the top-level InstanceCIDR/AvailabilityZone and Subnets?" defined here, with the test.

It's a bit confusing, but thinking more generally, I took c.valid() to be a validation of the user input. To structurally validate the user input, I thought it would be better not to structurally touch it before validation.

Does this make sense to you?

Btw, the ordering of the valid() call, the backward-compatibility logic, and the defaults is not very intuitive in my code, and I wish I could somehow make it more intuitive, too 👍
I'm just not coming up with a nicer idea though 😢
(I guess we could do better if we made the user input and the config separate Go structs. To me, that seemed like too much for this PR though.)
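A rough sketch of that separate-structs idea (hypothetical; the names are illustrative and this is not what the PR implements):

    // Raw user input, exactly as unmarshalled from cluster.yaml.
    type clusterInput struct {
        AvailabilityZone string   `yaml:"availabilityZone"`
        InstanceCIDR     string   `yaml:"instanceCIDR"`
        Subnets          []Subnet `yaml:"subnets"`
    }

    // Validate the raw input first, then canonicalize it, folding the legacy
    // top-level fields into Subnets. Assumes "fmt" is imported as in config.go.
    func subnetsFromInput(in clusterInput) ([]Subnet, error) {
        if len(in.Subnets) > 0 && (in.InstanceCIDR != "" || in.AvailabilityZone != "") {
            return nil, fmt.Errorf("top-level instanceCIDR/availabilityZone must be empty when subnets are specified")
        }
        if len(in.Subnets) == 0 {
            return []Subnet{{AvailabilityZone: in.AvailabilityZone, InstanceCIDR: in.InstanceCIDR}}, nil
        }
        return in.Subnets, nil
    }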

@colhom (Contributor) commented on May 6, 2016:

@mumoshu thanks for the changes, they look good.

I've left a few minor comments. I feel like the code could be a bit cleaner if instanceCIDR/availabilityZone were canonicalized into the subnets array before cluster.valid() is called in the ClusterFromBytes() method. Let me know what you think about this.

if err != nil {
return nil, fmt.Errorf("invalid instanceCIDR: %v", err)
}
if instanceCIDR.Contains(controllerIPAddr) {
A Contributor commented:

I see that my prior comment about "still checking the controller is in the first subnet" has already been fixed! Sorry, I blame GitHub for being weird ;)

@mumoshu (Contributor, Author) replied on May 8, 2016:

Nice catch 👍 I realized it while testing and force-pushed a squashed commit before your review 😅

@mumoshu (Contributor, Author) commented on May 8, 2016:

@colhom Thanks for your review! I have replied to you in #439 (comment)

controllerSubnetFound = true
} else if !controllerSubnetFound && i == lastSubnetIndex {
return nil, fmt.Errorf("Fail-fast occurred possibly because of a bug: ControllerSubnetIndex couldn't be determined for subnets (%v) and controllerIP (%v)", stackConfig.Subnets, stackConfig.ControllerIP)
}
@cgag (Contributor) commented on May 10, 2016:

If I understand this loop correctly, I think it could be made a bit simpler by dropping lastSubnetIndex and moving this final check outside of the loop:

    controllerSubnetFound := false
    for i, subnet := range stackConfig.Subnets {
        _, instanceCIDR, err := net.ParseCIDR(subnet.InstanceCIDR)
        if err != nil {
            return nil, fmt.Errorf("invalid instanceCIDR: %v", err)
        }
        if instanceCIDR.Contains(controllerIPAddr) {
            stackConfig.ControllerSubnetIndex = i
            controllerSubnetFound = true
        }
    }
    if !controllerSubnetFound {
        return nil, fmt.Errorf("Fail-fast occurred possibly because of a bug: ControllerSubnetIndex couldn't be determined for subnets (%v) and controllerIP (%v)", stackConfig.Subnets, stackConfig.ControllerIP)
    }

@mumoshu (Contributor, Author) replied:

Thanks! Addressed in ea9f366

mumoshu pushed three commits to mumoshu/coreos-kubernetes that referenced this pull request on May 13, 2016.
@colhom (Contributor) commented on May 17, 2016:

@mumoshu sorry for the delay; this looks good. Rebase and squash the commits, and I'll run it through tests.

One step forward to achieve high availability throughout the cluster.
This change allows you to specify multiple subnets in cluster.yaml to make the workers' ASG spread instances over those subnets. Differentiating each subnet's availability zone results in H/A of workers.
Beware that this change itself does nothing for H/A of masters.

Possibly relates to coreos#147, coreos#100
@mumoshu force-pushed the multi-az-workers branch from f9c4f4c to 94a72a0 on May 18, 2016.
@mumoshu (Contributor, Author) commented on May 18, 2016:

@colhom Thanks, I have just rebased/squashed this 👍

@colhom (Contributor) replied on May 18, 2016:

Thanks @mumoshu !


5 participants