This repository has been archived by the owner on Sep 4, 2021. It is now read-only.

kube-aws: Support Multi-AZ workers on AWS #439

Merged: 1 commit, May 18, 2016

Conversation

@mumoshu (Contributor) commented on Apr 27, 2016:

One step forward toward achieving high availability throughout the cluster.
This change allows you to specify multiple subnets in cluster.yaml so that the workers' ASG spreads instances over those subnets. Giving each subnet a different availability zone results in H/A for the workers.
Beware that this change by itself does nothing for H/A of the masters.

Possibly relates to #147, #100

How I have tested this

I have tested this by running ./build && rm -rf cluster.yaml userdata/ credentials/ && bin/kube-aws init *several options* && bin/kube-aws render && bin/kube-aws up, both with and without the newly added subnets section in cluster.yaml.

For example, given a cluster.yaml:

*snip*

subnets:
- instanceCIDR: "10.0.0.0/24"
  availabilityZone: ap-northeast-1a
- instanceCIDR: "10.0.1.0/24"
  availabilityZone: ap-northeast-1c

Modifying the worker ASG's desired capacity to 2 results in 2 worker nodes:

$ kubectl --kubeconfig=kubeconfig get nodes
NAME                                            STATUS                     AGE
ip-10-0-0-125.ap-northeast-1.compute.internal   Ready                      10m
ip-10-0-0-50.ap-northeast-1.compute.internal    Ready,SchedulingDisabled   17m
ip-10-0-1-51.ap-northeast-1.compute.internal    Ready                      17m

The workers spread across availability zones (this time, ap-northeast-1a and ap-northeast-1c).


@rafamonteiro commented:

Just gave it a try and it worked perfectly 👍

@mumoshu mentioned this pull request on May 3, 2016.
i,
)
}
if i == 0 && !instancesNet.Contains(controllerIPAddr) {
@aaronlevy (Contributor) commented on May 3, 2016:

If we have the convention that the first subnet must include the controllerIP, we should document it (maybe in the config).

@mumoshu (Contributor, Author) replied on May 3, 2016:

Thanks for your comment!
I agree that we'd better document it for now.

Let me also suggest that we remove this convention once we introduce an auto-scaling group to manage the controller(s) and an ELB pointing to it.

The "convention" exists because we currently have a singleton controller (which controllerIP points to) and it is placed in the first subnet "for now" (choosing one of the subnets for the singleton controller doesn't really make sense, but I have proceeded with this convention for now).

If we had an auto-scaling group managing the controller(s), we would configure the group to spread EC2 instances evenly over multiple subnets (as I have done for the workers in this PR).
Then we wouldn't need to choose the first/second/last subnet to place the singleton controller in (because AWS auto-scaling does the job for us), nor to specify/validate controllerIP (because, with multiple controllers, workers would reach the master via the DNS name of a load balancer, or a Route 53 record pointing to it).
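For illustration, spreading an ASG over multiple subnets in a CloudFormation template comes down to listing several subnets under VPCZoneIdentifier. A minimal, hypothetical sketch, not kube-aws's actual stack template; all resource names here are made up:

    WorkerAutoScaleGroup:
      Type: "AWS::AutoScaling::AutoScalingGroup"
      Properties:
        MinSize: "1"
        MaxSize: "3"
        DesiredCapacity: "2"
        LaunchConfigurationName: { "Ref": "WorkerLaunchConfiguration" }
        # One subnet per availability zone; the ASG balances instances across them.
        VPCZoneIdentifier:
          - { "Ref": "SubnetA" }   # e.g. ap-northeast-1a
          - { "Ref": "SubnetC" }   # e.g. ap-northeast-1c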

Does this make sense to you? @aaronlevy @colhom

@aaronlevy (Contributor) replied on May 3, 2016:

Yup, longer-term plan sounds good to me.

My concern with documenting this in the config.yaml is that it's not immediately clear what is required of the networking configs (e.g. that the first subnet must contain the controllerIP).

However, I just checked the existing config.yaml, and it looks like we're not being explicit about any of the networking expectations - so this is probably something I can raise as a separate issue.

@mumoshu (Contributor, Author) replied:

I have the same feeling about your concern.
Also, at this point, a separate issue sounds O.K. to me, so we can keep proceeding incrementally.

@aaronlevy (Contributor) commented:

In general the code looks good. I'm not familiar enough with the upstream "ubernetes-lite" proposal to say how well this would work with it (deferring to @colhom).

@colhom (Contributor) commented on May 3, 2016:

@mumoshu I've read through the kube-up segment that deals with this, and AFAICT the process is the same there as you have it here. I'm going to kick off a conformance test tonight for this.

federation-lite proposal

@@ -108,6 +118,12 @@ type Cluster struct {
RecordSetTTL int `yaml:"recordSetTTL"`
HostedZone string `yaml:"hostedZone"`
StackTags map[string]string `yaml:"stackTags"`
Subnets []Subnet `yaml:"subnets"`
A Contributor commented:

This field (and its subfields) needs to be added to pkg/config/templates/cluster.yaml.

@mumoshu (Contributor, Author) replied:

I have added documentation in commit 7b58131.
Would you mind reviewing it? 🙇

@colhom (Contributor) commented on May 4, 2016:

@mumoshu this is a breaking change, as instanceCIDR is no longer supported. That information is now expressed as the first element of the subnets array. I like this format more, but I hesitate to merge breaking changes.

If we were to go the route of breaking the config API, I would suggest that we add ControllerIP as a field of the Subnet struct. For now, valid() could enforce that only one Subnet.ControllerIP is set, and later, when multi-controller support lands, we can remove that restriction.

\cc @aaronlevy
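A minimal sketch of what that suggestion might look like (hypothetical; this is not what the PR ultimately implements, and the controllerIP field is assumed, not taken from the diff):

    // Sketch of colhom's suggestion; assumes "fmt" is imported as in config.go.
    type Subnet struct {
        AvailabilityZone string `yaml:"availabilityZone"`
        InstanceCIDR     string `yaml:"instanceCIDR"`
        // Hypothetical field: at most one subnet may set this for now.
        ControllerIP     string `yaml:"controllerIP"`
    }

    // valid() would enforce the single-controller restriction until
    // multi-controller support lands.
    func validControllerIP(subnets []Subnet) error {
        count := 0
        for _, s := range subnets {
            if s.ControllerIP != "" {
                count++
            }
        }
        if count > 1 {
            return fmt.Errorf("at most one subnet may set controllerIP; got %d", count)
        }
        return nil
    }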

@mumoshu (Contributor, Author) commented on May 4, 2016:

@colhom Thanks for your comments!
First of all, I don't want to make a breaking change if I can possibly avoid it, either.

Excuse me if I'm missing the point, but I've tried to make this a non-breaking change by making the subnets settings completely optional (see here and there).

What I intended is that, even after this change, instanceCIDR is still supported: if you specify instanceCIDR at the top level of cluster.yaml, you "can" still reference it from templates if you like.

@mumoshu (Contributor, Author) commented again on May 4, 2016:

@colhom Let me add something to my previous comment.

It's perhaps not about the config API, but I did make the singleton-controller part of the template refer not to the top-level instanceCIDR but to the one in the first subnet here, believing it's clearer (I just thought that referring to the top-level instanceCIDR when you have two subnets wouldn't make sense to users).

I guess I can make the singleton-controller part of the template refer to the top-level instanceCIDR instead (along with the corresponding changes to config.go).
Then, I guess, both the config and the template would be backward-compatible.
Would that make sense to you?

🙇

mumoshu added a commit to mumoshu/coreos-kubernetes that referenced this pull request May 4, 2016
@@ -74,6 +75,15 @@ func ClusterFromBytes(data []byte) (*Cluster, error) {
// as it will with RecordSets
c.HostedZone = WithTrailingDot(c.HostedZone)

if len(c.Subnets) == 0 {
c.Subnets = []Subnet{
A Contributor commented:

I definitely appreciate the effort to keep backwards compatibility here. Given that this is clearly documented, we can later think about deprecating instanceCIDR and availabilityZone.
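For reference, the truncated hunk above presumably continues by synthesizing a single-subnet configuration from the legacy top-level fields, along these lines (a sketch reconstructing the elided code from field names seen elsewhere in this diff, not the verbatim source):

    // Backwards compatibility: if no subnets were declared, fall back to a
    // single subnet built from the legacy top-level fields.
    if len(c.Subnets) == 0 {
        c.Subnets = []Subnet{
            {
                AvailabilityZone: c.AvailabilityZone,
                InstanceCIDR:     c.InstanceCIDR,
            },
        }
    }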

@colhom (Contributor) commented on May 5, 2016:

@mumoshu I agree with your approach to keep backwards compatibility. I'd like to see a few more things:

  • If len(config.Subnets) > 0, ensure in config.Valid() that instanceCIDR and availabilityZone are empty. Otherwise, error out.
  • Add the "this is only for single-AZ mode" warning to availabilityZone in templates/cluster.yaml as well.
  • To figure out which subnet the controller goes in, loop through all the subnets and pick the one whose instanceCIDR contains the controllerIP. Assuming all subnets are non-overlapping, the answer should be either a single subnet or an error (no subnet contains the controllerIP).

)

for i, subnet := range cfg.Subnets {
if subnet.AvailabilityZone == "" {
A Contributor commented:

We should also be checking that none of the subnets overlap with each other.
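A pairwise check along these lines would do (a minimal sketch, assuming the Subnet struct from this diff; the helper name is illustrative, and "fmt" and "net" are imported as in config.go):

    // Two CIDR blocks overlap iff either network contains the other's base
    // address, since CIDR ranges are either nested or disjoint.
    func validateNonOverlappingSubnets(subnets []Subnet) error {
        nets := make([]*net.IPNet, len(subnets))
        for i, s := range subnets {
            _, n, err := net.ParseCIDR(s.InstanceCIDR)
            if err != nil {
                return fmt.Errorf("invalid instanceCIDR %q: %v", s.InstanceCIDR, err)
            }
            nets[i] = n
        }
        for i := 0; i < len(nets); i++ {
            for j := i + 1; j < len(nets); j++ {
                if nets[i].Contains(nets[j].IP) || nets[j].Contains(nets[i].IP) {
                    return fmt.Errorf("subnets %v and %v overlap", nets[i], nets[j])
                }
            }
        }
        return nil
    }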

mumoshu pushed a commit to mumoshu/coreos-kubernetes that referenced this pull request May 6, 2016
@mumoshu force-pushed the multi-az-workers branch from 8205845 to d5e7161 on May 6, 2016.
mumoshu pushed a commit to mumoshu/coreos-kubernetes that referenced this pull request May 6, 2016
mumoshu added a commit to mumoshu/coreos-kubernetes that referenced this pull request May 6, 2016
…subnets to automatically find and choose an appropriate subnet for the specified controllerIP

ref 3rd comment in coreos#439 (comment)
mumoshu pushed a commit to mumoshu/coreos-kubernetes that referenced this pull request May 6, 2016
@mumoshu (Contributor, Author) commented on May 6, 2016:

@colhom Thanks! I have added some commits according to your very agreeable comments.
I'd appreciate it if you could look into those too.

cfg.ControllerIP,
)

if len(cfg.Subnets) == 0 {
A Contributor commented:

Isn't this case unreachable? Here, you ensure that subnets always has at least one element.

The same Contributor followed up:

Oops sorry, scratch that ;)

@colhom (Contributor) asked:

Question: if we move this block of logic before c.valid(), can't we do away with this conditional validation behavior and validate only the cfg.Subnets array, since the backwards-compatibility measure would already have been taken?

@mumoshu (Contributor, Author) replied on May 8, 2016:

@colhom Thanks for your question!

I guess doing that (overwriting InstanceCIDR with a default value and deriving implicit Subnets from the top-level InstanceCIDR and AvailabilityZone) before validation would prevent us from performing the validation "did the user specify both the top-level InstanceCIDR/AvailabilityZone and Subnets?" defined here, with the test.

It's a bit confusing, but thinking more generally, I took c.valid() to be a validation of the user input. To structurally validate the user input, I thought it would be better not to structurally touch it before validation.

Does this make sense to you?

Btw, the ordering of the valid() call, the backward-compatibility logic, and the defaults is not very intuitive in my code, and I wish I could somehow make it more intuitive, too 👍
I'm just not coming up with a nicer idea though 😢
(I guess we could do better if we made the user input and the config separate Go structs. To me, that seemed like too much for this PR though.)
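A rough sketch of that separate-structs idea (hypothetical; the names are illustrative and this is not what the PR implements):

    // Raw user input, exactly as unmarshalled from cluster.yaml.
    type clusterInput struct {
        AvailabilityZone string   `yaml:"availabilityZone"`
        InstanceCIDR     string   `yaml:"instanceCIDR"`
        Subnets          []Subnet `yaml:"subnets"`
    }

    // Validate the raw input first, then canonicalize it, folding the legacy
    // top-level fields into Subnets. Assumes "fmt" is imported as in config.go.
    func subnetsFromInput(in clusterInput) ([]Subnet, error) {
        if len(in.Subnets) > 0 && (in.InstanceCIDR != "" || in.AvailabilityZone != "") {
            return nil, fmt.Errorf("top-level instanceCIDR/availabilityZone must be empty when subnets are specified")
        }
        if len(in.Subnets) == 0 {
            return []Subnet{{AvailabilityZone: in.AvailabilityZone, InstanceCIDR: in.InstanceCIDR}}, nil
        }
        return in.Subnets, nil
    }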

@colhom (Contributor) commented on May 6, 2016:

@mumoshu thanks for the changes, they look good.

I've left a few minor comments. I feel like the code could be a bit cleaner if instanceCIDR/availabilityZone were canonicalized into the subnets array before cluster.valid() is called in the ClusterFromBytes() method. Let me know what you think about this.

if err != nil {
return nil, fmt.Errorf("invalid instanceCIDR: %v", err)
}
if instanceCIDR.Contains(controllerIPAddr) {
A Contributor commented:

I see that my prior comment about "still checking the controller is in the first subnet" has already been fixed! Sorry, I blame GitHub for being weird ;)

@mumoshu (Contributor, Author) replied on May 8, 2016:

Nice catch 👍 I realized it while testing and force-pushed a squashed commit before your review 😅

@mumoshu (Contributor, Author) commented on May 8, 2016:

@colhom Thanks for your review! I have replied to you in #439 (comment)

controllerSubnetFound = true
} else if !controllerSubnetFound && i == lastSubnetIndex {
return nil, fmt.Errorf("Fail-fast occurred possibly because of a bug: ControllerSubnetIndex couldn't be determined for subnets (%v) and controllerIP (%v)", stackConfig.Subnets, stackConfig.ControllerIP)
}
@cgag (Contributor) commented on May 10, 2016:

If I understand this loop correctly, I think it could be made a bit simpler by dropping lastSubnetIndex and moving this final check outside of the loop:

    controllerSubnetFound := false
    for i, subnet := range stackConfig.Subnets {
        _, instanceCIDR, err := net.ParseCIDR(subnet.InstanceCIDR)
        if err != nil {
            return nil, fmt.Errorf("invalid instanceCIDR: %v", err)
        }
        if instanceCIDR.Contains(controllerIPAddr) {
            stackConfig.ControllerSubnetIndex = i
            controllerSubnetFound = true
        }
    }
    if !controllerSubnetFound {
        return nil, fmt.Errorf("Fail-fast occurred possibly because of a bug: ControllerSubnetIndex couldn't be determined for subnets (%v) and controllerIP (%v)", stackConfig.Subnets, stackConfig.ControllerIP)
    }

@mumoshu (Contributor, Author) replied:

Thanks! Addressed in ea9f366

mumoshu pushed three commits to mumoshu/coreos-kubernetes that referenced this pull request on May 13, 2016.
@colhom (Contributor) commented on May 17, 2016:

@mumoshu sorry for the delay; this looks good. Rebase and squash the commits, and I'll run it through tests.

One step forward to achieve high availability throughout the cluster.
This change allows you to specify multiple subnets in cluster.yaml to make the workers' ASG spread instances over those subnets. Differentiating each subnet's availability zone results in H/A of workers.
Beware that this change itself does nothing for H/A of masters.

Possibly relates to coreos#147, coreos#100
@mumoshu force-pushed the multi-az-workers branch from f9c4f4c to 94a72a0 on May 18, 2016.
@mumoshu (Contributor, Author) commented on May 18, 2016:

@colhom Thanks, I have just rebased/squashed this 👍

@colhom (Contributor) replied on May 18, 2016:

Thanks @mumoshu !


5 participants