This repository has been archived by the owner on Sep 4, 2021. It is now read-only.

[WIP] HA control plane #596

Closed

colhom wants to merge 2 commits from the ha-control-plane branch
Conversation

@colhom (Contributor) commented Aug 2, 2016

Controller(s) are in a cross-zone autoscaling group and behind an ELB. The ControllerIP parameter is gone, as we're going to rely on DNS for all of this.

Depends on #544
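
A quick sanity check of the new topology once a stack is up looks roughly like this (the ELB and ASG names below are placeholders, not necessarily what kube-aws names them):

# Controllers should show up as in-service instances behind the apiserver ELB
aws --region=$REGION elb describe-instance-health \
  --load-balancer-name "$CONTROLLER_ELB_NAME"

# ...and as members of the cross-zone controller autoscaling group
aws --region=$REGION autoscaling describe-auto-scaling-groups \
  --auto-scaling-group-names "$CONTROLLER_ASG_NAME" \
  --query 'AutoScalingGroups[0].[AvailabilityZones,Instances[].InstanceId]'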

@colhom (Contributor, Author) commented Aug 2, 2016

I'm struggling with what to do about the createRecordSet option, which controls whether a Route53 DNS record is created for the API server endpoint.

The nodes now rely on externalDNSName to talk to the API server ELB, so tl;dr: the nodes can't join the cluster until externalDNSName is CNAME'd to point at the apiserver ELB.

When createRecordSet=true, this all happens automagically and there's nothing to notice.

If createRecordSet=false, this could be kind of weird from the operator's perspective, in that the nodes will appear as ready "some amount of time" after the DNS entry is manually created.
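
For reference, that manual step is roughly the following (the hosted zone ID, record name, and ELB hostname here are placeholders):

# Point externalDNSName at the apiserver ELB by hand when createRecordSet=false
aws route53 change-resource-record-sets \
  --hosted-zone-id Z1234567890ABC \
  --change-batch '{
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "kubernetes.example.com",
        "Type": "CNAME",
        "TTL": 300,
        "ResourceRecords": [{"Value": "apiserver-elb-1234567890.us-east-1.elb.amazonaws.com"}]
      }
    }]
  }'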

I'm debating between:

  • deprecating the createRecordSet option and making the Route53 integration mandatory, or
  • making the Route53 integration default to on and printing a warning if it's turned off (like traction control).

\cc @mumoshu @pieterlange @whereisaaron

colhom force-pushed the ha-control-plane branch from 67412f9 to f552bc5 on August 2, 2016 01:34
@mumoshu (Contributor) commented Aug 2, 2016

@colhom Personally, I'm happy with deprecating createRecordSet.

However, I guess there's a fairly common situation in which the hosted zone is managed in an AWS account other than the one kube-aws launches the cfn stack in.

I believe CloudFormation doesn't allow creating record sets under a hosted zone managed in another account (see e.g. https://forums.aws.amazon.com/thread.jspa?messageID=537944).

So, IMHO, your latter option (defaulting the Route53 integration to true and warning when it's false) would be better, as it provides both

  • a nice default behavior for most users, and
  • a work-around (manually creating a record set under the hosted zone in the other AWS account, sketched below) for users who hit the limitation.
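
For example, a rough sketch of that work-around, run with credentials for the account that actually owns the hosted zone (the profile name, zone ID, and change-batch file are placeholders):

# Create the CNAME from the DNS-owning account, using a change batch
# like the one sketched earlier in this thread
aws --profile dns-account route53 change-resource-record-sets \
  --hosted-zone-id ZEXAMPLEZONE \
  --change-batch file://apiserver-cname.json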

Btw, I'm very excited to see this PR! I'm definitely going to test this out.

@pieterlange commented

Awesome work @colhom! I haven't got time to review it right now, but I'll try to test this ASAP.

Agreed with @mumoshu's evaluation of your internal debate (don't debate; let users figure things out for themselves, but provide sane defaults).

colhom force-pushed the ha-control-plane branch 2 times, most recently from c052b43 to bce3c11 on August 2, 2016 20:25
@colhom (Contributor, Author) commented Aug 2, 2016

Alrighty, as of bce3c11 I have successfully created a very large HA cluster in us-east-1{a,c,d}, with controllers, workers, and etcd instances distributed across the three zones.

Kube-aws can now officially max out the AWS regional SLA!

If you want to see which AZs your account can deploy to in a given region (sometimes it's fewer than 3 👎):

aws --region=$REGION ec2 describe-availability-zones

and use any zone where State is "available". AZ availability in a region differs from account to account; yay, resource abstraction.
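
For example, something like this narrows the output to just the usable zone names:

# list only the zones whose State is "available"
aws --region=$REGION ec2 describe-availability-zones \
  --query 'AvailabilityZones[?State==`available`].ZoneName' --output text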

As of now, controllers, workers, and etcd instances are all scheduled as close to evenly as possible across the AZs. Naturally, an even multiple of your AZ count makes sense for worker and controller counts.

There's really very little reason to double or triple up etcd instances per AZ, so I'd recommend etcd count === AZ count.

colhom force-pushed the ha-control-plane branch from d412d61 to 65bd4fe on August 8, 2016 23:39
colhom mentioned this pull request Aug 9, 2016
@colhom (Contributor, Author) commented Oct 28, 2016

Hello Kubernetes Community,

Future work on kube-aws will be moved to a new dedicated repository. @mumoshu will be running point on maintaining that repository; please move all issues and PRs over there as soon as you can. We will be halting active development on the AWS portion of this repository in the near future. We will continue to maintain the Vagrant single-node and multi-node distributions in this repository, along with our hyperkube container image.

A community announcement to end users will be made once the transition is complete. We at CoreOS ask that those reading this message avoid publicizing/blogging about the transition until the official announcement has been made to the community in the next week.

The new dedicated kube-aws repository already has the following features merged in:

  • Discrete etcd cluster
  • HA control plane
  • Cluster upgrades
  • Node draining/cordoning

If anyone in the Kubernetes community would like to be involved with maintaining this new repository, find @chom and/or @mumoshu on the Kubernetes slack in the #sig-aws channel or via direct message.

~CoreOS Infra Team
