Production Quality Deployment #9
Initial comments on this: The etcd setup still needs some love. I have a monkeypatch at https://github.com/pieterlange/kube-aws/tree/feature/external-etcd for pointing the cluster to an external etcd cluster (I'm using https://crewjam.com/etcd-aws/). This is not a clean solution. I think we need to:
Some work is being done to have an entirely self-hosted Kubernetes cluster (with etcd running as a PetSet inside Kubernetes itself?), but from an ops PoV that feels like way too many moving parts at the moment. As for elasticsearch/heapster, I tend to move in the exact opposite direction: I'd rather host elasticsearch inside the cluster. I'm also not sure this should be part of a default installation.
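For illustration, here is a minimal sketch of what pointing kube-aws at an external etcd cluster could look like in cluster.yaml. The `etcdEndpoints` key is hypothetical, approximating what the feature/external-etcd branch does rather than any released kube-aws option:

```yaml
# cluster.yaml - hypothetical sketch of an external-etcd configuration.
# The etcdEndpoints key is NOT a released kube-aws option; it approximates
# the feature/external-etcd monkeypatch.
clusterName: production
externalDNSName: k8s.example.com
region: eu-west-1
keyName: my-key

# Point the controllers at an etcd cluster managed outside kube-aws
# (e.g. one maintained by etcd-aws) instead of provisioning etcd nodes.
etcdEndpoints:
  - https://etcd1.internal.example.com:2379
  - https://etcd2.internal.example.com:2379
  - https://etcd3.internal.example.com:2379
```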
@pieterlange I tried to integrate your work on coreos/coreos-kubernetes#629 with coreos/coreos-kubernetes#608 and https://github.com/crewjam/etcd-aws a couple of weeks ago. Current setup:
Without TLS it works great and the cluster recovers fine. With TLS, etcd works fine but doesn't recover and also doesn't remove terminated nodes; these still need to be fixed. Apart from this, the DNS record for the etcd internal ELB is still a manual step at the moment (I set an alias record after the ELB is created), but this can be fixed quickly afterwards. If anyone is interested in working on this, you can maybe pick up some changes I already made at https://github.com/camilb/etcd-aws/tree/ssl and https://github.com/camilb/coreos-kubernetes/tree/etcd-asg
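To illustrate how that manual DNS step could be folded into the stack, here is a CloudFormation sketch of an alias record for the internal etcd ELB. The resource name `EtcdInternalElb`, the zone ID, and the domain are placeholders, not the actual resources from the branches above:

```yaml
# CloudFormation sketch: alias record for the internal etcd ELB, so the
# DNS step no longer needs to be done by hand after stack creation.
# "EtcdInternalElb", the zone ID, and the domain are placeholders.
EtcdDnsRecord:
  Type: AWS::Route53::RecordSet
  Properties:
    HostedZoneId: Z1EXAMPLE  # private hosted zone associated with the VPC
    Name: etcd.internal.example.com.
    Type: A
    AliasTarget:
      HostedZoneId: !GetAtt EtcdInternalElb.CanonicalHostedZoneNameID
      DNSName: !GetAtt EtcdInternalElb.DNSName
```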
This is great! Thanks @camilb, this will definitely save time if the project goes in that direction. For reference, there are some notes on self-hosted etcd in the self-hosted design docs. Maybe @aaronlevy can chime in on whether external etcd is a good deployment strategy.
I've posted my thoughts on why I might want to have the "Dedicated controller subnets and routetables" thing at #35 (comment)
Where is this referenced? Where can I read about it?
This was not properly linked @aholbreich, but it's in coreos/coreos-kubernetes#346. Deploying to existing subnets was skipped, but if you think you need this, please add use cases to #52.
Just curious, but does everyone want auto-scaling of workers via cluster-autoscaler to be on the list? Currently, cluster-autoscaler doesn't work as one might expect in kube-aws-created clusters.
Actually, that's why I originally started the work on #46.
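For context, the upstream cluster-autoscaler is pointed at an ASG roughly like this. This is a trimmed sketch of the upstream kubernetes/autoscaler AWS example, with placeholder ASG name, bounds, image tag, and region; the worker IAM role would also need autoscaling permissions:

```yaml
# Trimmed sketch of running cluster-autoscaler against a kube-aws worker ASG.
# ASG name, min/max bounds, image tag, and region are placeholders; see the
# upstream kubernetes/autoscaler AWS docs for the full manifest.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      containers:
        - name: cluster-autoscaler
          image: k8s.gcr.io/cluster-autoscaler:v1.2.2
          command:
            - ./cluster-autoscaler
            - --cloud-provider=aws
            # min:max:name of the worker ASG to scale
            - --nodes=1:10:kube-aws-workers
          env:
            - name: AWS_REGION
              value: us-west-2
```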
Regarding heapster/elasticsearch:
I believe this is already supported in kube-aws as of today.
This has been supported since v0.9.4.
This is WIP in #414.
@AlmogBaku Thanks for the information!
Btw, I'm using GCP Stackdriver Logging to aggregate log messages from my production kube-aws clusters. When there are much nicer alternatives like Stackdriver, do we really need to support ES out of the box in kube-aws?
@pieterlange Is the above sentence meant for rolling updates of worker/controller/etcd nodes?
I don't think we need to support ES in kube-aws, but we could have some recommendations.
I think this referred to removing the nodes from the cluster state where still required (e.g. etcd member lists). Removing/draining kubelets is already supported 👍.
I think we should take the same approach as kubeadm, which is to automatically approve all requests sent via a specific bootstrap token, while making sure this token can only be used for CSRs via RBAC (already supported by kube-aws).
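For reference, a minimal sketch of the RBAC side of that, assuming the stock `system:bootstrappers` group that bootstrap tokens authenticate into and the built-in auto-approval ClusterRole used by kubeadm-style TLS bootstrapping:

```yaml
# Auto-approve the initial client CSRs of kubelets that authenticated with
# a bootstrap token. The group and ClusterRole are the stock names used by
# kubeadm-style TLS bootstrapping.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: auto-approve-bootstrap-csrs
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:certificates.k8s.io:certificatesigningrequests:nodeclient
subjects:
  - apiGroup: rbac.authorization.k8s.io
    kind: Group
    name: system:bootstrappers
```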
I have a working solution for ingesting cluster-wide logs into Sumologic. When I have some time, I could add this to kube-aws as an experimental feature. The same could be done for GCP Stackdriver Logging and other vendors.
One potential problem with logging is the number of solutions out there. I can recommend fluentd-kubernetes-daemonset and will likely be helping add GCP Stackdriver support to that soon. However, I know some have strong opinions on using other logging tools/frameworks. It might be good to provide some recommendations in the docs.
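As an illustration of that recommendation, here is a trimmed DaemonSet along the lines of fluentd-kubernetes-daemonset. The image tag and the Elasticsearch host/port are placeholders; each logging vendor gets its own image variant and output configuration:

```yaml
# Trimmed sketch of a log-shipping DaemonSet based on the
# fluent/fluentd-kubernetes-daemonset images. The tag and the
# Elasticsearch host/port are placeholders; swap in your vendor's variant.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      containers:
        - name: fluentd
          image: fluent/fluentd-kubernetes-daemonset:v1-debian-elasticsearch
          env:
            - name: FLUENT_ELASTICSEARCH_HOST
              value: elasticsearch.kube-system.svc
            - name: FLUENT_ELASTICSEARCH_PORT
              value: "9200"
          volumeMounts:
            - name: varlog
              mountPath: /var/log
            - name: dockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: dockercontainers
          hostPath:
            path: /var/lib/docker/containers
```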
I believe these two items in the description are now resolved thanks to @danielfm - a node now pends the rolling update of an ASG while the node drainer drains its pods.
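For background, that drain-before-terminate behavior hangs off an ASG lifecycle hook along these lines. This is a generic CloudFormation sketch with placeholder names and timeout, not kube-aws's exact resource:

```yaml
# CloudFormation sketch: hold a terminating worker in a wait state so the
# node drainer can evict its pods before the instance disappears.
# "Workers" and the 300-second timeout are placeholders.
WorkerDrainHook:
  Type: AWS::AutoScaling::LifecycleHook
  Properties:
    AutoScalingGroupName: !Ref Workers
    LifecycleTransition: autoscaling:EC2_INSTANCE_TERMINATING
    HeartbeatTimeout: 300
    DefaultResult: CONTINUE
```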
@pieterlange I'm willing to take
@amitkumarj441 I'm not very active in kube-aws anymore and will close the issue, as most of the items have been fixed nowadays. I'm personally running my elasticsearch clusters inside of Kubernetes and I also think that's the best way to go forward, but knock yourself out ;-).
Thanks @pieterlange for letting me know about this.
Copy of the old issue with a lot of boxes ticked thanks to contributions by @colhom @mumoshu @cgag and many others. Old list follows, will update where necessary.
The goal is to offer a "production ready solution" for provisioning a CoreOS Kubernetes cluster on AWS. These are the major functionality blockers that have been thought of: