Offer minimal deployment model that supports development and experimentation of Tanzu #2266
Comments
❤️ this! +1 to the above.
This would be great. I think the issue of host restarts will remain in multi-node local clusters, but could be solved this way with single-node clusters.
Great point. There are ways to translate some of these things to kind (or other providers brought in by the provider interface). However, if a provider supported it, we could parse these kubeadm customization(s) locally and ensure their behaviors propagate into the underlying provider.
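For what it's worth, kind already exposes kubeadm customization through `kubeadmConfigPatches` in its cluster config. Here's a minimal sketch of propagating such a patch through kind's public Go API; the patch content and cluster name are just examples, not anything this proposal prescribes:

```go
package main

import (
	"log"

	"sigs.k8s.io/kind/pkg/apis/config/v1alpha4"
	"sigs.k8s.io/kind/pkg/cluster"
)

func main() {
	// A kubeadm customization, parsed locally and propagated into the provider.
	cfg := &v1alpha4.Cluster{
		Nodes: []v1alpha4.Node{{Role: v1alpha4.ControlPlaneRole}},
		KubeadmConfigPatches: []string{`
kind: InitConfiguration
nodeRegistration:
  kubeletExtraArgs:
    node-labels: "example-label=true"
`},
	}

	provider := cluster.NewProvider(cluster.ProviderWithDocker())
	if err := provider.Create("demo", cluster.CreateWithV1Alpha4Config(cfg)); err != nil {
		log.Fatal(err)
	}
}
```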
Here are my comments:
Cons:
Things that would be important for developers (or similarly less-knowledgeable users):
Overall I like this approach, even if it removes some of the benefits the "standalone" cluster had. Given how hard implementing a local controller via kcp seemed, this makes sense, although I would have preferred that route to align more with Cluster API and TKG/Tanzu.
I'm having a slow day - what are the benefits over creating a management cluster via CAPD and scheduling workloads on it?
I have seen some:
I'm sure there are more. CAPD is not a proper infrastructure provider in Cluster API, as it was designed for the single purpose of unit tests. The code would need to be modified to make it a proper Cluster API infrastructure provider. There's a lot of technical debt that makes things harder. I guess that's the biggest hidden benefit.
Exactly what @jorgemoralespou said, plus:
I'm pretty sure the lack of standalone clusters (or the equivalent) on other providers is not a problem, especially given the no-reboot limitation. Do we have a time and resource comparison? I understand the concepts, but I'm wondering 10x what :)
3 minutes (a local cluster) versus 30 minutes (a standalone cluster) on my machine.
I personally don't like Kind, and the fact that this proposal misaligns from Cluster API, but given the huge difference in experience: I have never wanted to use a CAPD standalone cluster, but I will definitely use local clusters.
Don't take this data as scientific, but here's what I got on a very old 2-core Linux box (running a bunch of random stuff):
Also, kind supports runtimes other than Docker, such as podman, which opens the door to such an integration in local clusters if the whole Docker Desktop licensing thing becomes an issue and people move away from it. Linux isn't a concern in that regard, but Mac and Windows users, which I believe would be the vast majority of use cases for TCE local clusters, could benefit from supporting a different container runtime for running the cluster itself.
I agree. This also speaks to why it's important we get the provider interface right. Beyond kind, we could support a variety of underlying models, as long as, post-cluster-create, we get passed back an admin kubeconfig.
Strong agree with this. If I were a workload developer, I'd be less concerned with simulating cluster lifecycle, and testing the high-availability aspects of the workload is more likely to depend on the attributes of the particular cloud I'm deploying to (AZs, storage, etc.), which CAPD isn't a good enough approximation of to be useful.
@jorgemoralespou Can you say more about this? What differences specifically?
My understanding is the difference between a management cluster and a workload cluster is 3-5 controllers running in the management cluster for the CAPI machinery. Other than that, I thought they were identical. Coming from my Kubernetes app development background, this would have been very helpful for testing locally against a Kubernetes API server. It would have been less useful for certain constructs (backing up volume data with Velero), but as mentioned above, that was often cloud-platform dependent in any case.
The management cluster does also have a couple of controllers that install packages on the workload cluster (addon-manager and capabilities-manager), if I'm not mistaken. That's one of the reasons why standalone clusters and workload clusters differ when upgrading kapp-controller, as an example.
Have we thought about creating a Kind provider for Cluster API rather than trying to replicate the lifecycle model?
Where do you feel this proposal is replicating the lifecycle model? The proposal's intent was to say that we don't need a lifecycle model; we just need to bootstrap a cluster on a single node. For those reading this proposal: we're largely advocating to stay out of the cluster-lifecycle problem domain. On a technical level, our implementation/proposal calls an API equivalent to cluster create. What that API invokes under the hood can be anything. For example, it can:
Once the thing managing the lifecycle finishes bootstrapping the cluster, we receive a kubeconfig back. That's when this plugin really steps in to do its work, and it makes its decisions on what to do on the cluster based on the declaration of the distribution (which exists in the TKR). Hope this helps, but please let me know if there's overlap I'm not seeing.
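To put that flow in code form, here's a rough sketch of the two phases; every name in it is hypothetical, not the plugin's actual API:

```go
package local

import "context"

// Provider is a hypothetical stand-in for the "API equivalent to cluster
// create". What it invokes under the hood (kind, a VM, etc.) is its business.
type Provider interface {
	Create(ctx context.Context, name string) (kubeconfig string, err error)
}

// TKR is a hypothetical, already-parsed TanzuKubernetesRelease: the
// declaration of the distribution that drives post-create decisions.
type TKR struct {
	CorePackages []string
}

func CreateLocalCluster(ctx context.Context, p Provider, tkr TKR, name string) error {
	// Phase 1: something else owns bootstrap; we just get a kubeconfig back.
	kubeconfig, err := p.Create(ctx, name)
	if err != nil {
		return err
	}
	// Phase 2: the plugin steps in, driven entirely by the TKR declaration.
	for _, pkg := range tkr.CorePackages {
		if err := installPackage(ctx, kubeconfig, pkg); err != nil {
			return err
		}
	}
	return nil
}

// installPackage is a stand-in for applying a core package (e.g. a CNI).
func installPackage(ctx context.Context, kubeconfig, pkg string) error {
	return nil // elided in this sketch
}
```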
Thanks for the added context @joshrosso, the above makes sense. When I started reading through the issue, from the problem statement it seemed the issues were mostly around the speed of standalone cluster execution. Have we explored avenues that would help both tkg-lib and cluster-api create a CAPD (or similar) based cluster in the minimum amount of time possible? What's the general role of a standalone/local cluster for our users? From our docs:
Is a kind cluster enough for all use cases? What are the implications of not having an active management-workload cluster relationship in this case? If we don't need a lifecycle model, would we never need access to Cluster API primitives like Cluster, ClusterClass (soon), or MachineDeployments?
The way I see this, it is mostly about the local application development workflow for your average business-unit appdev: having the ability to provision a local kind/minikube/whatever cluster as fast as possible and deploying some Tanzu addons to give it a Tanzu look and feel. In these instances, we're not really concerned with a full lifecycle model, I think. I think, however, maintaining clusterctl save/restore and some of the use cases from the existing standalone cluster where CAPI does the provisioning is still useful for everything which isn't "I need a Tanzu-flavoured k8s on my laptop right now, and don't eat all my RAM".
I like the idea. I think we should update the docs to outline the user stories of when you would use (A) vs. (B). There may be some gotchas around config parameters, but as we tinker we'll know more. E.g., will pinniped, contour, etc. just work?
This all makes sense, thanks folks — it's definitely good to have more context, appreciate all the responses |
I can see the need to get Contour working in local clusters, but I don't think Pinniped is going to be that useful since the persona this is intended for isn't going to have permissions or the desire to hook up their local dev cluster to an IdP. |
A few updates:
Minor feedback/thought on the terminal experience of the proof-of-concept, given it introduces newish output patterns: the secondary text could get difficult to read depending on the minimum-contrast config of the terminal and the user's color scheme. For example, on Solarized, the text is barely visible. Related: #2730, where we're starting to think about improving visibility of processes.
The color formatting looks great... when we are running in a terminal theme that fits well with it. But that's hard to guarantee, so I think we should either find some way to query the terminal for color recommendations based on the theme, or just go with the default color and use indentation to make it easier to read. There may also be some color-blindness concerns with the way we are doing it now.
^this. The indentation is adequate. |
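One low-cost way to do the fallback described above is to respect the NO_COLOR convention (https://no-color.org) and skip color when stdout isn't a terminal, keeping indentation as the only guaranteed structure. A sketch, not the plugin's actual output code:

```go
package main

import (
	"fmt"
	"os"

	"golang.org/x/term"
)

// useColor reports whether colored secondary text is reasonably safe to emit.
// It can't detect low-contrast themes like Solarized, which is why the
// indentation has to carry the hierarchy on its own.
func useColor() bool {
	if _, set := os.LookupEnv("NO_COLOR"); set {
		return false // user explicitly opted out of color
	}
	return term.IsTerminal(int(os.Stdout.Fd()))
}

// logStep prints a secondary/progress line, dimmed only when color is safe.
func logStep(msg string) {
	if useColor() {
		fmt.Printf("  \x1b[2m%s\x1b[0m\n", msg)
	} else {
		fmt.Printf("  %s\n", msg)
	}
}

func main() {
	logStep("Rendering Tanzu Kubernetes Release (TKR)") // example message only
}
```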
A few thoughts on the color / contrast problem:
What would the docs look like for unmanaged-cluster in the context of the existing documentation? I haven't really seen anything for docs in the proposal.
I'm looking at this today, and tracking here: #2808 |
The links to the proposed model (the `standalone-cluster` plugin) are pointing to an empty README. For posterity, here's a link to the original README contents: https://github.com/vmware-tanzu/community-edition/blob/db06202fdd79271e4b5e80a0aa76387ca78917f0/cli/cmd/plugin/standalone-cluster/README.md
Asks

We aim to accept or reject this proposal 60 days after opening (12/17/2021).

Proposal

This proposal references `local`, `standalone`, and `unmanaged-cluster` in various places. We have decided to move forward with the name `unmanaged-cluster`. Please note that references to `local` or `standalone` (not `standalone-cluster`) represent `unmanaged-cluster`.
🚨 This proposal has been partially implemented to help further the conversation around whether we should accept it in this project. Read here for details on how to try it and design details. 🚨
Standalone clusters (SAC) are our attempt to provide workload clusters without the need for a long-running management cluster. With this, we intended to:

To accomplish this, we re-purposed Cluster API and extended TKG-Lib (via tanzu-framework) to create the standalone-cluster model. With this model in use for many months, we've learned that:
This has resulted in an inappropriate cost for `1` (above) and a stunted experience for `2` (above). We believe that solving `1` is high-value for those using Tanzu. We also believe that attempting to replicate cluster-lifecycle management on a single node comes at an inappropriate cost (via dependencies).

We propose the deprecation of `standalone-cluster` in favor of introducing `local` (clusters).

High-level implementation details
In Tanzu, a Management Cluster does the processing of a `TanzuKubernetesRelease` (TKR). It uses the TKR to determine how to create a workload cluster.

In the `local` model, we'll move the management cluster's TKR processing client-side. After processing the TKR, we have all the information needed to create a Tanzu [workload] cluster that looks similar to one created by a management cluster. See the following depiction of this relationship.

As seen above, after parsing the TKR (client-side) and understanding the properties of the to-be-created cluster, we can call into a local provider to create a minimal cluster. By leveraging a provider abstraction (interface), we can insulate ourselves from the underlying details of how the infra/host/desktop-env is created. What matters is that we receive a kubeconfig with admin access to the API server.
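To make that abstraction concrete, here's a minimal sketch of what the provider interface could look like (illustrative names, not the plugin's real API); the whole contract is "create a cluster, hand back admin credentials":

```go
package provider

// ClusterProvider insulates the CLI from how the infra/host/desktop-env is
// created. Anything that can stand up a cluster and hand back an admin
// kubeconfig can back the local model.
type ClusterProvider interface {
	// Create bootstraps a minimal cluster from properties parsed out of the
	// TKR (e.g. the node image) and returns the admin kubeconfig contents.
	Create(clusterName, nodeImage string) (kubeconfig string, err error)
	// Delete tears the cluster down.
	Delete(clusterName string) error
}
```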
Our initial provider implementation will be `kind` because it's widely accepted in the Kubernetes community. The following GIFs demonstrate the bootstrap UX for a local Tanzu cluster; a sketch of what a kind-backed provider could look like follows the GIFs below. (GIFs are broken up to save file size.)
Cluster creation:
Cluster init:
Cluster list / deletion:
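As one data point, a kind-backed implementation of the hypothetical interface above could be a thin wrapper over kind's public Go API (`sigs.k8s.io/kind/pkg/cluster`):

```go
package provider

import "sigs.k8s.io/kind/pkg/cluster"

// KindProvider backs the (hypothetical) ClusterProvider interface with kind.
type KindProvider struct {
	provider *cluster.Provider
}

func NewKindProvider() (*KindProvider, error) {
	// DetectNodeProvider picks docker or podman, which is how kind's podman
	// support (mentioned earlier in the thread) could carry over.
	opt, err := cluster.DetectNodeProvider()
	if err != nil {
		return nil, err
	}
	return &KindProvider{provider: cluster.NewProvider(opt)}, nil
}

func (k *KindProvider) Create(clusterName, nodeImage string) (string, error) {
	// The TKR tells us which node image to boot; kind handles the rest.
	if err := k.provider.Create(clusterName, cluster.CreateWithNodeImage(nodeImage)); err != nil {
		return "", err
	}
	// Hand back the admin kubeconfig -- the only thing the CLI needs.
	return k.provider.KubeConfig(clusterName, false)
}

func (k *KindProvider) Delete(clusterName string) error {
	return k.provider.Delete(clusterName, "")
}
```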
`local` is good for:

`local` is not meant for:

For use cases beyond the above, we recommend a `management-cluster`.

Additionally, this approach would inherently solve many issues we face today, including those stemming from our dependencies on `cluster-api` and `tanzu-framework`.

For in-depth implementation details, please see our PR.
Release Plan

- `0.10.0` will feature this new model alongside the existing `standalone-cluster` model.
- In `0.10.0`, the existing `standalone-cluster` model will print a deprecation notice to the user.
- In `0.11.0`, we'll remove the existing `standalone-cluster` model.

FAQ
This section will be updated as questions come in.

`standalone-cluster`s are essentially very limited management clusters with a few components ripped out. For users wanting to test and deploy a single cluster in one of those environments, we encourage simply creating a management-cluster and scheduling workloads to it. This is not our production-ready advice, but it can get you the exact functionality (plus some) of the existing standalone-cluster model.