This repository has been archived by the owner on Jun 28, 2023. It is now read-only.

Offer minimal deployment model that supports development and experimentation of Tanzu #2266

Closed
joshrosso opened this issue Oct 18, 2021 · 58 comments · Fixed by #2376
Labels: kind/feature (A request for a new feature), proposal/acccepted (Change is accepted)

Comments

@joshrosso (Contributor) commented Oct 18, 2021

Asks

  1. Read this proposal
  2. Try out the proposed model
  3. Vote 👍 or 👎 on this issue
  4. If you have additional feedback, respond to this issue

We aim to accept or reject this proposal 60 days after opening (12/17/2021).

Proposal

⚠️ The name of this feature was still being determined while this proposal was discussed, so you may see it referred to as local, standalone, and unmanaged-cluster in various places. We have decided to move forward with the name unmanaged-cluster. Please note that references to local or standalone (not standalone-cluster) mean unmanaged-cluster.

🚨 This proposal has been partially implemented to help further the conversation around whether we should accept it in this project. Read here for details on how to try it and design details. 🚨

Standalone clusters (SAC) are our attempt to provide workload clusters without the need for a long-running management cluster. With this, we intended to:

  • Minimize resource requirements for clusters
  • Reduce onboarding time from download to cluster

To accomplish this, we re-purposed Cluster API and extended TKG-Lib (via tanzu-framework) to create the standalone cluster model. With this model in use for many months, we've learned that:

  1. SAC users often wanted a single-node or local cluster.
  2. Users wanting more complex cluster-lifecycle management need managed clusters.
  3. Our usage of cluster-api and tkg-lib created a poor UX for 1 (above) and a stunted experience for 2 (above).
  4. Maintaining support for this functionality in our dependencies has had significant overhead, as it's a use case tangential to their primary purpose.

We believe that solving 1 is high-value for those using Tanzu. We also believe that attempting to replicate cluster-lifecycle management on a single node comes at an inappropriate cost (via dependencies).

We propose the deprecation of standalone-cluster in favor of introducing local (clusters).

High-level implementation details

In Tanzu, a Management Cluster processes a TanzuKubernetesRelease (TKR). It uses the TKR to determine how to create a workload cluster.

In the local model, we'll move the management cluster’s TKR processing client-side. After processing the TKR, we have all the information needed to create a Tanzu [workload] cluster that looks similar to one created by a management-cluster. See the following depiction of this relationship.

(Diagram: the TKR is processed client-side, and the resulting cluster properties are handed to a local provider that creates the cluster.)

As seen above, after parsing the TKR (client-side) and understanding properties of the to-be-created cluster, we can call into a local provider to create a minimal cluster. By leveraging a provider abstraction (interface), we can insulate ourselves from the underlying details of how the infra/host/desktop-env are created. What matters is we receive a kubeconfig with admin access to the API server.
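To make the abstraction concrete, here is a minimal sketch of what such a provider interface could look like in Go. The names (ClusterProvider, ClusterConfig, etc.) are illustrative, not the actual implementation; the only contract the proposal relies on is "create a cluster, hand back an admin kubeconfig."

```go
package provider

import "context"

// ClusterConfig carries the properties derived from parsing a TKR
// client-side: Kubernetes version, node image, and so on.
// Field names here are illustrative only.
type ClusterConfig struct {
	Name              string
	KubernetesVersion string
	NodeImage         string
	PortMappings      []string
}

// ClusterProvider abstracts the thing that actually creates the local
// cluster (kind, CAPD, a VM-based tool, ...). Callers only care that
// Create returns an admin kubeconfig.
type ClusterProvider interface {
	// Create bootstraps a cluster and returns the raw admin kubeconfig.
	Create(ctx context.Context, cfg ClusterConfig) (kubeconfig []byte, err error)
	// Delete tears the cluster down.
	Delete(ctx context.Context, name string) error
	// List returns the names of clusters this provider knows about.
	List(ctx context.Context) ([]string, error)
}
```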

Our initial provider implementation will be kind because it's widely accepted in the Kubernetes community. The following gif demonstrates the bootstrap UX for a local Tanzu cluster.

GIFs are broken up to save file size; there is one each for cluster creation, cluster init, and cluster list / deletion.
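As a reference point, a kind-backed implementation along the lines of the provider sketched above might look roughly like the following. This assumes kind's public Go API in sigs.k8s.io/kind/pkg/cluster and is a sketch, not the project's actual code.

```go
package kindprovider

import (
	"context"

	"sigs.k8s.io/kind/pkg/cluster"
)

// KindProvider delegates cluster creation to kind's Go API.
type KindProvider struct {
	provider *cluster.Provider
}

func New() *KindProvider {
	return &KindProvider{provider: cluster.NewProvider()}
}

// Create bootstraps a kind cluster using the node image derived from the
// TKR and returns the admin kubeconfig.
func (k *KindProvider) Create(ctx context.Context, name, nodeImage string) ([]byte, error) {
	if err := k.provider.Create(name, cluster.CreateWithNodeImage(nodeImage)); err != nil {
		return nil, err
	}
	kubeconfig, err := k.provider.KubeConfig(name, false)
	if err != nil {
		return nil, err
	}
	return []byte(kubeconfig), nil
}

// Delete removes the kind cluster.
func (k *KindProvider) Delete(ctx context.Context, name string) error {
	return k.provider.Delete(name, "")
}

// List returns the kind clusters present on the host.
func (k *KindProvider) List(ctx context.Context) ([]string, error) {
	return k.provider.List()
}
```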

local is good for:

  • Workstation use cases of Tanzu
  • Minimal (non-prod) resourced use cases of Tanzu
  • CI/CD validations atop Tanzu
  • Validation of newly created TKRs

local is not meant for:

  • Cluster-lifecycle simulation
    • For this, use management-cluster.

Additionally, this approach would inherently solve many issues we face today:

  • Resource utilization would be drastically reduced.
  • WSL/Docker Desktop could be natively supported.
  • Workstation/Host restarts would be supported.
  • Added support for running multiple clusters.
  • Testing of different component versions (e.g. kapp-controller) would be possible.
  • We would delete our forks of cluster-api and tanzu-framework.

For in-depth implementation details, please see our PR.

Release Plan

  • 0.10.0 will feature this new model alongside the existing standalone-cluster model.
  • In 0.10.0, the existing standalone-cluster model will print a deprecation notice to the user.
  • In 0.11.0, we'll remove the existing standalone-cluster model.

FAQ

This section will be updated as questions come in

  • Q: What about non-local standalone-clusters (AWS, Azure, vSphere)?
    • A: Execution of this proposal will cause a gap for AWS, Azure, and vSphere as there will no longer be non-managed clusters available. However, standalone-clusters are essentially very limited management clusters with a few components ripped out. For users wanting to test and deploy a single cluster in one of those environments, we encourage simply creating a management-cluster and scheduling workloads to it. This is not our production-ready advice, but it gets you the exact functionality (plus some) of the existing standalone-cluster model.
@joshrosso joshrosso added kind/feature A request for a new feature proposal/pending Capability has not yet been accepted by TCE project. Work should not start until accepted. labels Oct 18, 2021
@joshrosso joshrosso added this to the v0.10.0 milestone Oct 18, 2021
@dims commented Oct 18, 2021

❤️ this! +1 to kind

@vrabbi (Contributor) commented Oct 18, 2021

This would be great. I think the issue of host restarts will remain in multi-node local clusters but could be solved this way with single-node clusters.
The only worry is that divergence from CAPI and utilizing a different bootstrapping mechanism makes it further from the standard Tanzu deployment and could mean some things won't work the same. For example, CAPI may set via CABPK some defaults in terms of encryption, ciphers, feature flags, etc. that kind may not. By doing that, we can't commit anymore to the same things working on a local cluster as on a managed cluster, unless I'm missing something.

@joshrosso (Contributor, Author)

The only worry is that divergence from CAPI and utilizing a different bootstrapping mechanism makes it further from the standard Tanzu deployment and could mean some things won't work the same. For example, CAPI may set via CABPK some defaults in terms of encryption, ciphers, feature flags, etc. that kind may not. By doing that, we can't commit anymore to the same things working on a local cluster as on a managed cluster, unless I'm missing something.

Great point. There are ways to translate some of these things to kind (or other providers brought in by the local model): feature flags, for example. My intuition and hope is that customization in CABPK won't fundamentally change the behavior of workload clusters such that packages, etc., work significantly differently.

However, if it did, we could parse these kubeadm customization(s) locally and ensure their behaviors propagate into the underlying provider.
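For illustration, translating a kubeadm-style feature-gate customization into kind could look something like the sketch below, which uses kind's v1alpha4 cluster configuration. This is a hypothetical example of the kind of propagation being discussed, not an agreed design.

```go
package translate

import "sigs.k8s.io/kind/pkg/apis/config/v1alpha4"

// ApplyFeatureGates copies feature gates parsed from a kubeadm/CABPK-style
// customization onto a kind cluster config, so the local cluster honors
// the same flags a managed workload cluster would.
func ApplyFeatureGates(cfg *v1alpha4.Cluster, gates map[string]bool) {
	if cfg.FeatureGates == nil {
		cfg.FeatureGates = map[string]bool{}
	}
	for name, enabled := range gates {
		cfg.FeatureGates[name] = enabled
	}
}
```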

@joshrosso joshrosso changed the title Replace tanzu standalone-cluster model with tanzu local model Proposal: Replace tanzu standalone-cluster model with tanzu local model Oct 18, 2021
@jorgemoralespou (Contributor)

Here are my comments:
Pros:

  • Speed of provisioning
  • Seems to work
  • Provides an easy way to support multiple local clusters and restarts (although restart doesn't work today)

Cons:

  • Not aligned with the rest of the ClusterAPI way of doing things (although I'm not sure this is a pure con)
  • Does not allow creating remote "standalone" clusters, which many developers might want/need, as they would probably not run management clusters (too complicated), and reusing the management-cluster for regular use does not work easily because of differences in controllers, etc.

Things that would be important for developers (or similarly less-experienced users):

  • Being able to start/stop (lifecycle) a cluster, in addition to create, delete, list
  • Being able to reconfigure the cluster in case of IP roam. If I create a cluster and the IP of my machine changes, I want the cluster to still work.
  • Ability to inject a trusted CA into the nodes (to create trust with the host by reusing a host-created/personal CA)
  • Listing in a verbose mode (so one can see status within the lifecycle of a cluster)
  • Ability to easily switch contexts between running clusters
  • Ability to disable those Kind icons at the beginning of each output line
  • Easy definition/configuration of the clusters, since it no longer adheres to the ClusterAPI config
  • Ability to modify the host to access the nodes/ingress via a DNS.

Overall I like this approach, even if it removes some of the benefits the "standalone" cluster had, given how hard implementing a local controller via KCP seemed; I would have preferred that route, though, to align more with ClusterAPI and TKG/Tanzu.

@qnetter (Contributor) commented Oct 18, 2021

I'm having a slow day - what are the benefits over creating a management cluster via CAPD and scheduling workloads on it?

@jorgemoralespou (Contributor)

I have seen some:

  • speed, 10x faster to have a cluster
  • resources required, as you're only creating one cluster at any time (versus 2 for standalone: the bootstrap one and the standalone one)
  • support for many local customizations that CAPD does not provide, as it was mostly designed for testing

I'm sure there are more.

CAPD is not a proper infrastructure provider in ClusterAPI, as it was designed for the single purpose of unit tests. The code would need to be modified to become a proper ClusterAPI infrastructure provider. There's a lot of technical debt that makes things harder. I guess that's the biggest hidden benefit.

@joshrosso (Contributor, Author)

Exactly what @jorgemoralespou said, plus:

  • To respect TKRs and spin up a more realistic Tanzu workload cluster, you still need to create a workload cluster from the management cluster
    • So this means you need a bootstrap cluster + mgmt cluster + workload cluster to get to where this proposal gets.
  • Regarding speed and resources: one of the biggest drivers for this change is how resource-intensive CAPD can be. We've helped countless users with bootstrapping issues, which are often rooted in resource constraints.

@qnetter (Contributor) commented Oct 18, 2021

I'm pretty sure the lack of standalone clusters (or the equivalent) on other providers is not a problem, especially given the no-reboot limitation. Do we have a time and resource comparison? I understand the concepts, but I'm wondering 10x what :)

@jorgemoralespou (Contributor)

3 minutes (a local cluster) versus 30 minutes (a standalone cluster) on my machine

@jorgemoralespou (Contributor) commented Oct 18, 2021

I personally don't like Kind and the fact that this proposal misaligns with ClusterAPI, but given the huge difference in experience: I have never wanted to use a CAPD standalone cluster, but I will definitely use local clusters.

@joshrosso (Contributor, Author)

I'm pretty sure the lack of standalone clusters or the equivalent, especially given the no-reboot limitation, on other providers is not a problem. Do we have a time and resource comparison? I understand the concepts but I'm wondering 10x what :)

Don't take this data as scientific, but here's what I got on a very old 2-core Linux box (running a bunch of random stuff):

  • cluster bootstrap (includes installing the tce user managed repo): 2m37.770s
    • unless optimized, used to be 10-30min
  • cluster delete: 0m2.967s
    • unless optimized, used to be 10-30min

@vrabbi (Contributor) commented Oct 18, 2021

Also, by using kind, which has support for things other than Docker, such as podman, it opens the door to such an integration into local clusters if the whole Docker Desktop licensing thing becomes an issue and people move away from it. Linux isn't an issue in that regard, but Mac and Windows users, which I believe would be the vast majority of use cases for TCE local clusters, could benefit from supporting a different container runtime for running the cluster itself.

@joshrosso (Contributor, Author)

I agree. This also speaks to why it's important we get the provider interface right. Beyond kind, we could support a variety of underlying models, as long as, post-cluster-create, we get passed back an admin kubeconfig.

@randomvariable commented Oct 18, 2021

Strong agree with this. If I were a workload developer, I'd be less concerned with simulating cluster lifecycle, and testing high-availability aspects of the workload is more likely to depend on the attributes of the particular cloud I'm deploying to (AZs, storage, etc.), which CAPD isn't a good enough approximation of to be useful.

@nrb (Contributor) commented Oct 19, 2021

@jorgemoralespou Can you say more about this? What differences specifically?

reusing the management-cluster for regular use does not work easily because of difference in controllers, etc...

My understanding is the difference between a management cluster and a workload cluster is 3-5 controllers running in the management cluster for the CAPI information. Other than that, I thought they were identical.

Coming from my Kubernetes app development background, this would have been very helpful for testing locally against a Kubernetes API server. It would have been less useful for certain constructs (backing up volume data with Velero), but as mentioned above, that was often cloud-platform dependent in any case.

@jorgemoralespou (Contributor)

My understanding is the difference between a management cluster and a workload cluster is 3-5 controllers running in the management cluster for the CAPI information. Other than that, I thought they were identical.

The management cluster also has a couple of controllers that install packages on the workload cluster (addon-manager and capabilities-manager), if I'm not mistaken. That's one of the reasons standalone clusters and workload clusters differ when upgrading kapp-controller, as an example.

@vincepri

Have we thought about creating a Kind provider for Cluster API rather than trying to replicate the lifecycle model?

@joshrosso (Contributor, Author) commented Oct 19, 2021

rather than trying to replicate the lifecycle model?

Where do you feel this proposal is replicating the lifecycle model?

The proposal's intent was to say that we don't need a lifecycle model. We just need to bootstrap a cluster on a single node.

For those reading this proposal, we're largely advocating to stay out of the cluster lifecycle problem domain.

On a technical level, our implementation/proposal calls an API equivalent to cluster create.

What that API invokes under the hood can be anything. For example it can:

  • Call kind
    • Note: this is our default/reference implementation because it requires next to no resources, no bootstrap cluster, is widely adopted, and is very fast.
  • Call CAPD
  • Call automation for kvm/esxi/fusion/etc

Once the thing managing the lifecycle finishes bootstrapping the cluster, we receive a kubeconfig back.

Then, that's when this plugin really steps in to do its work. And it makes its decisions on what to do on the cluster based on the declaration of the distribution (which exists in the TKR).
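A rough sketch of that flow, using hypothetical, simplified stand-ins for the real TKR parsing and package-installation logic (none of these names are from the actual codebase):

```go
package unmanaged

import "context"

// TKR is an illustrative, trimmed-down view of the data parsed from a
// TanzuKubernetesRelease client-side.
type TKR struct {
	NodeImage    string
	CorePackages []string
}

// Provider is the part of the provider abstraction this flow needs.
type Provider interface {
	Create(ctx context.Context, name, nodeImage string) (kubeconfig []byte, err error)
}

func parseTKR(path string) (*TKR, error) { /* read + decode the TKR YAML */ return &TKR{}, nil }

func installComponents(ctx context.Context, kubeconfig []byte, tkr *TKR) error {
	// Install kapp-controller, CNI, and the core packages the TKR declares.
	return nil
}

// CreateUnmanagedCluster wires the pieces together: parse the TKR
// client-side, ask the provider to bootstrap a cluster, then install the
// TKR-declared components using the returned admin kubeconfig.
func CreateUnmanagedCluster(ctx context.Context, tkrPath, name string, p Provider) error {
	tkr, err := parseTKR(tkrPath)
	if err != nil {
		return err
	}
	kubeconfig, err := p.Create(ctx, name, tkr.NodeImage)
	if err != nil {
		return err
	}
	return installComponents(ctx, kubeconfig, tkr)
}
```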

Hope this helps, but please let me know if there's overlap I'm not seeing.

@vincepri commented Oct 20, 2021

Thanks for the added context @joshrosso, the above makes sense.

When I started reading through the issue, from the problem statement it seemed that the issues were mostly around the speed of standalone cluster execution. Have we explored avenues that would help both tkg-lib and cluster-api create a CAPD (or similar) based cluster in the minimum amount of time possible?

What's the general role of a standalone/local cluster for our users? From our docs:

This enables our users to try out many projects and technology in the Tanzu portfolio with a reduced barrier of entry.

Is a kind cluster enough for all use cases? What are the implications of not having an active management-workload cluster in this case? If we don't need a lifecycle model, would we never need access to Cluster API primitives like Cluster, ClusterClass (soon), or MachineDeployments?

@randomvariable commented Oct 20, 2021

What are the implications of not having an active management-workload cluster in this case? If we don't need a lifecycle model, would we never need access to Cluster API primitives like Cluster, ClusterClass (soon), or MachineDeployments?

The way I see this is that it's mostly about the local application development workflow for your average business-unit app developer: having the ability to provision a local kind/minikube/whatever cluster as fast as possible and deploying some Tanzu add-ons to give it a Tanzu look and feel. In these instances, we're not really concerned about a full lifecycle model, I think.

I think, however, that maintaining clusterctl save/restore and some of the use cases from the existing standalone cluster, where CAPI does the provisioning, is still useful for everything that isn't "I need a Tanzu-flavoured k8s on my laptop right now and don't eat all my RAM".

@timothysc

I like the idea. I think we should update the docs to outline the user stories of when you would use (A) vs. (B). There may be some gotchas around config parameters, but as we tinker we'll know more. E.g., will Pinniped, Contour, etc. just work?

@vincepri

This all makes sense, thanks folks — it's definitely good to have more context, appreciate all the responses

@randomvariable

Will Pinniped, Contour, etc. just work?

I can see the need to get Contour working in local clusters, but I don't think Pinniped is going to be that useful since the persona this is intended for isn't going to have permissions or the desire to hook up their local dev cluster to an IdP.

@joshrosso (Contributor, Author) commented Jan 4, 2022

A few updates:

  • The decided-on name is unmanaged-cluster.
  • This proposal is approved.
  • In our 0.10.0 release, this model will be available alongside the existing standalone-cluster model.
  • The existing standalone-cluster model will output a deprecation warning to users.

@garrying (Contributor) commented Jan 4, 2022

Minor feedback/thought on the terminal experience of the proof-of-concept, given it is introducing newish output patterns: the secondary text could get difficult to read depending on the minimum-contrast config of the terminal and the user's color scheme. For example, on Solarized, the text is barely visible. (Screenshot: Screen Shot 2022-01-04 at 4.52.58 PM.)

Related: #2730 where we're starting to think about improving visibility of processes.

@stmcginnis (Contributor)

The color formatting looks great... when we are running in a terminal theme that fits well with it. But it's hard to guarantee that, so I think we should either see if we can find some way to query the terminal to get color recommendations based on the theme, or we should just go with the default color and just use indentation to make it easier to read.

There may also be color-blindness concerns with the way we are doing it now.

@joshrosso (Contributor, Author)

or we should just go with the default color and just use indentation to make it easier to read.

^this. The indentation is adequate.

@jpmcb (Contributor) commented Jan 6, 2022

A few thoughts on the color / contrast problem:

  • We should follow the $NO_COLOR env variable standard and disable colors if that is present: https://no-color.org/
  • We should also check whether the terminal supports colors. We should be able to inspect this via $TERM == dumb (see the sketch after this list)
  • And we should check if the terminal is a TTY terminal (we might already be doing this ..)
  • It wouldn't be a bad idea to introduce a color library so that we aren't using raw escape characters to create colors in our code. Something like: https://github.com/fatih/color
    • I don't think there's a way to inspect the terminal's color palette and dynamically set the colors based on what the user is using. But I believe using a color library would enable us to set common sets of colors that should be supported by most color themes (or at least the popular ones) and avoid some of the poor-contrast issues we've created. I agree the indentation is adequate for understanding the UX flow, and if someone really doesn't want the colors, they could set the flag on the command or use $NO_COLOR (when/if we implement that)
  • All in all however, this shouldn't block this getting out to users since it's possible to bypass the colors entirely. All of the above would be user experience enhancements to an already working UX.
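A minimal sketch of what those checks could look like in Go, assuming the fatih/color library and golang.org/x/term for TTY detection (the helper name is illustrative):

```go
package main

import (
	"fmt"
	"os"

	"github.com/fatih/color"
	"golang.org/x/term"
)

// disableColorIfNeeded turns colored output off when NO_COLOR is set,
// when TERM is "dumb", or when stdout is not a TTY (e.g. piped to a file).
func disableColorIfNeeded() {
	if os.Getenv("NO_COLOR") != "" ||
		os.Getenv("TERM") == "dumb" ||
		!term.IsTerminal(int(os.Stdout.Fd())) {
		color.NoColor = true
	}
}

func main() {
	disableColorIfNeeded()
	color.New(color.FgCyan).Println("Creating cluster...")
	fmt.Println("done")
}
```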

@joshrosso joshrosso added proposal/acccepted Change is accepted and removed proposal/pending Capability has not yet been accepted by TCE project. Work should not start until accepted. labels Jan 10, 2022
@kartiklunkad26 (Contributor)

What would the docs look like for unmanaged-cluster in the context of the existing documentation? I haven't really seen anything about docs in the proposal.

@kcoriordan (Contributor)

I'm looking at this today, and tracking here: #2808

@joshrosso joshrosso changed the title Proposal: Introduce a new minimal bootstrap option to replace the existing standalone-cluster plugin Introduce a minimal deployment model that supports development and experimentation of Tanzu Jan 14, 2022
@joshrosso joshrosso changed the title Introduce a minimal deployment model that supports development and experimentation of Tanzu Offer minimal deployment model that supports development and experimentation of Tanzu Jan 14, 2022
@garrying (Contributor)

The links to the proposed model are pointing to an empty README. For posterity, here's a link to the original README contents: https://github.com/vmware-tanzu/community-edition/blob/db06202fdd79271e4b5e80a0aa76387ca78917f0/cli/cmd/plugin/standalone-cluster/README.md
