
Duplicate resources are created when template is used both as controlPlane ref and infrastructure ref in ClusterClass #6126

Closed
pydctw opened this issue Feb 14, 2022 · 25 comments · Fixed by #6988
Labels
  • area/clusterclass: Issues or PRs related to clusterclass
  • kind/bug: Categorizes issue or PR as related to a bug.
  • triage/accepted: Indicates an issue or PR is ready to be actively worked on.

Comments

@pydctw

pydctw commented Feb 14, 2022

What steps did you take and what happened:
[A clear and concise description on how to REPRODUCE the bug.]

While doing a PoC of creating an EKS cluster using ClusterClass (CAPI + CAPA), I noticed that two AWSManagedControlPlane (awsmcp) objects are created from AWSManagedControlPlaneTemplate (awsmcpt) when there should be only one awsmcp. For context, EKS uses an AWS-managed control plane, so AWSManagedControlPlane in CAPA is the counterpart of KubeadmControlPlane in CAPI.

$ kubectl get cluster -n eks
NAME                PHASE         AGE     VERSION
my-eks-cluster-2    Provisioned   2d20h   v1.21.2
$ kubectl get awsmcp -n eks
NAME                              CLUSTER             READY   VPC                     BASTION IP
my-eks-cluster-2-84g7h            my-eks-cluster-2    true    vpc-0fd430763e64830ee
my-eks-cluster-2-c4vqb            my-eks-cluster-2    true    vpc-06b20d0d9d5eae93d

Further debugging suggests that this is due to the fact that AWSManagedControlPlaneTemplate is used both as spec.controlPlane.ref and spec.infrastructure.ref in the ClusterClass, and the CAPI controller clones it twice. FYI, there is no AWSManagedCluster type in CAPA.

ClusterClass used for EKS:

apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: eks-clusterclass-v1
  namespace: eks
spec:
  controlPlane:
    ref:
      apiVersion: controlplane.cluster.x-k8s.io/v1beta1
      kind: AWSManagedControlPlaneTemplate
      name: eks-clusterclass-v1-awsmcp
  infrastructure:
    ref:
      apiVersion: controlplane.cluster.x-k8s.io/v1beta1
      kind: AWSManagedControlPlaneTemplate
      name: eks-clusterclass-v1-awsmcp
  workers:
    machineDeployments:
      - class: default-worker
        template:
          bootstrap:
            ref:
              apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
              kind: EKSConfigTemplate
              name: eks-clusterclass-v1-eksconfigtemplate
          infrastructure:
            ref:
              apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
              kind: AWSMachineTemplate
              name: eks-clusterclass-v1-worker-machinetemplate

This is creating an extra EKS control plane and related infrastructure in AWS and causing panics in the CAPA controller.

What did you expect to happen:
Only one AWSManagedControlPlane is created.

Anything else you would like to add:

Environment:

  • Cluster-api version: v1.1.1
  • Minikube/KIND version: v0.11.1
  • Kubernetes version: (use kubectl version): v1.21.2
  • OS (e.g. from /etc/os-release): MacOS

/kind bug
[One or more /area label. See https://github.com/kubernetes-sigs/cluster-api/labels?q=area for the list of labels]

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Feb 14, 2022
@pydctw
Author

pydctw commented Feb 14, 2022

cc @sbueringer

@pydctw pydctw changed the title Duplicate resources are created when they are used both as controlPlane ref and infrastructure ref in ClusterClass Duplicate resources are created when template is used both as controlPlane ref and infrastructure ref in ClusterClass Feb 14, 2022
@sbueringer
Member

It makes sense that the issue occurs, as we create instances of the control plane and the infrastructure cluster and don't handle the case where both could/should be the same object.

Assuming this is a supported case, we would have to adjust our "core" topology reconciler to handle it, and I think it also has an impact on patches.

@fabriziopandini
Member

@yastij handling managed providers is another area where we have inconsistency across providers:

  • in CAPA we are using the same AWSManagedControlPlaneTemplate object both as the control plane object and as the cluster infrastructure object
  • in CAPZ there are two different objects, one for the managed control plane and one for the cluster infrastructure

Personally, I prefer the second approach: I find it less confusing, more composable, and better at separating concerns.
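
As an illustration of that second approach, the ClusterClass for EKS could reference two distinct templates. A minimal sketch, assuming a hypothetical AWSManagedClusterTemplate pass-through kind existed alongside AWSManagedControlPlaneTemplate (the kinds and names below are illustrative, not current CAPA API):

apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: eks-clusterclass-split
  namespace: eks
spec:
  controlPlane:
    ref:
      apiVersion: controlplane.cluster.x-k8s.io/v1beta1
      kind: AWSManagedControlPlaneTemplate       # owns the managed control plane
      name: eks-clusterclass-split-controlplane
  infrastructure:
    ref:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
      kind: AWSManagedClusterTemplate            # hypothetical pass-through infra template
      name: eks-clusterclass-split-infra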

Along the same lines, we should also spread awareness about the ClusterClass requirements for supporting cluster upgrades, which are currently not implemented by any managed provider.

@pydctw
Author

pydctw commented Feb 15, 2022

cc @richardcase for EKS

@richardcase
Member

I'm not surprised this turned out to be an issue.

We originally had two different kinds, AWSManagedCluster and AWSManagedControlPlane. However, AWSManagedCluster was only acting as a pass-through to communicate values from the control plane to CAPI to satisfy the infrastructure cluster contract, which the control plane already did, so we decided to remove it.

I think it's valid for managed services to have the Cluster/ControlPlane as the same resource. But I also understand the need for consistency between providers, and if we had to reinstate AWSManagedCluster then I'm good with that, but...

...more generally, I'm not sure that we have ever thought about what CAPI looks like for managed Kubernetes services (please correct me if I'm wrong here). As a result, providers have made their own decisions and have tried to fit managed services into the current resource kinds/reconciliation process/provider types. With the CAPG managed implementation starting soon, it's probably something we need to discuss and decide: what does a managed service look like in CAPI?

@richardcase
Member

@sbueringer @fabriziopandini @pydctw - I'd be happy to help/facilitate any discussions about "what does managed Kubernetes in CAPI look like" if needed.

@pydctw
Author

pydctw commented Feb 15, 2022

@sbueringer @fabriziopandini @pydctw - I'd be happy to help/facilitate any discussions about "what does managed Kubernetes in CAPI look like" if needed.

It would be great if we could discuss it during office hours tomorrow, as I am blocked from progressing on kubernetes-sigs/cluster-api-provider-aws#3166. Will add it to the agenda.

@richardcase
Member

At the office hours it was mentioned that there is an open PR in CAPZ to also go down to one type (AzureManagedControlPlane) that satisfies both the cluster and control plane contracts, like we are doing in CAPA: kubernetes-sigs/cluster-api-provider-azure#1468

@richardcase
Member

richardcase commented Feb 16, 2022

With managed Kubernetes services, the lines between the cluster infrastructure and the control plane are blurred. A few solutions have been discussed:

  • Allow the same kind to be used in the references for infrastructureRef and controlPlaneRef (as CAPA is doing and the CAPZ PR is proposing). But this currently causes problems with ClusterClass, as documented by this issue.
  • Ensure there are separate Cluster and ControlPlane kinds, even if the Cluster kind only acts as a pass-through to satisfy the contract with CAPI. For CAPA this requires some strange interactions for things like the control plane endpoint.
  • Only supply an infrastructureRef (no controlPlaneRef) and do all the control plane reconciliation in the infrastructure cluster reconciler (see the sketch after this list). At office hours it was mentioned that this could cause an issue with ClusterClass.
  • Potentially use externally managed infrastructure somehow.
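
As a rough sketch of the third option above (illustrative only; the AWSManagedCluster kind and names here are assumptions, not an agreed design), the Cluster would carry only an infrastructure reference and that object's controller would also reconcile the managed control plane:

apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: my-eks-cluster
  namespace: eks
spec:
  # No controlPlaneRef: control plane reconciliation and version rollouts
  # would have to be driven by the infrastructure cluster reconciler.
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: AWSManagedCluster
    name: my-eks-cluster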

It sounds like we need to get a proposal/doc together that covers the various potential solutions and then decide on a consistent way forward for any provider that has a managed Kubernetes service.

@fabriziopandini @pydctw - I can make a start on a doc tomorrow and we can all collaborate. How does that sound?

@sbueringer
Member

sbueringer commented Feb 16, 2022

Sounds great!

Short comment about:

Only supply an infrastructureRef (no controlPlaneRef) and do all the control plane reconciliation in the infrastructure cluster reconciler. At office hours it was mentioned that this could cause an issue with ClusterClass.

We are currently using the control plane object to trigger/monitor rollouts of the control plane to new Kubernetes versions. Assuming that this should be possible with managed clusters too, it might be easier to go down the road of a ControlPlane without machines/replicas (but yeah, it doesn't really fit conceptually).

But I'm aware that, on the other side, Clusters without a control plane are allowed in the Cluster resource while Clusters without an infra cluster are not.

@vincepri
Member

vincepri commented Feb 17, 2022

But I'm aware that, on the other side, Clusters without a control plane are allowed in the Cluster resource while Clusters without an infra cluster are not.

We should probably fix this bit in the future: a Cluster should always have a control plane, but it doesn't necessarily have to have infrastructure associated with it. See below.

In the future we can also consider that the infrastructure reference might go away entirely and instead become something else. Truthfully, today the InfraCluster object is a stepping stone: we do need the infrastructure to be set up and configured somehow, but most users might want something else to manage that (like Terraform, Crossplane, etc.) and inform Cluster API where to get those values.

@vincepri
Member

/milestone v1.2

@k8s-ci-robot k8s-ci-robot added this to the v1.2 milestone Feb 17, 2022
@vincepri
Member

Thinking about it a bit more, we should probably meet and define clear responsibilities for each reference. Infrastructure, control plane, and other references should all have clear delineations.

If we think about the responsibilities of an infrastructure provider and its InfraCluster object, we can assume that this object provides the infrastructure for the Kubernetes cluster, which can include a number of things and today also includes the load balancer.

On the other side, the control plane reference is in charge of managing the Kubernetes control plane; the infrastructure should be left to the other reference. The challenge I'm seeing with the above is that we seem to have mixed some of these responsibilities when it comes to managed cloud Kubernetes services.
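
To make that delineation concrete, here is a minimal sketch (the kinds and names are illustrative, not an agreed API) of a Cluster wiring the two references with their intended responsibilities:

apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: example
  namespace: eks
spec:
  infrastructureRef:          # infrastructure provider: networking, and today also the load balancer
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: AWSManagedCluster
    name: example
  controlPlaneRef:            # control plane provider: the Kubernetes control plane itself
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: AWSManagedControlPlane
    name: example-control-plane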

Let's reconvene and chat more about it, we should meet with at least one person from each provider interested in this discussion and push for consistency and separation of responsibilities.

cc @yastij @fabriziopandini @sedefsavas @richardcase @CecileRobertMichon

@richardcase
Member

we should probably meet and define clear responsibilities of reference. Infrastructure, Control Plane, and other references should all have clear delineations

Meeting to agree on the responsibilities would be great 👍 I also agree that we need to be clear about the delineations, and this is the issue we are facing with managed Kubernetes services: the current delineations don't naturally fit.

InfraCluster object, we can assume that this object provides the infrastructure for the Kubernetes Cluster, which can include a number of things and today it also includes the load balancer.

the control plane reference is in charge of managing the Kubernetes control plane, the infrastructure should be left to the other reference.

This is a good example of why the current responsibilities of a control plane & infrastructure provider don't fit well for managed services like EKS and why we have ended up where we are.

When you create an EKS control plane in AWS (which we do via a CAPI control plane provider), this automatically creates a load balancer for the API server. This is at odds with the current assumption that the infra provider creates the load balancer and reports the control plane endpoint back to CAPI.
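
For context, the infrastructure cluster contract expects the infra object to surface the API server endpoint roughly as below (a sketch under assumptions; the kind name and values are illustrative, and for EKS the endpoint is only known once the managed control plane and its load balancer exist):

apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSManagedCluster       # illustrative infra object
metadata:
  name: my-eks-cluster
  namespace: eks
spec:
  controlPlaneEndpoint:
    host: ""                  # normally set by the infra provider; for EKS it is
    port: 443                 #   only available after the control plane is created
status:
  ready: false                # flips to true once the infrastructure is provisioned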

So revisiting the provider types and responsibilities in the context of managed Kubernetes services would be great.

When shall we get together? Perhaps a doodle is needed?

@pydctw
Author

pydctw commented Apr 20, 2022

Presented the Managed Kubernetes in CAPI proposal during CAPI office hours on 4/20/22.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 19, 2022
@richardcase
Member

/remove-lifecycle stale

@killianmuldoon
Contributor

/area topology

@fabriziopandini fabriziopandini added the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Jul 29, 2022
@fabriziopandini fabriziopandini removed this from the v1.2 milestone Jul 29, 2022
@fabriziopandini fabriziopandini removed the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Jul 29, 2022
@fabriziopandini
Member

/close
this has been fixed by introducing support for server side apply

@k8s-ci-robot
Contributor

@fabriziopandini: Closing this issue.

In response to this:

/close
this has been fixed by introducing support for server side apply

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@pydctw
Author

pydctw commented Jul 29, 2022

This issue is related to #6988 for EKS ClusterClass support, not server side apply. Will keep it open until the proposal is merged.

/reopen

@k8s-ci-robot
Contributor

@pydctw: Reopened this issue.

In response to this:

This issue is related to #6988 for EKS ClusterClass support, not server side apply. Will keep it open until the proposal is merged.

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot reopened this Jul 29, 2022
@pydctw
Author

pydctw commented Jul 29, 2022

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 29, 2022
@fabriziopandini
Member

/triage accepted

@k8s-ci-robot k8s-ci-robot added the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Jul 29, 2022
@sbueringer
Member

@pydctw How do we want to treat this core CAPI issue now that the proposal is merged? I assume we have corresponding issues in CAPA, so it's fine to close this issue here?

@killianmuldoon killianmuldoon added the area/clusterclass Issues or PRs related to clusterclass label May 4, 2023