
Adopt existing AKS clusters #1173

Closed · nprokopic opened this issue Feb 11, 2021 · 35 comments · Fixed by #4697
Labels: area/managedclusters · kind/proposal · lifecycle/frozen · priority/important-longterm · size/XXL

@nprokopic
Contributor

Goals

  1. Support in CAPZ for adopting an existing AKS cluster by creating the CAPI and CAPZ objects that describe it.

Non-Goals/Future Work

  1. Support in CAPZ for adopting an existing Kubernetes cluster that was created outside of CAPZ (with infrastructure similar to what is created today with AzureCluster and other related objects), by creating the CAPI and CAPZ objects that describe it.

User Story

As a user of Cluster API Provider Azure, I would like to start managing my existing AKS clusters from a CAPZ management cluster by creating CAPI and CAPZ objects that describe them, so that I don't have to create new clusters and migrate my workloads, which could cause significant downtime or might not even be possible.

Detailed Description

For an existing AKS cluster, preferably one that uses only features supported by CAPZ, I want to create CAPI (Cluster, MachinePool) and CAPZ (AzureManagedCluster, AzureManagedControlPlane, AzureManagedMachinePool) objects and submit them to the management cluster. The CAPZ controller will start reconciling the CAPZ objects, gracefully handle the fact that the AKS cluster already exists, and treat it as if it had been created by CAPZ.
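To make this concrete, here is a minimal sketch of the set of objects that would describe an existing AKS cluster. All names, IDs, versions, and API versions are illustrative placeholders and would have to match both the existing AKS cluster and the CRD versions of the CAPI/CAPZ release in use; this is not an official template.

```yaml
# Sketch only: every value below is a placeholder and must be adjusted
# to match the existing AKS cluster and the installed CAPI/CAPZ CRDs.
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: my-existing-aks
spec:
  controlPlaneRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: AzureManagedControlPlane
    name: my-existing-aks
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: AzureManagedCluster
    name: my-existing-aks
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureManagedControlPlane
metadata:
  name: my-existing-aks
spec:
  subscriptionID: 00000000-0000-0000-0000-000000000000  # existing subscription
  resourceGroupName: my-rg                               # existing resource group
  location: westeurope                                   # existing region
  version: v1.21.2                                       # existing Kubernetes version
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureManagedCluster
metadata:
  name: my-existing-aks
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachinePool
metadata:
  name: pool0
spec:
  clusterName: my-existing-aks
  replicas: 3                 # node count of the existing system pool
  template:
    spec:
      clusterName: my-existing-aks
      bootstrap:
        dataSecretName: ""    # AKS manages node bootstrap, so no bootstrap config
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: AzureManagedMachinePool
        name: pool0
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureManagedMachinePool
metadata:
  name: pool0
spec:
  mode: System                # matches the existing system node pool
  sku: Standard_D2s_v3        # existing VM size
  osDiskSizeGB: 30            # existing OS disk size
```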

It would be useful to have an e2e test for this scenario.

Contract changes [optional]

It is possible that some changes in CAPZ managed cluster object defaulting and validation would be required. For example, setting a default SSH key in AzureManagedControlPlane is causing issues in my current AKS cluster adoption experiments, as the Azure API returns the error "Changing property 'linuxProfile.ssh.publicKeys.keyData' is not allowed." I will create a separate issue for this.

Data model changes [optional]

It is possible that some changes in the managed cluster CRDs would be required (e.g. to solve the above issue with SSH keys, or other similar cases).

/kind proposal

@nprokopic
Contributor Author

Current blockers for this:

  1. OS type not explicitly specified for the managed node pool, described in OS type is not set for managed node pool when calling Azure API #1174

  2. SSH key update error

"error creating AzureManagedControlPlane default/aks6adopt: failed to reconcile managed cluster: failed to reconcile managed cluster aks6adopt: failed to create or update managed cluster, failed to begin operation: containerservice.ManagedClustersClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code=\"PropertyChangeNotAllowed\" Message=\"Changing property 'linuxProfile.ssh.publicKeys.keyData' is not allowed.\" Target=\"linuxProfile.ssh.publicKeys.keyData\"" "controller"="azuremanagedcontrolplane" "name"="aks6adopt" "namespace"="default"

I had my public key configured here, but the same would probably happen even without it, as the defaulting would set a newly generated key if one was not already set. This should be solvable by checking LinuxProfile.SSH on the existing cluster object (which is already fetched in the code) and then not trying to change those values.

Possibly, if the AKS cluster is initially created with an SSH key and that same SSH key is set in AzureManagedControlPlane, this error might not happen (see the sketch below). Need to check this.
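For illustration, pinning the key in the CR could look roughly like this; the spec values and the API version are placeholders, and the key would have to be exactly the one the existing cluster was created with:

```yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1  # placeholder API version
kind: AzureManagedControlPlane
metadata:
  name: aks6adopt
  namespace: default
spec:
  resourceGroupName: my-rg   # placeholder
  location: westeurope       # placeholder
  version: v1.21.2           # placeholder
  # Must be byte-for-byte identical to the key the AKS cluster was created
  # with, so that defaulting does not generate a new key and trigger the
  # PropertyChangeNotAllowed error on update.
  sshPublicKey: "ssh-rsa AAAA... user@example"
```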

@CecileRobertMichon
Contributor

cc @alexeldeib

@nprokopic
Contributor Author

Regarding SSH keys: in theory it might work if the AKS cluster was created with a single SSH key and that same key is set in the CR.

I haven't gotten it to work yet; I'll update here after some more testing (mid next week, as I'm AFK at the beginning of the week).

@nprokopic
Contributor Author

Some concrete actions here would be to add an e2e test for adopting an existing AKS cluster that uses features currently supported in CAPZ. We can then iterate from there by adding more AKS features to CAPZ (I would happily submit a proposal for what to add next, or pick up an existing issue) and updating the e2e test to cover those.

(Before any of this, we probably need at least a basic e2e test for AKS cluster creation; as I understood from Cecile, there is none at the moment.)

@CecileRobertMichon CecileRobertMichon added the area/managedclusters Issues related to managed AKS clusters created through the CAPZ ManagedCluster Type label Mar 16, 2021
@alexeldeib
Contributor

Somehow I missed this one. It looks like #1175 fixed part of the blocker. Re: SSH keys, it should work without changes. I'd be interested to know if there are any remaining blockers for the adoption piece. We definitely want e2e either way.

@k8s-triage-robot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 27, 2021
@alexeldeib
Contributor

/remove-lifecycle stale
/help-wanted

We've added e2e since I made the previous comment. If someone could manually test this (easy), that would be enough to close this out IMO.

We could also add a separate e2e test for a pre-created AKS cluster, but I'm not sure how much value that delivers relative to the effort, versus polishing up other parts of the implementation.

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 27, 2021
@nprokopic
Contributor Author

> We've added e2e since I made the previous comment. If someone could manually test this (easy), that would be enough to close this out IMO.

I tested this some time ago, I believe when I opened this issue, and I got it working after fixing #1175.

> We could also add a separate e2e test for a pre-created AKS cluster, but I'm not sure how much value that delivers relative to the effort, versus polishing up other parts of the implementation.

Having an e2e test that ensures this keeps working would be useful (along with some docs describing how it works), since from what I saw even small, subtle changes can break it. For example, managed cluster creation works the same with or without #1174 (it does not need the default field value to be set), but that fix was required for adopting an existing managed cluster.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 8, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 8, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@CecileRobertMichon
Contributor

/reopen

@k8s-ci-robot k8s-ci-robot reopened this Jan 10, 2022
@k8s-ci-robot
Contributor

@CecileRobertMichon: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot k8s-ci-robot added the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Dec 8, 2022
@CecileRobertMichon
Contributor

/remove-lifecycle rotten
/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. labels Dec 8, 2022
@kisahm

kisahm commented Jan 6, 2023

+1

Any news on this topic? Are there any ongoing activities?

@CecileRobertMichon
Contributor

@kisahm I don't believe anyone is currently working on this, but @mtougeron mentioned this topic a while back at CAPI office hours.

cc @jackfrancis @nojnhuh

@ericgoode

I saw a note above saying this was working. Is there any way to find out what was done to get it working?

@mtougeron
Member

I'm writing this up today. I should have something to share for you soon.

@ericgoode

It looks like in 1.9.2 I can adopt a CAPI-created cluster by just reapplying the generated config files, and it figures it all out. I am going to test an az CLI-created cluster as well.

@mtougeron
Member

There are ways to do it; I've written up my notes here, but there are some gotchas in the process depending on how you created the original cluster. I'm working with @dtzar to figure out whether we can adjust CAPZ or call those gotchas out better. Some of them might also be resolved when CAPZ is refactored to use ASO.

@dtzar
Contributor

dtzar commented Aug 2, 2023

One of the challenges here is that if you generate or create a CAPI/CAPZ YAML file for a cluster that uses features which do not exist in CAPZ yet, or if a mutating webhook modifies that YAML when you submit it to the cluster, then the source of truth (the CAPZ YAML) no longer matches what is deployed. See kubernetes-sigs/cluster-api#8668

The big question is: can we have an officially supported feature that allows this? We'd need to work through the potential dangers/pitfalls, e.g. a feature is now supported in the CAPZ YAML but its configuration does not match that of the AKS feature already deployed on the cluster. How does that get handled?

On a related tangent: this would be possible today in pure ASO using its asoctl command.

@dtzar
Contributor

dtzar commented Sep 25, 2023

Posting this related project for reference: https://github.com/tobiasgiese/cluster-api-migration

@dtzar dtzar added priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Sep 28, 2023
@dtzar dtzar moved this to Todo in CAPZ Planning Sep 28, 2023
@nojnhuh
Contributor

nojnhuh commented Feb 12, 2024

/assign

@nojnhuh
Contributor

nojnhuh commented Mar 12, 2024

I followed the steps @mtougeron linked above, cheating a bit by creating a CAPZ AKS cluster, orphaning it (removing it from CAPZ management but leaving the resources running in Azure), and reapplying the same templates, and the cluster gets adopted fine AFAICT.

I'll work on documenting this and adding an e2e test but am not anticipating needing any other changes to enable this.

@dtzar
Contributor

dtzar commented Mar 12, 2024

@nojnhuh There is a case where there might be a mutating webhook though, correct? So if that was in play this would fail, no?

@nojnhuh
Contributor

nojnhuh commented Mar 12, 2024

@dtzar AKS features that are unknown to CAPZ won't ultimately get set in the ASO YAML, so those won't be disabled. And if we make more features available in CAPZ and ensure they default to null, then a CAPZ upgrade managing an adopted cluster shouldn't claim control over those fields automatically.

And the mutating webhook is always in play, but if you craft the CAPZ YAML such that the defaulting either does nothing or does what you want, then I don't see how that would cause any issues.

@dtzar dtzar added this to the v1.15 milestone Apr 4, 2024
@github-project-automation github-project-automation bot moved this from Todo to Done in CAPZ Planning Apr 26, 2024