-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
eks: Changing securityGroup deletes entire cluster #28584
eks: Changing securityGroup deletes entire cluster #28584
Comments
Yes I think it's possible to just update the cluster security groups without replacing the cluster. At this moment the cluster construct from aws-eks is using customer resource to create the cluster but if we look at SecurityGroupsIds, it's update with no Interruption. Agree we should fix replaceVpc. |
### Issue # (if applicable) Closes #28584 ### Reason for this change To have in place updates for EKS clusters when subnets or SG values are changed. ### Description of changes Removed `replaceVpc` logic and introduced `updateVpc` to track changes and errors to handle multiple updates in one go ### Description of how you validated changes Have tested the changes by first deploying a cluster with below config: ```ts const vpc = ec2.Vpc.fromLookup(stack, 'Vpc', { isDefault: true }); new eks.Cluster(stack, 'Cluster', { vpc, ...getClusterVersionConfig(stack, eks.KubernetesVersion.V1_24), defaultCapacity: 0, }); ``` TestCase - 1 Update both subnets and Access at the same time ```ts new eks.Cluster(stack, 'Cluster', { vpc, ...getClusterVersionConfig(stack, eks.KubernetesVersion.V1_29), defaultCapacity: 0, tags: { foo: 'bar', }, endpointAccess: eks.EndpointAccess.PUBLIC, vpcSubnets: [{ subnetType: ec2.SubnetType.PUBLIC }], }); ``` Error below is thrown for Cluster custom resource - ``` { "errorType": "Error", "errorMessage": "Only one type of update - VpcConfigUpdate, LoggingUpdate or EndpointAccessUpdate can be allowed", "stack": [ "Error: Only one type of update - VpcConfigUpdate, LoggingUpdate or EndpointAccessUpdate can be allowed", " at Pi.onUpdate (/var/task/index.js:55:651127)", " at Pi.onEvent (/var/task/index.js:55:647590)", " at Runtime.yR [as handler] (/var/task/index.js:55:657995)", " at Runtime.handleOnceNonStreaming (file:///var/runtime/index.mjs:1173:29)" ] } ``` TestCase - 2 Update subnets to public ```ts new eks.Cluster(stack, 'Cluster', { vpc, ...getClusterVersionConfig(stack, eks.KubernetesVersion.V1_29), defaultCapacity: 0, vpcSubnets: [{ subnetType: ec2.SubnetType.PUBLIC }], }); ``` ``` { "updates": { "replaceName": false, "updateVpc": true, "updateAccess": false, "replaceRole": false, "updateVersion": false, "updateEncryption": false, "updateLogging": false } } ``` ``` { clientName: 'EKSClient', commandName: 'UpdateClusterConfigCommand', input: { name: 'Cluster9EE0221C-0b6f58b0698348aea43866b93a62b2c9', resourcesVpcConfig: { subnetIds: [Array], securityGroupIds: [Array] } }, output: { update: { createdAt: 2024-05-08T20:55:00.013Z, errors: [], id: '7d5cd243-5536-3f52-b5ca-4c6e6c044529', params: [Array], status: 'InProgress', type: 'VpcConfigUpdate' } }, metadata: {} } ``` ### Checklist - [x] My code adheres to the [CONTRIBUTING GUIDE](https://github.com/aws/aws-cdk/blob/main/CONTRIBUTING.md) and [DESIGN GUIDELINES](https://github.com/aws/aws-cdk/blob/main/docs/DESIGN_GUIDELINES.md) ---- *By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
|
Describe the bug
Adding a custom
securityGroup
toeks.Cluster
causes a full replacement of the stateful k8s cluster and data loss.Expected Behavior
securityGroup
would be updated in place as it is through the console, SDK, and as indicated incdk diff
.Current Behavior
The custom resource quietly created a new, empty EKS cluster and then deleted the existing cluster.
Reproduction Steps
Should be reproducible by
eks.Cluster
with nosecurityGroup
securityGroup
to the clusterPossible Solution
Looking at the
onEventHandler
logs for the custom resource, it looks like the issue is that changing the SG caused this update setLooks like the check for
replaceVpc
is very rough and way too trigger happyhttps://github.com/aws/aws-cdk/blob/main/packages/%40aws-cdk/custom-resource-handlers/lib/aws-eks/cluster-resource-handler/cluster.ts#L330
In fact it looks like the API can replace subnets and security groups in place https://docs.aws.amazon.com/eks/latest/APIReference/API_UpdateClusterConfig.html
The
eks.Cluster
resource takes avpc
prop, I think the changing of that prop should be the only reason thatreplaceVpc
needs to be true...Additional Information/Context
No response
CDK CLI Version
2.113.0 (build ccd534a)
Framework Version
No response
Node.js Version
v18.17.1
OS
Windows 11
Language
TypeScript
Language Version
4.9.3
Other information
#25544 may have aided disaster recovery here
I would like to have a Stack Policy on this stack but since it's implemented as a custom resource, cloudformation can only tell me that the CR itself isn't going to be replaced, but can't stop / inform me that the CR is going to be destructive.
The text was updated successfully, but these errors were encountered: