# aws-eks: Constructing an `AwsAuth` can completely break a cluster (#28333)
## Comments
Thanks for the report. Actually, every Cluster comes with a built-in AwsAuth, and you don't need to create a new one. Instead, you should update the existing one as described in the doc: `cluster.awsAuth.addMastersRole(role);`. Is there any reason you need to create another one for the existing cluster?
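In Python (the reporter's language), the same recommended pattern can be sketched as follows. This is an illustrative helper, not code from the thread; `cluster` and `role` are assumed to come from elsewhere in your stack, and aws-cdk-lib exposes the property and method as `aws_auth` / `add_masters_role` in Python:

```python
# Sketch of the recommended fix: mutate the cluster's built-in AwsAuth
# instead of constructing a second AwsAuth (which would overwrite the
# aws-auth ConfigMap the cluster already manages).
def grant_cluster_admin(cluster, role):
    # cluster.aws_auth is the AwsAuth instance the Cluster construct created;
    # adding to it patches the single managed aws-auth ConfigMap in place.
    cluster.aws_auth.add_masters_role(role)
```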
@pahud, no. For more context: we recently had a complete outage of one of our clusters because we deployed a diff adding a new `AwsAuth`. I totally understand that it's always going to be possible for people to do dangerous things with an IaC tool (we could delete our cluster, for instance), but we are diligent about reviewing all code changes to our CDK app, and nobody at our company recognized that this diff would cause a cluster outage:

```diff
--- before
+++ after
@@ -14,3 +14,5 @@
             nodegroup_name='demo-nodegroup',
             instance_types=[aws_ec2.InstanceType('c6a.xlarge')],
         )
+
+        aws_eks.AwsAuth(self, "problematic-aws-auth", cluster=cluster)
```

Running …

In other words: it's totally OK for CDK to enable us to do dangerous things, but (IMO) those dangerous things need to at least *look* dangerous. For example, deleting a cluster is very obviously a dangerous thing, both when you look at the code diff and the …. That all said, I still think it would be better for CDK to just ensure that there are exactly 0 or 1 `AwsAuth` constructs per cluster.
@pahud, any update here?
@jfly Totally agree with you. Yes, it sounds good to have a PR as the guardrail. Feel free to submit your PR for that. Thanks.
I just learned that this aws-auth ConfigMap is being replaced with a new access entries and access policies mechanism: https://aws.amazon.com/blogs/containers/a-deep-dive-into-simplified-amazon-eks-access-management-controls/. We're going to put our effort into that, which should avoid the issue described here.
Yes, it's related to #28588.
I just got hit by this issue and lost a full day of engineering for an entire team as we repaired all of the test clusters this deployed to before we realised it broke everything. Minimum viable fix: please add a warning in the documentation saying "Do not instantiate this class directly. Use the `ICluster.awsAuth` property" until you can prioritize deprecating the public constructor.
Of note: I was using CDK Blueprints for EKS to build the cluster in one stack, and my Google search for how to add aws-auth permissions for my Lambda in a related stack led me to the `AwsAuth` construct, rather than to the instance of the class on `ICluster`.
### Describe the bug

We recently added a new `AwsAuth` construct to our CDK app. We were shocked to find that this broke all our node groups (as the new `AwsAuth` construct overwrote the aws-auth ConfigMap from CDK's internally managed `AwsAuth` construct).

### Expected Behavior

I expect CDK to enforce that there is exactly one `AwsAuth` construct per cluster.

### Current Behavior
As mentioned above, the new `AwsAuth` construct clobbered the aws-auth ConfigMap managed by the cluster's internal `AwsAuth` construct.

### Reproduction Steps
Here's a simple CDK app that declares a cluster, a nodegroup for that cluster, and an explicit `AwsAuth` for that cluster. Note how this produces 2 `Custom::AWSCDK-EKS-KubernetesResource` resources. These two resources will happily step on each other's toes (because of this (unfortunately necessary) `overwrite: true`).

### Possible Solution
I have read through #19218, which talks about a similar issue (the aws-auth ConfigMap getting unexpectedly clobbered), but that's all about using multiple tools to manage a cluster. In the scenario described here, we're using exactly 1 CDK app to manage our cluster, and IMO, CDK should protect us against this.
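One way CDK could protect against this is to count `AwsAuth` constructs per cluster and fail synthesis on duplicates. Below is a language-agnostic sketch of that check using plain-Python stand-ins for the construct tree (these are not real CDK types; the names `FakeAwsAuth` and `validate_single_aws_auth` are invented for illustration):

```python
# Sketch of the proposed guardrail: refuse to proceed when more than one
# AwsAuth-like construct targets the same cluster, since each one overwrites
# the single aws-auth ConfigMap and clobbers the others.
from collections import Counter
from dataclasses import dataclass
from typing import Iterable

@dataclass(frozen=True)
class FakeAwsAuth:
    construct_id: str
    cluster_name: str  # stand-in for a reference to the Cluster construct

def validate_single_aws_auth(nodes: Iterable[FakeAwsAuth]) -> None:
    """Raise if more than one AwsAuth targets the same cluster."""
    counts = Counter(n.cluster_name for n in nodes)
    duplicates = [cluster for cluster, count in counts.items() if count > 1]
    if duplicates:
        raise ValueError(
            f"Multiple AwsAuth constructs found for cluster(s) {duplicates}; "
            "each one overwrites the aws-auth ConfigMap, clobbering the others."
        )

# The cluster's own internal AwsAuth plus an explicit second one -> error.
tree = [
    FakeAwsAuth("Cluster/AwsAuth", "demo"),
    FakeAwsAuth("problematic-aws-auth", "demo"),
]
try:
    validate_single_aws_auth(tree)
except ValueError as err:
    print("validation failed:", err)
```

In real CDK this could run as a tree-walking validation at synth time; the sketch only shows the counting logic.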
Some ideas for how to fix this:

- When instantiating an `AwsAuth` for a cluster, check if there's already an `AwsAuth` instantiated for that cluster. If there is one already, crash.
- Don't expose the `AwsAuth` construct at all; just keep it as an internal implementation detail of `Cluster`. As I understand things, it's never something you'd want to use directly when you've declared your cluster in CDK (instead you'd want to interact with the underlying `cluster.awsAuth` attribute as documented here).
- Add a validation that finds all `AwsAuth` constructs in the tree and asserts that they're for different clusters.

### Additional Information/Context
No response
### CDK CLI Version

2.109.0 (build 941dc16)

### Framework Version

No response

### Node.js Version

v18.15.0

### OS

N/A (this happens both on macOS and Linux)

### Language

Python

### Language Version

Python (3.10.6)

### Other information

No response