Add IRSA Support to Self Managed Clusters #4094
Conversation
(force-pushed from 9e499a4 to 62fe36e)
/assign
(force-pushed from 4ff662e to 44f63a6)
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
(force-pushed from eafcd38 to d9ac6f0)
i'm not quite dead yet
(force-pushed from d9ac6f0 to ea2869e)
Hi @luthermonson, please correct me if I am wrong: if the pod-identity-webhook is running on a worker node and that node goes down, the workloads and the pod-identity-webhook can be rescheduled onto another node at the same time. Pods that start before the webhook is available miss the injection, so the expected injected values never appear in those pods. To address this issue, several potential solutions can be considered:

Additionally, it's crucial to address the lack of updates on pod-identity-webhook resources to prevent upgrade issues when introducing new features during capi-aws upgrades. The pod-identity-webhook resources should be reconciled regularly so they stay compatible with the latest capabilities and avoid complications during upgrades. Example: updating the pod-identity-webhook image will not be reconciled.

@luthermonson - feel free to ping me if I can help get this in.
// reconcilePodIdentityWebhook generates certs and starts the webhook in the workload cluster
// https://github.com/aws/amazon-eks-pod-identity-webhook
// 1. generate webhook certs via cert-manager in the management cluster
@luthermonson What is the reasoning behind creating the cert resources in the management cluster?
Security. You don't want unencrypted traffic to an HTTP webhook endpoint.
Ehn, I think I know what he's asking... @pavansokkenagaraj CAPA does not automatically install cert-manager into the downstream cluster. For CAPI projects the management cluster gets cert-manager automatically, so it's the only place we can generate the certs, which is why they're generated there and then moved over.
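To make that flow concrete, here is a minimal sketch (not the PR's actual code) of requesting the serving cert from cert-manager in the management cluster and copying the resulting secret into the workload cluster. It assumes cert-manager's v1 Go API and one controller-runtime client per cluster; names such as copyWebhookCertSecret, the selfsigned issuer, and the DNS name are illustrative.

package irsa

import (
	"context"

	cmapi "github.com/cert-manager/cert-manager/pkg/apis/certmanager/v1"
	cmmeta "github.com/cert-manager/cert-manager/pkg/apis/meta/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// copyWebhookCertSecret sketches the flow: cert-manager only runs in the
// management cluster, so the webhook serving certificate is requested there
// and the resulting secret is copied into the workload cluster.
// (A real reconciler would wait for the Certificate to become Ready and
// handle AlreadyExists errors; that is omitted here for brevity.)
func copyWebhookCertSecret(ctx context.Context, mgmt, workload client.Client, ns string) error {
	// 1. Ask cert-manager in the management cluster for a serving cert.
	cert := &cmapi.Certificate{
		ObjectMeta: metav1.ObjectMeta{Name: "pod-identity-webhook", Namespace: ns},
		Spec: cmapi.CertificateSpec{
			SecretName: "pod-identity-webhook-cert",
			DNSNames:   []string{"pod-identity-webhook.kube-system.svc"}, // placeholder service DNS name
			IssuerRef:  cmmeta.ObjectReference{Name: "selfsigned", Kind: "Issuer"},
		},
	}
	if err := mgmt.Create(ctx, cert); err != nil {
		return err
	}

	// 2. Read the secret cert-manager produced in the management cluster.
	secret := &corev1.Secret{}
	key := client.ObjectKey{Name: "pod-identity-webhook-cert", Namespace: ns}
	if err := mgmt.Get(ctx, key, secret); err != nil {
		return err
	}

	// 3. Recreate it in the workload cluster so the webhook can mount it.
	copied := &corev1.Secret{
		ObjectMeta: metav1.ObjectMeta{Name: secret.Name, Namespace: metav1.NamespaceSystem},
		Data:       secret.Data,
		Type:       secret.Type,
	}
	return workload.Create(ctx, copied)
}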
@luthermonson - sorry, I need to get back to this.
@luthermonson: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
Is there anything an interested observer can do to help get this over the finish line?
While testing this today, I found the CAPA controller needed these additional permissions:

- apiGroups:
  - cert-manager.io
  resources:
  - certificates
  - issuers
  verbs:
  - create
  - get
  - list
  - delete
- apiGroups:
  - controlplane.cluster.x-k8s.io
  resources:
  - kubeadmcontrolplanes
  verbs:
  - get
  - list
  - patch
  - update
Also while testing this PR, upon trying to delete the cluster I found the deletion was getting blocked because the CAPA controller was throwing an error if the OIDC provider didn't exist / had already been deleted, so I added handling to gracefully ignore that case (copying the error handling pattern seen in other services).
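A rough sketch of that pattern, assuming aws-sdk-go's IAM client; deleteOIDCProvider and its signature are hypothetical rather than the PR's exact code.

package irsa

import (
	"github.com/aws/aws-sdk-go/aws/awserr"
	"github.com/aws/aws-sdk-go/service/iam"
	"github.com/aws/aws-sdk-go/service/iam/iamiface"
)

// deleteOIDCProvider removes the cluster's IAM OIDC provider but treats
// "already gone" as success, so a missing provider never blocks cluster deletion.
func deleteOIDCProvider(iamClient iamiface.IAMAPI, providerARN string) error {
	_, err := iamClient.DeleteOpenIDConnectProvider(&iam.DeleteOpenIDConnectProviderInput{
		OpenIDConnectProviderArn: &providerARN,
	})
	if aerr, ok := err.(awserr.Error); ok && aerr.Code() == iam.ErrCodeNoSuchEntityException {
		// The provider was already deleted (or never created); nothing to do.
		return nil
	}
	return err
}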
It will be a push to get this in, but let's add this so we are tracking it: /milestone v2.6.0
if err := reconcilePodIdentityWebhookComponents(ctx, s.scope.Namespace(), certSecret, remoteClient); err != nil {
	return err
}
The use of s.scope.Namespace() here means that CAPA will attempt to install the pod-identity-webhook into a remote namespace matching whatever namespace the cluster resources were created in on the management cluster.

This works for the simplest case of installing everything into default, but if the cluster resources are created in something else, like a namespace called capi-<cluster-name>, then this will fail because the namespace does not exist in the remote cluster.

IMO - the pod-identity-webhook should be deployed into the kube-system namespace.
I've pushed a commit to my fork here that:
- hard codes the installation namespace of pod-identity-webhook to be kube-system
- fixes the certificates to reflect the new namespace
- fixes an issue that could prevent a successful reconcile if the webhook certificate secret already existed in the remote cluster (from a previous reconcile loop)
- bumps the pod-identity-webhook image version to the latest (v0.5.5)
This was based on the install examples of the pod-identity-webhook (https://github.com/aws/amazon-eks-pod-identity-webhook/blob/master/deploy/deployment-base.yaml#L5), trying to keep the install as similar to upstream as possible, and upstream does install it into default. The rig I used to work on made that namespace for you in the workload cluster (we created and pivoted and put everything there), so I never caught that bug.

I'm ok with hardcoding to kube-system (which is probably the best place for it), but I'd just like to make a case for default being where the AWS/EKS devs intended self-install users to put it.
I've always thought it was strange their examples deployed to default. From a permissions point of view, it's easy to protect access to it in kube-system, but often users would have permission in default.

I note that kOps also hard codes it to kube-system: https://github.com/kubernetes/kops/blob/41480d95a96edad247ecd777b2790ecda33f6a85/upup/models/cloudup/resources/addons/eks-pod-identity-webhook.addons.k8s.io/k8s-1.16.yaml.template#L69
Pushing back: /milestone v2.7.0
@luthermonson this will need rebasing
@damdo - with agreement from @luthermonson I have continued the work on this PR (including rebase) over in #5109
Thanks for the heads up @sl1pm4t
As #5109 is the new PR: /close
@richardcase: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
What type of PR is this?
/kind feature
What this PR does / why we need it: Adds IRSA (oidc provider/pod-identity-webhook) functionality to an IAAS cluster. This will bring the IAAS clusters inline in functionality with Managed clusters which provide most of this functionality out of the box.
How this was Accomplished:
In CAPA, kubeadm already provisions serving certs, and since Kubernetes 1.21 there are two endpoints on the apiserver that make it an OIDC provider: /.well-known/openid-configuration and /openid/v1/jwks. The OpenID responses can be configured through a couple of kube-apiserver parameters; because the apiserver is managed and run as a static pod, the preferred way to add the extra args is through kubeadm patches, per a trail of issues showing an API design flaw in the map[string]string used for ExtraArgs.

Once the apiserver endpoints are running, they need to be served publicly on port 443. Since CAPA already uses an S3 bucket for ignition configs, I repurposed it to host the contents of the two files and configured the API server accordingly. Then all that is left is to create an OIDC provider in IAM pointing at the S3 bucket and install the pod-identity-webhook provided by AWS.
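As an illustration (not taken verbatim from the PR) of what this boils down to on the kube-apiserver once the discovery documents live in S3, the extra arguments look roughly like the following; the function name, flag selection, and URL shape are assumptions.

package irsa

import "fmt"

// apiServerIRSAArgs sketches the kube-apiserver arguments involved: the
// issuer URL points at the S3 bucket hosting the
// /.well-known/openid-configuration and /openid/v1/jwks documents copied
// from the API server, so tokens reference a publicly resolvable issuer.
func apiServerIRSAArgs(bucketName, region string) map[string]string {
	issuerURL := fmt.Sprintf("https://%s.s3.%s.amazonaws.com", bucketName, region)
	return map[string]string{
		"service-account-issuer":   issuerURL,
		"service-account-jwks-uri": issuerURL + "/openid/v1/jwks",
	}
}

// Because of the ExtraArgs map[string]string limitation mentioned above,
// args like these end up applied through kubeadm patches rather than plain ExtraArgs.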
The EKS implementation only creates a configmap called boilerplate-oidc-trust-policy. You are expected to copy the contents of this configmap, edit in the name/namespace of your service account, and create a new role with that trust policy, attaching whatever additional permissions you wish the account to use. There is an example I use to test IRSA: a role is created and given both the trust policy and the additional permissions, and the test workload lists your S3 buckets, confirming the OIDC provider and IRSA are functioning.

For extra detail on how this actually works: the pod-identity-webhook adds a mutating webhook for all pod creation. If the pod has a service account that carries an annotation like eks.amazonaws.com/role-arn: arn:aws:iam::<accountID>:role/test, the webhook knows to generate an STS token and inject it directly into the pod via environment variables (a service account of this shape is sketched below).

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):

Fixes #3560
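To make the annotation flow above concrete, here is a minimal, hypothetical sketch of the kind of ServiceAccount the webhook reacts to, expressed with client-go types; the name, namespace, and role ARN are placeholders.

package irsa

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// exampleServiceAccount shows the shape the pod-identity-webhook looks for:
// any pod using a service account annotated with eks.amazonaws.com/role-arn
// has the role and an STS web-identity token injected by the webhook's
// mutating admission hook when the pod is created.
var exampleServiceAccount = &corev1.ServiceAccount{
	ObjectMeta: metav1.ObjectMeta{
		Name:      "irsa-test", // hypothetical name
		Namespace: "default",   // hypothetical namespace
		Annotations: map[string]string{
			// The role's trust policy must reference the cluster's OIDC
			// provider (the boilerplate-oidc-trust-policy configmap holds a
			// template to copy from).
			"eks.amazonaws.com/role-arn": "arn:aws:iam::<accountID>:role/test",
		},
	},
}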
Special notes for your reviewer:
Adding a new ReconcileOIDCProvider to the AWSCluster reconciliation loop.
Checklist:
Release note: