-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(Azure): add support for workload identity #3111
feat(Azure): add support for workload identity #3111
Conversation
Signed-off-by: Igor Beliakov <[email protected]>
Signed-off-by: Igor Beliakov <[email protected]>
Welcome @weisdd! |
Signed-off-by: Igor Beliakov <[email protected]>
Signed-off-by: Igor Beliakov <[email protected]>
Updated description to add more details and to say that, on one of my AKS clusters, I used the patched build of external-dns with an additional diagnostic message, which would highlight any attempts to refresh an access token, and everything worked well. So, I think we have a robust setup now. |
@seanmalloy @szuecs Please, let me know what is the current estimate on when you could potentially review the PR. - Any timeline is fine for me, it just a matter of settings some expectations. :) Thanks! |
I have read the documentation from an Azure admin point of view and I think the docs looks good. Great job @weisdd |
Signed-off-by: Igor Beliakov <[email protected]>
Updated the PR a bit:
@szuecs @seanmalloy would be great if you could take a look at the PR. Thanks! |
@weisdd: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@phillebaba Well, go-sdk doesn't have native support for workload identity yet, it'll only be added in Beta around December/January. Till that time, this code snippet can potentially be used (I haven't tested whether it has any token renewal issues myself). So, it won't make things easier, at least for now. Based on the recent discussion in Slack, maintainers are busy, and there's kind of pause till a plugins provider is ready, so I'm unlikely to see any reviews in near future. Thus, I guess there's not much sense in maintaining the PR anyway, and by the time yours is merged, there's likely to be a new go-sdk version out with support for Workload Identity. |
I have totally missed that issue, thanks for the link. Yeah let's just hope the release is not just pushed forward further past January in that case. |
Looks like a beta azidentity package has been published with support for workload identity, would be great to get it into external-dns |
Any word on this? The existing pod-identity/msi feature (which the Azure has provided a side car container that intercepts token requests (sorta like pod identity) and forwards them through the federated credentials flow. While it does indeed work, it is meant as a stop gap for use during migration and not as a long term solution (even the azure docs state as much) We've already seen support added to other libraries: cert-manager, keda, external-secrets, akv2k8s (and I'm sure others, thanks at least in part @weisdd!)... There's definitely a general need/want for this throughout the ecosystem. Also note that the instructions for the external-dns azure provider here includes instructions for pod-identity as well as enabling the AKS-native pod-identity "v1" Azure preview feature, which should not be enabled anymore. The docs are pushing people towards technologies that are deprecated and/or don't/won't soon exist!! |
I'll update the PR once #3040 is merged. |
docs/tutorials/azure.md
Outdated
|
||
#### Update labels and annotations on ExternalDNS service account | ||
|
||
To instruct Workload Identity webhook to inject a projected token into the ExternalDNS pod, the service account needs to have a label `azure.workload.identity/use: "true"` and an annotation `azure.workload.identity/client-id: <IDENTITY_CLIENT_ID>`: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@weisdd - Went through the tutorial to test out building AKS cluster with workload identity and getting external DNS fully working only using managed identity. Thanks so much for the hard work on this and really hoping we can get this merged in the near future.
There was one step that was missing which is adding the necessary annotations to the pod (not just the service account). There were a few errors that keep showing up about the token missing from the azure.json file, so after some research found this portion in the docs that should be included in this tutorial for others that are trying to enable workload identity for their AKS clusters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pod label was not a requirement during the original private preview. It was announced/communicated it was going to be required for GA, and was then subsequently rolled out to AKS in the next update (still during preview!). Point being, it wasn't a requirement when those instructions were written 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@slickmode Thanks for the review!
Indeed, the tutorial was written before v1.0.0-beta.0 was released last month, which introduced the need for the extra label to limit the number of pods intercepted by a webhook (through an object selector) and enforce failurePolicy: Fail
. As far as I understand, the design for annotations and labels is finalized now, there's already v1.0.0-rc.0 release, v1.0.0 is expected in a few weeks time (hopefully, it won't get delayed).
I'd be happy to update my PR to include the label in the guide, just wasn't sure if there's any chance for the PR to be reviewed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@slickmode I've just pushed the updated guide to make sure we won't forget it whenever maintainers get to review the PR.
@pinkfloydx33 thanks for your comments here and in other places, they're always helpful :)
…kload identity v1.0.0-beta.0 and newer Signed-off-by: Igor Beliakov <[email protected]>
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: weisdd The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
```bash | ||
$ kubectl patch serviceaccount external-dns --namespace "default" --patch \ | ||
"{\"metadata\": {\"annotations\": {\"azure.workload.identity/client-id\": \"${IDENTITY_CLIENT_ID}\"}}}" | ||
$ kubectl patch deployment external-dns --namespace "default" --patch \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO this should be part of the deployment. https://github.com/kubernetes-sigs/external-dns/blob/master/charts/external-dns/templates/deployment.yaml
Add the client_id to the values.yaml and if .Values.provider==azure (provider
already exists in the values.yaml) and if the client_id for workload identity was set, add the label to the deployment plus add the annotation to the service account
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sebader Many months ago, when I was working on this section, I just tried to make it look similar to other sections of the guide (to keep the guide consistent). When looking at the official external-dns chart, it doesn't seem to have much focus on offering something special for the providers external-dns has integrations with. So, I'm not sure if the changes you suggested can get accepted by maintainers. Though, we can simply rely on podLabels
and serviceAccount.annotations
of the existing chart.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I lean towards sebader in that I don't think patching should be encouraged but I think I lean a bit towards weisdd with that podLabels, podAnnotations and serviceAccount.labels is the way to go and explicitly setting them. Having logic for automatically setting them feels fragile since Microsoft keeps changing the way things behave.
…is GA now Signed-off-by: Igor Beliakov <[email protected]>
Do you have any updates or ETA ? |
Hi folks, any chance it will be completed ? What blocks it? |
Well, in the beginning, maintainers were simply overloaded with requests (e.g. too many provider integrations to maintain => they mentioned the need to develop a new plugin system), and, then, I guess, the PR just got covered with dust :) The code itself works just fine. The same workaround for token renewal has been accepted by cert-manager project last year, and I haven't seen any issues there. Now, as #3040 seems to be getting closer to be merged, my PR will need to be rewritten using the updated SDK. That should be a trivial change as the SDK supports token renewals out of the box. |
@nbjohnson that's great news :) I've just rewritten the code, and currently preparing a cluster to test my new build. |
@weisdd Great to hear, thanks for all your work on the workload identity support. Then we just have to bring this to the maintainer's attention |
@nbjohnson Thanks! |
Hey everyone, I decided not to update this PR to make sure the I've just opened another PR #3906 that implements Workload Identity using |
P.S. For those who wants to test the new build, you can find it here: export IMAGE=quay.io/weisdd/external-dns
export VERSION=v0.13.5-wi0
make build.push/multiarch NOTE: ko has to be installed first. |
Description
Context
In Release 2022-10-10, AKS has brought support for Workload Identity Federation which will eventually replace Pod-Managed Identities.
This is an important change, because it resolves several limitations of Pod-Managed Identities described here.
As I see, there is a PR #2885 that tried to add the functionality, though it haven't got any traction since July. Plus, it doesn't tackle federated token renewal. Thus, I decided to come up with a different implementation to see if it finds any support.
Implementation details
adal package has a function called
NewServicePrincipalTokenFromFederatedToken
which packs a federated token into a struct&ServicePrincipalFederatedSecret{jwt: jwt,}
and passes it toNewServicePrincipalTokenWithSecret
throughsecret
field.After that, we get an instance of
*adal.ServicePrincipalToken
. Unfortunately, it doesn't expose any methods to replacejwt
insecret
. So, suddenly, we have the following issue:By default, the federated token is valid only for 1h, whereas the AAD token that is received in exchange for federated - for 24h (docs). Thus, after roughly 24h, the implicitly called
.Refresh()
method for*adal.ServicePrincipalToken
would start producing errors due to stalejwt
.adal
doesn't seem to offer any native support for tracking changes in the file with federated token. The only sane method that I was able to find by looking through the library was taking advantage ofcustomRefreshFunc
, which is supposed to return a fresh access token or an error if any. To avoid having to reinvent the wheel, I wanted to reuse as many existing methods as possible.The final logic is the following: whenever an access token is about to expire (or just empty right after an instance of
*adal.ServicePrincipalToken
is generated), the library would automatically call a wrapper function that would read the federated token from disk and use it to obtain a new access token by invoking the standard process through explicit call of.Refresh()
method.On one of my AKS clusters, I used the patched build of external-dns with an additional diagnostic message, which would highlight any attempts to refresh an access token, and everything worked well. So, I think we have a robust setup now.
The updated user documentation is based on a similar section that was written for Pod-Managed Identities. It might still be a bit rough, I could easily fix it in case of any issues.
Fixes #2724
Extra information
The adal package that is currently used in Azure provider is scheduled to be deprecated by March 31 2023, so we'll eventually need to migrate away from it. Though that's a different question.
In case you're curious how the same is implemented in the official sidecar proxy, you can have a look at the code here.
Checklist