Authentication: Support Azure Federated Identity environment variables directly #23965
Thanks for taking a look at this @manicminer - I've tested this by building the provider locally and transferring it into a pod with a workload identity configured using
Copy the locally-built provider in to the pod:
Then exec'ing into the pod (
All works as expected and self-configures from the Azure-provided environment variables injected via the |
Hi @mrsheepuk, thanks for suggesting this. I agree that whilst it's relatively straightforward to set up the necessary environment variables based on those set when using AKS Workload Identity, there are valid points to be made about documenting this or potentially supporting it directly as per your proposal.
We've chatted internally about this and since this is now a mature and stable implementation in AKS, this is something we'd like to support. However, we think it's important to mitigate any risk of disrupting existing configurations (whether using workload identity or not, or even OIDC or not), and also to ensure reliable testing in isolation, so that we have confidence in this being stable across future releases.
With that in mind, would you be able to rework this as follows:
- Add a new provider-level boolean configuration option `use_aks_workload_identity`, with a corresponding environment variable `ARM_USE_AKS_WORKLOAD_IDENTITY`
- When this provider property is `true`, we can specifically consume the AKS-native environment variables in the provider's configure function.
- In the `providerConfigure()` function, if the `client_id` / `client_id_file_path` properties are set and produce different value(s) to that specified in the `AZURE_CLIENT_ID` environment variable, we should raise an error and warn the user to either set one of these, or ensure they are set to the same value. This check should happen in the `getClientId()` function.
- Perform the same validation for `tenant_id` / `AZURE_TENANT_ID` respectively. A new `getTenantId()` function will be needed in `provider.go` for this (based on `getClientId()`).
- The `use_aks_workload_identity` feature should fall back gracefully - that is, if it's set to true and the environment variables are not all set, we should make a best effort to configure the provider with the existing provider properties / environment variables.
  - If the `AZURE_FEDERATED_TOKEN_FILE` env var is not set, we should skip over it. If it is set, but points to an invalid file, or the contents cannot be read, we should raise an error.
  - If the `AZURE_TENANT_ID` env var is not set, we should use the value of `tenant_id`.
  - If the `AZURE_CLIENT_ID` env var is not set, we should use the value of `client_id` / `client_id_file_path`.
  - If any of the above env vars are set, appear to be valid, but are different to any of their existing corresponding properties, we should raise an error.
- For testing, we'd prefer to create a new test function to test this in isolation, for example `TestAccProvider_aksWorkloadIdentity()`.
This approach would on balance be preferable to specifying the environment variables for `client_id`, `tenant_id` and `oidc_token_file_path` in the provider schema, as it means they will only be consumed conditionally when enabled by the practitioner. It will also establish an order of precedence and prevent users from inadvertently supplying the wrong authentication primitives.
Please do let me know what you think, and whether you'd be keen to work on this? Thanks!
Thanks for discussing it and for the feedback @manicminer - I like the approach you have suggested, and most of it sounds reasonably straightforward. I might need some help with the new test function, but let me start from the top and work down your list and see how far I get. I have to admit I was a bit worried about implicitly picking up the wrong tenant ID / client ID if my approach was run in an existing flow that happened to have |
I think I've covered all the points in dc0932d. I'm not sure how you typically run the acceptance tests, so I, ahem, borrowed a token file from an AKS cluster into a local file named 'auth' so I could run the new acceptance test:
I also ran it with incorrect values etc. to check that it does indeed fail when the values are not correct.
I repeated my above manual test inside an AKS pod as well, using the new 'use_aks_workload_identity = true' in my provider.tf instead of use_oidc and it worked perfectly 🎉
I think it could use more documentation, but writing a whole document on how to set up AKS Workload Identity feels a bit out of scope, so not sure how far we'd need to take that?
@mrsheepuk Thanks for making the changes, and for the typo fixes along the way - that looks great at first pass.
I'll play a bit locally and come back with some thoughts on how we might want to test this. We may elect to do the same as with GitHub OIDC and rely on the execution environment being set up out of band, or possibly add some extra tooling to spin up a cluster as needed, deploy to a pod, and run it there. Don't want to delay this unnecessarily but I think it warrants some thought.
For the docs, I was thinking it would be fine to add this as a new section in the existing OIDC Guide. If you want to have a go at this, please feel free (what you've written up so far is great) but I'm also happy to help with that prior to merging.
I tried adding it to the existing OIDC guide but couldn't quite make it fit, so I've pushed a first draft of doing it as a separate doc. Feel free to make changes / amend / delete the whole thing 😆 but it's a start.
Just noticed the lint failure - will fix up. @manicminer any further thoughts on how we can test this? Anything I can do to help with that?
@mrsheepuk Given that we currently do not perform complete automated end-to-end testing for all the authentication methods we support, I'm happy to approve this based on the manual testing that you and I have conducted - I can confirm that this works great in the two setups that I have tested with.
Thanks again for this contribution. This is a great quality-of-life improvement and I appreciate the time taken to document this, as well as tidying up some typos in general. I pushed a couple of wording tweaks but aside from those, this is good to merge!
Testing with client ID annotation:
Ah brilliant, thanks for taking the time to review and test it out @manicminer :)
Bump Terraform `azurerm` provider version
"hashicorp/azurerm" updated from "3.84.0" to "3.85.0" in file ".terraform.lock.hcl"
Changelog retrieved from: https://github.com/hashicorp/terraform-provider-azurerm/releases/tag/v3.85.0
ENHANCEMENTS:
* provider: support for authenticating using Azure Kubernetes Service Workload Identity ([#23965](https://github.com/hashicorp/terraform-provider-azurerm/issues/23965))
I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active contributions.
While using an Azure AKS Workload Identity to authenticate the provider is supported (as noted in issues #21635 and #18612), it isn't obvious from the documentation that all that is needed is to map the environment variables over (and indeed I didn't find those issues when searching for how to do this and ended up trying many different things before ending up with the working combination).
As these environment variables are standardised by Azure, this PR proposes supporting them natively to automatically detect this configuration and work as expected when executed inside an AKS cluster with the federated identity presented.
With this, a provider that is authenticated and working inside an AKS pod with workload identity configured becomes simply:
... with the federated identity, tenant ID and client ID all auto-populated from the environment.
`make pr-check` passing