Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

keyvault: populating the cache using both the Key Vault List and Resources API to workaround incomplete/stale data being returned #26070

Merged
merged 3 commits into from
May 27, 2024

Conversation

tombuildsstuff
Copy link
Contributor

@tombuildsstuff tombuildsstuff commented May 23, 2024

Community Note

  • Please vote on this PR by adding a 👍 reaction to the original PR to help the community and maintainers prioritize for review
  • Please do not leave "+1" or "me too" comments, they generate extra noise for PR followers and do not help prioritize for review

Description

Through a significant amount of testing and validation - plus open issues, we've identified an issue when populating the available Key Vaults for the cache, where the Key Vaults List API returns stale data. Having spend a significant amount of time investigating this issue, the API itself appears to only add and remove from the cache - rather than invalidating the cache when something changes - meaning that if there's an issue on the Azure side (or slow replication, etc) the list of available Key Vaults is invalid, and thus Key Vaults which exist are not returned through this API.

Digging into the Azure CLI implementation, the command az keyvault list returns a list of both Key Vaults and Managed HSMS (by default together, but this can be filtered to either type using an argument). Notably however, when loading Key Vaults this is done using an old version of the Resources API (2015-11-01) which is curious - and having spent a decent chunk of time digging into the Azure CLI source code, I can't find a documented reason for this, particularly for an API version this old. By contrast, when obtaining a list of Managed HSMs the Managed HSMs List API is used (that is /subscriptions/XXX/providers/Microsoft.KeyVaults/managedHsms).

Whilst for the life of me I can't find any meaningful context for why the Azure CLI is using this older API, since that's what the Azure CLI is using, it feels reasonable for us to also use that API version (so that users have a consistent experience when querying the available data).

Therefore.. this PR updates the Key Vault caching logic to first query using the Key Vaults List API (that is, /subscriptions/XXX/providers/Microsoft.KeyVaults/vaults) and then to query using the Resources API to populate the list of Key Vaults within the current Subscription. This means we are making an extra API call, but given that both of these APIs (in addition to the Resource Graph API) have known caching issues - I'm honestly not sure what we else we can do here, so this feels like a reasonable compromise.

In my testing this resolves the issues that we've been seeing (now that we can reproduce this) where there's 10 Key Vaults currently in our Subscription, but only 2 are returned when using the Key Vaults List API (even when crossing all pages) - and all 10 are returned via the Resources API - and as such I believe this'll also solve #25548.

Change Log

Below please provide what should go into the changelog (if anything) conforming to the Changelog Format documented here.

Related Issue(s)

Fixes #25548

Dependent on #26069

…t and the Resources Endpoint

This commit:

1. Registers a client for Resources@2015-11-01 and KeyVault@2023-07-01
2. Updates the caching to retrieve the list of Key Vaults using both the List API for the ResourceType and using the
   Resources API directly - meaning we have a more up-to-date set of data.

Notably we're using the older Resources API that the Azure CLI uses, more details can be found in hashicorp/pandora#4148
however by loading both we're hoping to alleviate the caching issues we've seen.
Base automatically changed from dependencies/go-azure-sdk to main May 23, 2024 08:31
@tombuildsstuff tombuildsstuff modified the milestones: v3.105.0, v3.106.0 May 23, 2024
Copy link
Contributor

@manicminer manicminer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @tombuildsstuff, this LGTM ⚓

@tombuildsstuff tombuildsstuff merged commit bf9e6d4 into main May 27, 2024
29 checks passed
@tombuildsstuff tombuildsstuff deleted the f/key-vault-caching branch May 27, 2024 10:41
tombuildsstuff added a commit that referenced this pull request May 27, 2024
@danielino
Copy link
Contributor

danielino commented Jun 3, 2024

Hello @tombuildsstuff , i might have found a bug within the refresh after apply because an error is raised:

i have a disk encryption set that uses a key from a keyvault named xxxxxxxxxxxx (in the logs).

an error is raised because another keyvault (named zzzzzzzzzzzz in logs) is not found.
i don't know why this vault are returned from the api, because it doesn't exists anymore.

│ Error: validating Key Vault Key "https://xxxxxxxxxxxx.vault.azure.net/keys/xxxxxxxxxxxxxxxx" for Disk Encryption Set: retrieving the Resource ID the Key Vault at URL "https://xxxxxxxxxxxxxxxx.vault.azure.net/": populating the Key Vaults cache for Subscription (Subscription: "my_subscription"): retrieving Key Vault (Subscription: "my_subscription"
│ Resource Group Name: "yyyyyyyyyyyyyyy"
│ Key Vault Name: "zzzzzzzzzzzz"): unexpected status 404 (404 Not Found) with error: ResourceGroupNotFound: Resource group 'OTHER_RG' could not be found.

@aronoeldb

tombuildsstuff added a commit that referenced this pull request Jun 3, 2024
Fixes this comment (#26070 (comment))
by @danielino where the ARM Resources API returns resources which don't exist - which then get a 404 when requesting
them.
@tombuildsstuff
Copy link
Contributor Author

@danielino thanks for reporting this - I've opened #26199 which should resolve that

@danielino
Copy link
Contributor

danielino commented Jun 3, 2024

@danielino thanks for reporting this - I've opened #26199 which should resolve that

@tombuildsstuff you're awesome, thanks!

Copy link

github-actions bot commented Jul 4, 2024

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 4, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Error: Unable to determine the Resource ID for the Key Vault at URL
3 participants