[BUG] Azure Keyvault GetSecret API timeouts #46370

Closed
ncoussemacq opened this issue Sep 30, 2024 · 5 comments
Labels: Client, customer-reported, issue-addressed, KeyVault, question

Comments

ncoussemacq commented Sep 30, 2024

Library name and version

Azure.Security.KeyVault.Secrets 4.6.0

Describe the bug

We are seeing two different scenarios in which GetSecret API calls randomly fail after a significant amount of time. This mostly happens when traffic is quite high.

Case 1: the GetSecret call is cancelled after ~100 s, and the automated retry succeeds right after, in 23 ms.
image
(operation Id 9b3dcb3caafc90843ef1b7612e240807)

Case 2: the GetSecret call takes ~20 s to return a 401 error code, and the automated retry succeeds right after.
I understand that the initial 401 is expected because of the authentication flow, but I'm surprised it takes ~20 s to respond.
image

(operation Id 0ac0c16170c7c1c6a7dc6a9a77425753)

This issue seems quite similar to #37420, which is closed.

Expected behavior

The GetSecret call does not fail after a long timeout.

Actual behavior

Calls to GetSecret randomly fail after many seconds.

Reproduction Steps

Here is the source code of the class calling the GetSecret API.

public class KeyvaultSecretClient : IKeyvaultSecretClient
{
    private static readonly ActivitySource ActivitySource = new ActivitySource(typeof(KeyvaultSecretClient).FullName!, "1.0.0");
    private const string GetSecretActivityName = $"{nameof(KeyvaultSecretClient)}:{nameof(GetSecretAsync)}";

    private static readonly Regex KeyvaultSecretUnauthorizedCharaters = new Regex("[^a-zA-Z0-9]");
    private static readonly SecretClientOptions KeyvaultSecretClientOptions = new SecretClientOptions()
    {
        Retry =
        {
            Delay = TimeSpan.FromSeconds(2),
            MaxDelay = TimeSpan.FromSeconds(16),
            MaxRetries = 5,
            Mode = RetryMode.Exponential
        }
    };

    private const string KeyvaultUrlPattern = "https://{0}.vault.azure.net";
    private readonly DefaultAzureCredential _azureCredentials;
    private readonly ILogger<KeyvaultSecretClient> _logger;

    public KeyvaultSecretClient(
        DefaultAzureCredential azureCredentials,
        ILogger<KeyvaultSecretClient> logger)
    {
        ArgumentNullException.ThrowIfNull(azureCredentials, nameof(azureCredentials));
        ArgumentNullException.ThrowIfNull(logger, nameof(logger));

        _azureCredentials = azureCredentials;
        _logger = logger;
    }

    public async Task<string> GetSecretAsync(string keyvaultName, string secretName)
    {
        ArgumentNullException.ThrowIfNullOrEmpty(keyvaultName, nameof(keyvaultName));
        ArgumentNullException.ThrowIfNullOrEmpty(secretName, nameof(secretName));

        using var activity = ActivitySource.StartActivity(GetSecretActivityName);
        activity?.AddTag(nameof(keyvaultName), keyvaultName);
        activity?.AddTag(nameof(secretName), secretName);

        var keyvaultClient = GetKeyvaultClient(keyvaultName);

        var cleanedSecretName = CleanSecretName(secretName);
        activity?.AddTag(nameof(cleanedSecretName), cleanedSecretName);

        try
        {
            var response = await keyvaultClient.GetSecretAsync(cleanedSecretName);

            if (response.Value == null)
            {
                activity?.SetStatus(ActivityStatusCode.Error);
                _logger.LogError($"Secret {secretName} is empty {keyvaultName}");

                throw new KeyvaultSecretNotFoundException($"Secret {cleanedSecretName} not found in keyvault {keyvaultName}");
            }

            activity?.AddTag("secretSize", response.Value.Value.Length);

            return response.Value.Value;
        }
        catch (RequestFailedException ex)
        {
            activity?.SetStatus(ActivityStatusCode.Error);
            _logger.LogError(ex, $"Failed to get secret {cleanedSecretName} from keyvault {keyvaultName}");

            throw new KeyvaultSecretNotFoundException($"Failed to get secret {cleanedSecretName} from keyvault {keyvaultName}", ex);
        }
    }

    private SecretClient GetKeyvaultClient(string keyvaultName)
    {
        var keyvaultUri = string.Format(KeyvaultUrlPattern, keyvaultName);

        var keyvaultClient = new SecretClient(
            new Uri(keyvaultUri),
            _azureCredentials,
            KeyvaultSecretClientOptions);

        return keyvaultClient;
    }

    private string CleanSecretName(string secretName)
    {
        return KeyvaultSecretUnauthorizedCharaters.Replace(secretName, "-");
    }
}

Environment

.NET 8.0
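
The ~100 s cancellation in Case 1 matches the default 100-second timeout of HttpClient, so one way to make such stalls fail faster is to bound each call with its own deadline. The following is a minimal sketch under that assumption, not a confirmed fix; `TimeoutHelper` and `WithTimeoutAsync` are hypothetical names that do not exist in the Azure SDK.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of Azure.Security.KeyVault.Secrets.
public static class TimeoutHelper
{
    // Runs an operation with a deadline by handing it a token that is
    // cancelled once the timeout elapses (or the caller's token fires).
    public static async Task<T> WithTimeoutAsync<T>(
        Func<CancellationToken, Task<T>> operation,
        TimeSpan timeout,
        CancellationToken cancellationToken = default)
    {
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
        cts.CancelAfter(timeout);
        return await operation(cts.Token);
    }
}

// Possible usage inside GetSecretAsync from the class above (10 s is an arbitrary value):
//   var response = await TimeoutHelper.WithTimeoutAsync(
//       ct => keyvaultClient.GetSecretAsync(cleanedSecretName, cancellationToken: ct),
//       TimeSpan.FromSeconds(10));
```

A timeout of this kind surfaces as an OperationCanceledException rather than a RequestFailedException, so the existing catch block would not handle it, and the deadline covers the whole call including the SDK's own retries.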

github-actions bot added the Client, customer-reported, KeyVault, needs-team-attention, and question labels on Sep 30, 2024

Thank you for your feedback. Tagging and routing to the team member best able to assist.

jsquire (Member) commented Sep 30, 2024

Hi @ncoussemacq. Thanks for reaching out and we regret that you're experiencing difficulties. Based on the description and symptoms, the behavior that you're seeing is most likely related to your application or host environment. It sounds very much like you're either seeing continuations for async calls unable to be scheduled in a timely manner or seeing some form of network congestion. It is also possible that the service calls themselves are taking longer than expected.

Unfortunately, this is not something that the maintainers of the Azure SDK can assist with. We would suggest investigating the application patterns for async and the host resources as the first step. If you believe the service calls are potentially the cause, then your best path forward would be to open an Azure support request and ask the service team to analyze service logs for that time period. If you would prefer not to open a support ticket, you may want to inquire on the Microsoft Q&A site, as the service team also monitors that.
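
To check the first hypothesis above (async continuations not being scheduled in a timely manner), the thread pool can be sampled while the application is under load. This is a minimal sketch using only standard .NET APIs; `ThreadPoolProbe` and its method names are hypothetical, and what counts as a worrying delay is left to the reader.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical diagnostic helper, not part of the Azure SDK.
public static class ThreadPoolProbe
{
    // Measures how long a trivial work item waits in the thread pool queue
    // before it starts running. Large values suggest thread pool starvation.
    public static async Task<TimeSpan> MeasureSchedulingDelayAsync()
    {
        var stopwatch = Stopwatch.StartNew();
        // The delegate reads the elapsed time at the moment it begins executing.
        return await Task.Run(() => stopwatch.Elapsed);
    }

    // Returns a one-line snapshot of the current thread pool state for logging.
    public static string Snapshot()
    {
        ThreadPool.GetAvailableThreads(out int workers, out int io);
        return $"threads={ThreadPool.ThreadCount} pending={ThreadPool.PendingWorkItemCount} " +
               $"availableWorkers={workers} availableIo={io}";
    }
}
```

Logging `Snapshot()` and the measured delay next to each slow GetSecret call would show whether the long durations coincide with a growing queue of pending work items, which points at the host rather than the service.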

jsquire added the issue-addressed label and removed the needs-team-attention label on Sep 30, 2024

Hi @ncoussemacq. Thank you for opening this issue and giving us the opportunity to assist. We believe that this has been addressed. If you feel that further discussion is needed, please add a comment with the text "/unresolve" to remove the "issue-addressed" label and continue the conversation.

@kevinharing

Possibly related to #44817? What version of Azure.Core are you using?
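
One way to answer the version question without inspecting the project file is to read the version from the assembly loaded at runtime. This is a minimal sketch using reflection only; `AssemblyVersionReporter` is a hypothetical name, and it assumes the assembly of interest is named "Azure.Core" and has already been loaded (for example, after the first SDK call).

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical helper, not part of the Azure SDK.
public static class AssemblyVersionReporter
{
    // Returns the version of an assembly already loaded in the current
    // AppDomain, or null when no assembly with that simple name is loaded.
    public static string? GetLoadedVersion(string assemblyName)
    {
        Assembly? assembly = AppDomain.CurrentDomain.GetAssemblies()
            .FirstOrDefault(a => string.Equals(
                a.GetName().Name, assemblyName, StringComparison.OrdinalIgnoreCase));

        if (assembly is null)
        {
            return null;
        }

        // Prefer the informational version (usually the package version);
        // fall back to the plain assembly version.
        var info = assembly.GetCustomAttribute<AssemblyInformationalVersionAttribute>();
        return info?.InformationalVersion ?? assembly.GetName().Version?.ToString();
    }
}

// Possible usage:
//   Console.WriteLine(AssemblyVersionReporter.GetLoadedVersion("Azure.Core") ?? "Azure.Core not loaded");
```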

Hi @ncoussemacq, since you haven’t asked that we /unresolve the issue, we’ll close this out. If you believe further discussion is needed, please add a comment /unresolve to reopen the issue.

@github-project-automation github-project-automation bot moved this from Untriaged to Done in Azure SDK for Key Vault Oct 11, 2024