Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] DefaultAzureCredential failed to retrieve a token from the included credentials #19974

Closed
wumingcp opened this issue Mar 31, 2021 · 4 comments
Assignees
Labels
Azure.Identity Client This issue points to a problem in the data-plane of the library. needs-author-feedback Workflow: More information is needed from author to address the issue. no-recent-activity There has been no recent activity on this issue.

Comments

@wumingcp
Copy link

wumingcp commented Mar 31, 2021

Describe the bug
Currently we face an issue on our production, and we use DefaultAzureCredential to get access token by MSI, then use this token to access Azure Blob, but it may fail to get the token for sometimes, not all. Here is the callstack

Unhandled exception. Azure.Identity.CredentialUnavailableException: DefaultAzureCredential failed to retrieve a token from the included credentials.

  • EnvironmentCredential authentication unavailable. Environment variables are not fully configured.
  • ManagedIdentityCredential authentication unavailable. No Managed Identity endpoint found.
  • SharedTokenCacheCredential authentication unavailable. No accounts were found in the cache.
  • Visual Studio Token provider can't be accessed at C:\Users\ContainerAdministrator\AppData\Local.IdentityService\AzureServiceAuth\tokenprovider.json
  • Stored credentials not found. Need to authenticate user in VSCode Azure Account.
  • Please run 'az login' to set up account
    ---> System.AggregateException: Multiple exceptions were encountered while attempting to authenticate. (EnvironmentCredential authentication unavailable. Environment variables are not fully configured.) (ManagedIdentityCredential authentication unavailable. No Managed Identity endpoint found.) (SharedTokenCacheCredential authentication unavailable. No accounts were found in the cache.) (Visual Studio Token provider can't be accessed at C:\Users\ContainerAdministrator\AppData\Local.IdentityService\AzureServiceAuth\tokenprovider.json) (Stored credentials not found. Need to authenticate user in VSCode Azure Account.) (Please run 'az login' to set up account)
    ---> Azure.Identity.CredentialUnavailableException: EnvironmentCredential authentication unavailable. Environment variables are not fully configured.
    at Azure.Identity.CredentialDiagnosticScope.FailWrapAndThrow(Exception ex)
    at Azure.Identity.EnvironmentCredential.GetTokenImplAsync(Boolean async, TokenRequestContext requestContext, CancellationToken cancellationToken)
    at Azure.Identity.EnvironmentCredential.GetTokenAsync(TokenRequestContext requestContext, CancellationToken cancellationToken)
    at Azure.Identity.DefaultAzureCredential.GetTokenFromSourcesAsync(TokenCredential[] sources, TokenRequestContext requestContext, Boolean async, CancellationToken cancellationToken)
    --- End of inner exception stack trace ---
    ---> (Inner Exception hierarchical namespace similar to a file system and CloudBlobContainer class inconsistency #1) Azure.Identity.CredentialUnavailableException: ManagedIdentityCredential authentication unavailable. No Managed Identity endpoint found.
    at Azure.Identity.ManagedIdentityClient.AuthenticateAsync(Boolean async, TokenRequestContext context, CancellationToken cancellationToken)
    at Azure.Identity.ManagedIdentityCredential.GetTokenImplAsync(Boolean async, TokenRequestContext requestContext, CancellationToken cancellationToken)
    at Azure.Identity.CredentialDiagnosticScope.FailWrapAndThrow(Exception ex)
    at Azure.Identity.ManagedIdentityCredential.GetTokenImplAsync(Boolean async, TokenRequestContext requestContext, CancellationToken cancellationToken)
    at Azure.Identity.ManagedIdentityCredential.GetTokenAsync(TokenRequestContext requestContext, CancellationToken cancellationToken)
    at Azure.Identity.DefaultAzureCredential.GetTokenFromSourcesAsync(TokenCredential[] sources, TokenRequestContext requestContext, Boolean async, CancellationToken cancellationToken)<---
    ....

Expected behavior
We expect the DefaultAzureCredential can always get token by MSI

Environment:
Our production run as a pod one AKS, and we use windows OS

  • AKS version: 1.19.7
  • Node windows version: aks-2019-datacenter-core-smalldisk-2102: 17763.1757.210220
  • Pod windows version: mcr.microsoft.com/windows/servercore:10.0.17763.1637

Actual behavior (include Exception or Stack Trace)
We also did some investigation to try to figure out the root cause, and we found it caused by TCPClient

  1. Using the following code to repro
Environment.SetEnvironmentVariable("AzureServicesAuthConnectionString", "RunAs=App;AppId={test}");
Environment.SetEnvironmentVariable("AZURE_CLIENT_ID", "{test}");
var blobServiceClient = new BlobServiceClient(new Uri("https://{test}.blob.core.windows.net"), new DefaultAzureCredential());
var blobContainerClient = blobServiceClient.GetBlobContainerClient("test");
var blobClient = blobContainerClient.GetBlobClient("rawlog.txt");
var initialResponse = await blobClient.DownloadAsync();
var rawlogFile = $"rawlog-{Guid.NewGuid()}.txt";
using (var destination = File.Create(rawlogFile))
{
      await initialResponse.Value.Content.CopyToAsync(destination);
}
Console.WriteLine($"Download file to {rawlogFile}");

Using a powershell script to run the code on pod for 100 times

0..100 | ForEach-Object {.\MSITest.exe}

Result: It can repro this issue
PS C:\run> 0..100 | ForEach-Object {.\MSITest.exe}
Download file to rawlog-7040a57e-f4f5-451f-beed-07750e50b2fc.txt
Download file to rawlog-3fd3b9d0-d284-47d0-9318-695671232a2d.txt
....
Download file to rawlog-c17b7789-a03b-4dd8-9baa-ce7d47372859.txt
Download file to rawlog-3be4a0cc-952a-4aa7-9698-2ba8b8278035.txt
Unhandled exception. Azure.Identity.CredentialUnavailableException: DefaultAzureCredential failed to retrieve a token from the included credentials.

  • EnvironmentCredential authentication unavailable. Environment variables are not fully configured.
  • ManagedIdentityCredential authentication unavailable. No Managed Identity endpoint found.
  • SharedTokenCacheCredential authentication unavailable. No accounts were found in the cache.
  • Visual Studio Token provider can't be accessed at C:\Users\ContainerAdministrator\AppData\Local.IdentityService\AzureServiceAuth\tokenprovider.json
  • Stored credentials not found. Need to authenticate user in VSCode Azure Account.
  • Please run 'az login' to set up account
  1. We contact Azure support, and get some information from MSI team. According to the logs of MSI service, the app only sends a request to connect, but doesn't send anything to acquire the access token. So we think this is caused by TCP connection, and we check the source code in DefaultAzureCredential

Task connectTask = client.ConnectAsync(s_imdsHostIp, s_imdsPort);

Since it uses TcpClient to try to connect MSI, then we use the following code to test again

using var client = new TcpClient();
var clock = new Stopwatch();
clock.Start();
client.ConnectAsync(IPAddress.Parse("169.254.169.254"), 80).GetAwaiter().GetResult();
clock.Stop();

if (client.Connected)
{
    Console.WriteLine($"Connect to 169.254.169.254, elapsed time {clock.ElapsedMilliseconds}ms");
}
else
{
    Console.WriteLine($"Failed to connect to 169.254.169.254, elapsed time {clock.ElapsedMilliseconds}ms");
}

Using powershell to run this code on pod for 100 times

0..100 | ForEach-Object {.\MSITest.exe}

Result:
PS C:\run> 0..100 | ForEach-Object {.\MSITest.exe}
Connect to 169.254.169.254, elapsed time 29ms
Connect to 169.254.169.254, elapsed time 23ms
Connect to 169.254.169.254, elapsed time 24ms
Connect to 169.254.169.254, elapsed time 23ms
Connect to 169.254.169.254, elapsed time 23ms
...
Connect to 169.254.169.254, elapsed time 3046ms
Connect to 169.254.169.254, elapsed time 23ms
Connect to 169.254.169.254, elapsed time 29ms
Connect to 169.254.169.254, elapsed time 23ms
Connect to 169.254.169.254, elapsed time 24ms
..
Connect to 169.254.169.254, elapsed time 49ms
Connect to 169.254.169.254, elapsed time 24ms
Connect to 169.254.169.254, elapsed time 25ms
Connect to 169.254.169.254, elapsed time 38ms
Connect to 169.254.169.254, elapsed time 3025ms
Connect to 169.254.169.254, elapsed time 24ms
Connect to 169.254.169.254, elapsed time 25ms
Connect to 169.254.169.254, elapsed time 25ms
...

Conclusion:
DefaultAzureCredential use 1s as timeout in souce code and according to the test, TCPClient may use about 3s to get response from MSI. In this case, the DefaultAzureCredential treats the MSI is unavaiable, then cannot get access token

private const int ImdsAvailableTimeoutMs = 1000;
,

  1. We want to use retry to workaround this issue, using following code
public class DefaultAzureCredentialWithRetry : TokenCredential
    {
        private const int IntervalSecondsBetweenRetry = 2;
        private DefaultAzureCredential defaultAzureCredential { get; set; }

        public DefaultAzureCredentialWithRetry()
        {
            defaultAzureCredential = new DefaultAzureCredential();
        }

        public override AccessToken GetToken(TokenRequestContext requestContext, CancellationToken cancellationToken)
        {
            return GetTokenAsync(requestContext, cancellationToken).GetAwaiter().GetResult();
        }

        public override async ValueTask<AccessToken> GetTokenAsync(TokenRequestContext requestContext, CancellationToken cancellationToken)
        {
            return await Policy
                .Handle<CredentialUnavailableException>((ex) => 
                {
                    Console.WriteLine("Get CredentialUnavailableException and re-create a new DefaultAzureCredential");
                    defaultAzureCredential = new DefaultAzureCredential();
                    return true;
                })
                .WaitAndRetryAsync(4, retryAttempt => TimeSpan.FromSeconds(Math.Pow(IntervalSecondsBetweenRetry, retryAttempt)))
                .ExecuteAsync(async () =>
                {
                    return await defaultAzureCredential.GetTokenAsync(requestContext, cancellationToken);
                });
        }
    }

Result:
PS C:\run> 0..100 | ForEach-Object {.\MSITest.exe}
Download file to rawlog-461b232a-1363-46d3-861d-e4037044ae04.txt
Download file to rawlog-e9922d48-1866-4bba-a0b4-1ae6c64a9aae.txt
...
Download file to rawlog-e75064ab-3cb6-49d6-b966-d6c30f4191e5.txt
Get CredentialUnavailableException and re-create a new DefaultAzureCredential
Get CredentialUnavailableException and re-create a new DefaultAzureCredential
Get CredentialUnavailableException and re-create a new DefaultAzureCredential
Get CredentialUnavailableException and re-create a new DefaultAzureCredential
Get CredentialUnavailableException and re-create a new DefaultAzureCredential
Unhandled exception. Azure.Identity.CredentialUnavailableException: DefaultAzureCredential failed to retrieve a token from the included credentials.

  • EnvironmentCredential authentication unavailable. Environment variables are not fully configured.
  • ManagedIdentityCredential authentication unavailable. No Managed Identity endpoint found.
  • SharedTokenCacheCredential authentication unavailable. No accounts were found in the cache.
  • Visual Studio Token provider can't be accessed at C:\Users\ContainerAdministrator\AppData\Local.IdentityService\AzureServiceAuth\tokenprovider.json
  • Stored credentials not found. Need to authenticate user in VSCode Azure Account.
  • Please run 'az login' to set up account

But it doesn't work, because DefaultAzureCredential uses cache for the TCP connection result, always use the first connection result, it doesn't retry to connect MSI again

private readonly AsyncLockWithValue<ManagedIdentitySource> _identitySourceAsyncLock = new AsyncLockWithValue<ManagedIdentitySource>();
, then it

@ghost ghost added the needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. label Mar 31, 2021
@yufeih
Copy link

yufeih commented Mar 31, 2021

This bug impacts the reliability of our production service. Unlike a classic web service, where the credentials are retrieved once on startup, we have a bunch of standalone executables that are constantly launching.

It seems like IMDS server does not throttle real HTTP requests, if we keep curling the IMDS endpoint, the response always succeed in about 130ms. As soon as there is a TCP probe made before the actual HTTP request, IMDS server consistently throttles the request once in a while.

The issue occured after switching from AzureServiceTokenProvider to DefaultAzureCredential, because AzureServiceTokenProvider does not do TCP probe. Is there any reason why DefaultAzureCredential must do a TCP probe before making the actual HTTP request?

@jsquire jsquire added Azure.Identity Client This issue points to a problem in the data-plane of the library. needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team labels Mar 31, 2021
@ghost ghost removed the needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. label Mar 31, 2021
@jsquire
Copy link
Member

jsquire commented Mar 31, 2021

Thank you for your feedback. Tagging and routing to the team member best able to assist.

@christothes
Copy link
Member

HI @wumingcp

We want to use retry to workaround this issue, using following code
But it doesn't work, because DefaultAzureCredential uses cache for the TCP connection result, always use the first connection result, it doesn't retry to connect MSI again

private readonly AsyncLockWithValue<ManagedIdentitySource> _identitySourceAsyncLock = new AsyncLockWithValue<ManagedIdentitySource>();

I'm not sure why your workaround fails, because the MSI result is only cached with the instance of a DefaultAzureCredential. If you are constructing a new one on each retry, it will try each time. To validate this, you could enable logging to see the result from each IMDS endpoint probe.

Alternatively, if you'd like to gate the initialization of your container on the availability of the endpoint, you could try this pod that waits for a 200 from the endpoint:

@christothes christothes added the needs-author-feedback Workflow: More information is needed from author to address the issue. label Apr 6, 2021
@ghost ghost removed the needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team label Apr 6, 2021
@ghost ghost added the no-recent-activity There has been no recent activity on this issue. label Apr 13, 2021
@ghost
Copy link

ghost commented Apr 13, 2021

Hi, we're sending this friendly reminder because we haven't heard back from you in 7 days. We need more information about this issue to help address it. Please be sure to give us your input. If we don't hear back from you within 14 days of this comment the issue will be automatically closed. Thank you!

@ghost ghost closed this as completed Apr 28, 2021
@github-actions github-actions bot locked and limited conversation to collaborators Mar 27, 2023
This issue was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Azure.Identity Client This issue points to a problem in the data-plane of the library. needs-author-feedback Workflow: More information is needed from author to address the issue. no-recent-activity There has been no recent activity on this issue.
Projects
None yet
Development

No branches or pull requests

5 participants