Intermittent ManagedIdentityCredential authentication failure #13894

Closed
2 tasks done
ewh opened this issue Feb 21, 2021 · 11 comments · Fixed by #13919
Assignees
Labels
Azure.Identity Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team question The issue doesn't require a change to the product in order to be resolved. Most issues start as that

Comments

@ewh

ewh commented Feb 21, 2021

  • Package Name: @azure/identity
  • Package Version: 1.2.3
  • Package Name: @azure/service-bus
  • Package Version: 7.0.3
  • Operating system: Ubuntu 20.04.2 LTS
  • nodejs
    • version: v12.18.2
  • typescript
    • version: 4.1.5

Describe the bug
(I do not know that this is an actual bug.) I frequently see 'ManagedIdentityCredential authentication failures' when using DefaultAzureCredential on my laptop. This behavior is highly intermittent and I don't know how to debug it.

To Reproduce
Steps to reproduce the behavior:

  1. Login with az login.
  2. Use DefaultAzureCredential to try to establish an Azure auth identity.
  3. Use this identity credential to establish a ServiceBusClient and a ServiceBusReceiver. Then do a subscribe on this receiver.
  4. Then, maybe half of the time, I get the following error:
ManagedIdentityCredential authentication failed.(status code undefined).
More details:
request to http://169.254.169.254/metadata/identity/oauth2/token?resource=https%3A%2F%2Fvault.azure.net&api-version=2018-02-01 failed, reason: connect EHOSTUNREACH 169.254.169.254:80

Comments

  1. I don't know why it's trying to connect to http://169.254.169.254 from my laptop!
  2. I don't know why this is intermittent. If I run the same code multiple times, I can't find a pattern or factor that explains why it fails or succeeds.
  3. I don't know how to get any more debugging information than the error message from above.
@ghost ghost added needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. customer-reported Issues that are reported by GitHub users external to the Azure organization. question The issue doesn't require a change to the product in order to be resolved. Most issues start as that labels Feb 21, 2021
@ramya-rao-a ramya-rao-a added Azure.Identity Client This issue points to a problem in the data-plane of the library. labels Feb 22, 2021
@ghost ghost removed the needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. label Feb 22, 2021
@ghost ghost added the needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team label Feb 22, 2021
@sadasant
Contributor

Hello @ewh! I'm Daniel. I'll be doing my best to help you.

Thank you for trying our libraries! We're working on improving the experience of DefaultAzureCredential. An update will be available in a couple of months. In the meantime, let's see what's going on.

DefaultAzureCredential is really a mechanism that tries several different credentials in turn to see if we can authenticate with no further user input. It eventually reaches ManagedIdentityCredential, which tries to reach several authentication endpoints that are common in Azure environments (think of virtual machines and containers hosted in the Azure cloud). This credential should fail silently when it is unable to reach the authentication host. It seems we missed this network error! We've only been handling ENETUNREACH, and I'm seeing EHOSTUNREACH in what you're sharing with us (they're similar, but not the same).
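As an illustration of the distinction above, the following is a minimal sketch of how a credential might treat both Node.js error codes as "endpoint not reachable". It is not the library's actual code: `NodeNetworkError`, `UNREACHABLE_CODES`, and `isEndpointUnreachable` are hypothetical names, and the only assumption is that Node.js network errors carry a string `code` property such as `ENETUNREACH` or `EHOSTUNREACH`.

```typescript
// Hypothetical helper for illustration; not part of @azure/identity.
interface NodeNetworkError extends Error {
  code?: string;
}

// Both codes mean the metadata endpoint cannot be reached from this machine.
const UNREACHABLE_CODES = new Set(["ENETUNREACH", "EHOSTUNREACH"]);

function isEndpointUnreachable(err: unknown): boolean {
  const code = (err as NodeNetworkError | undefined)?.code;
  return typeof code === "string" && UNREACHABLE_CODES.has(code);
}

// Simulates the error reported in this issue.
const laptopError: NodeNetworkError = Object.assign(
  new Error("connect EHOSTUNREACH 169.254.169.254:80"),
  { code: "EHOSTUNREACH" }
);

console.log(isEndpointUnreachable(laptopError)); // true
```

Checking only `ENETUNREACH` would return `false` for the error in this report, which is consistent with the failure surfacing instead of being skipped silently.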

I will make a pull request fixing this on our side. We'll be discussing when to release this.

If you always use az login to authenticate, we recommend using the AzureCliCredential, which will only try to log in through the CLI credentials, skip any other form of authentication and give you more relevant errors in case they happen.

How does this sound?

Please let us know how else we can help 👍

@ewh
Author

ewh commented Feb 22, 2021

Hi @sadasant. Thanks for the response.

I tried an experiment using AzureCliCredential. I think I still saw the same '169.254.169.254' host unreachable errors coming from ManagedIdentityCredential. Does AzureCliCredential also use ManagedIdentityCredential "underneath"?

(In production we do not use az login. We use DefaultAzureCredential to allow our App Service to automatically authenticate with Azure.)

Also, regarding "We'll be discussing when to release this.", do you know very roughly (days, weeks, months, years, centuries) when this PR might get released to NPM? Thanks!

@sadasant
Contributor

@ewh Hello again!

do you know very roughly (days, weeks, months, years, centuries) when this PR might get released to NPM?

Yes! We'll be releasing a preview version in the first week of March and a GA with many updates in the first week of April.

Does AzureCliCredential also use ManagedIdentityCredential "underneath"?

No, AzureCliCredential doesn't use ManagedIdentityCredential at all.

(In production we do not use az login. We use DefaultAzureCredential to allow our App Service to automatically authenticate with Azure.)

In the service, ManagedIdentityCredential should work without troubles.

If it helps, you can build your own credential similar to the DefaultAzureCredential by creating a new class extending ChainedTokenCredential, as follows:

import { ChainedTokenCredential, AzureCliCredential, ManagedIdentityCredential } from "@azure/identity";

export class MyCredential extends ChainedTokenCredential {
  constructor() {
    super(new AzureCliCredential(), new ManagedIdentityCredential());
  }
}

Keep in mind that this example is very simple, but something similar can give you a customized experience.

Let me know if this helps! Or if we can help in any other way.
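To illustrate the fall-through behaviour such a chain relies on, here is a minimal, self-contained sketch that does not use @azure/identity. `CredentialLike`, `SimpleChain`, `unavailable`, and `isUnavailable` are hypothetical names, and the rule that only "unavailable" errors fall through to the next credential is an assumption based on the discussion in this thread.

```typescript
// Hypothetical stand-ins for illustration; not the @azure/identity types.
interface AccessTokenLike {
  token: string;
}
interface CredentialLike {
  getToken(scope: string): Promise<AccessTokenLike>;
}

// Marks an error as "this credential cannot be used in this environment".
function unavailable(message: string): Error {
  const err = new Error(message);
  err.name = "UnavailableError";
  return err;
}
function isUnavailable(err: unknown): err is Error {
  return err instanceof Error && err.name === "UnavailableError";
}

class SimpleChain implements CredentialLike {
  private readonly sources: CredentialLike[];

  constructor(...sources: CredentialLike[]) {
    this.sources = sources;
  }

  async getToken(scope: string): Promise<AccessTokenLike> {
    const failures: string[] = [];
    for (const source of this.sources) {
      try {
        return await source.getToken(scope);
      } catch (err) {
        // Any other kind of error stops the chain immediately.
        if (!isUnavailable(err)) throw err;
        failures.push(err.message);
      }
    }
    throw unavailable(`No credential available: ${failures.join("; ")}`);
  }
}

// Simulated environment: no Azure CLI, but a working second credential.
const cliMissing: CredentialLike = {
  getToken: async () => {
    throw unavailable("Azure CLI not installed");
  },
};
const managedIdentity: CredentialLike = {
  getToken: async (scope) => ({ token: `fake-token-for-${scope}` }),
};

const chain = new SimpleChain(cliMissing, managedIdentity);
chain.getToken("https://vault.azure.net/.default").then((t) => console.log(t.token));
```

In this sketch an error that is not marked as unavailable is rethrown rather than skipped, which would explain why one unhandled network error could break a whole chain.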

@ewh
Author

ewh commented Feb 23, 2021

@sadasant Ok, this is good news. Thanks so much for your quick responsiveness!

I think my own custom ChainedTokenCredential would be a reasonable solution. I have one quick question about this chained-fallback behavior. As you suggested, suppose I have a fallback chain which looks like this:

super(new AzureCliCredential(), new ManagedIdentityCredential());

If I'm running locally, the AzureCliCredential would get tried first and would work. If I'm running in an app service, the AzureCliCredential would fail and should fall through to ManagedIdentityCredential. What is the runtime expense of waiting for the AzureCliCredential to get tried and then fail? Is there any sort of non-negligible timeout period (something on the order of seconds) which would cause delays?

@sadasant
Contributor

@ewh

Question for you:
Have you experienced this issue only locally? Or have you also experienced this error (the one you shared in the description of this issue) in your Azure Functions?

Regarding your questions:

What is the runtime expense of waiting for the AzureCliCredential to get tried and then fail? Is there any sort of non-negligible timeout period (something on the order of seconds) which would cause delays?

AzureCliCredential will try to run a command with the Azure CLI, and if the command fails it will assume this credential is unavailable until the process is completely stopped and started again. This means there's an initial cost of Node.js running the Azure CLI command and handling the error, but this happens only once, the first time an authentication is attempted.
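The "fail once, then remember" behaviour described above can be sketched as follows. This is only an illustration under stated assumptions, not the library's implementation: `CachedAvailability` and `Probe` are hypothetical names, the probe function stands in for spawning the Azure CLI, and the sketch is synchronous for brevity where a real credential would be asynchronous.

```typescript
// Hypothetical stand-in for the expensive step (for example, spawning a CLI process).
type Probe = () => string;

class CachedAvailability {
  private knownUnavailable = false;
  private readonly probe: Probe;
  public probeCalls = 0;

  constructor(probe: Probe) {
    this.probe = probe;
  }

  getToken(): string {
    // After one failure, skip the expensive probe for the rest of the process.
    if (this.knownUnavailable) {
      throw new Error("credential unavailable (cached)");
    }
    try {
      this.probeCalls += 1;
      return this.probe();
    } catch {
      this.knownUnavailable = true;
      throw new Error("credential unavailable");
    }
  }
}

// Simulates a machine where the CLI is not installed.
const cliCredential = new CachedAvailability(() => {
  throw new Error("az: command not found");
});

for (let attempt = 0; attempt < 3; attempt++) {
  try {
    cliCredential.getToken();
  } catch {
    // Expected: the credential is unavailable on every attempt.
  }
}

console.log(cliCredential.probeCalls); // 1
```

Under this design the cost of the failed probe is paid a single time per process, so later token requests fall through to the next credential without the extra delay.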

Looking forward to your response!

@sadasant
Contributor

@ewh also! If we provide you an alpha package tomorrow with the fix, would you be able to test it? It will have a weird version but it will come directly from NPM.

@ghost ghost closed this as completed in #13919 Feb 23, 2021
ghost pushed a commit that referenced this issue Feb 23, 2021
This pull request makes the ManagedIdentityCredential:

- Treat unreachable host as we were treating unreachable network.
  - This should have been done before, but we didn't catch it.
  - I couldn't find any other error that seemed relevant to me, using this as reference: [link](https://github.com/nodejs/node/blob/606df7c4e79324b9725bfcfe019a8b75bfa04c3f/deps/uv/src/win/error.c).
- Treat errors with undefined status code as if the credential was unavailable, to avoid breaking the chained token credentials in that case.
- Add tests (turns out that some were being skipped by mistake!)
- Add more comments to improve reading.
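
The rule about undefined status codes in the list above could look roughly like the sketch below. It is a hedged illustration, not the code from the pull request: `TokenRequestFailure`, `FailureKind`, and `classifyTokenFailure` are hypothetical names, and the assumption is that a missing status code means no HTTP response was received at all.

```typescript
// Hypothetical shape of a failed token request; not the library's real error type.
interface TokenRequestFailure {
  message: string;
  statusCode?: number;
}

type FailureKind = "credential-unavailable" | "authentication-failed";

function classifyTokenFailure(failure: TokenRequestFailure): FailureKind {
  // Assumption: no status code means the endpoint never answered,
  // so a chained credential should move on instead of failing.
  return failure.statusCode === undefined
    ? "credential-unavailable"
    : "authentication-failed";
}

const noResponse: TokenRequestFailure = {
  message: "connect EHOSTUNREACH 169.254.169.254:80",
};
const rejected: TokenRequestFailure = { message: "Bad Request", statusCode: 400 };

console.log(classifyTokenFailure(noResponse)); // credential-unavailable
console.log(classifyTokenFailure(rejected)); // authentication-failed
```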

This PR:
Fixes #13894
@ewh
Author

ewh commented Feb 23, 2021

Hi @sadasant. Your reply responsiveness is AMAZING!

AzureCliCredential will try to run a command with the Azure CLI and if the command fails it will assume this credential is unavailable...
I'm hoping that calling out to the Azure CLI and failing would not cost much more than a couple of seconds. If the failed CLI call takes less than a couple of seconds (and it only happens once per process instance), that should be totally fine.

I am definitely willing to try the alpha package tomorrow. However, I'm not sure I can confirm if the alpha package fixes anything because my underlying issue with DefaultAzureCredential (ManagedIdentityCredential network failure) is intermittent -- because I don't really know what's causing it, I don't really know if the alpha package fixes it. The problem goes from happening almost every time I try to use DefaultAzureCredential to not appearing for a couple of days.

Yes, I think I have only hit this problem locally on my dev machine. DefaultAzureCredential seems to be working fine in our Azure App Services.

@sadasant
Contributor

@ewh sorry I closed this issue automatically!

I'll keep it open until we're able to confirm we've definitely fixed this.

Here's the alpha release that has the latest fix: https://www.npmjs.com/package/@azure/identity/v/1.2.4-alpha.20210223.1

Make sure to install it with something like npm install @azure/identity@1.2.4-alpha.20210223.1. If necessary, remove package-lock.json before installing this version, so that NPM doesn't end up using the wrong version.

I understand this is intermittent! But no rush. If it takes a couple of weeks that's fine. If you have the time, please try with this version and let us know if you find it more reliable.

Thank you for your feedback and your time! We'll be looking forward to your response.

@sadasant
Contributor

sadasant commented Mar 8, 2021

@ewh hello again. We've released Identity 1.2.4, which includes a fix that should help for your case: https://www.npmjs.com/package/@azure/identity/v/1.2.4

I'll close the issue for the time being! Please let us know if anything else comes up. We're here to help!

@sadasant sadasant closed this as completed Mar 8, 2021
@ewh
Author

ewh commented Mar 16, 2021

Hi @sadasant. I finally got around to testing this issue today (testing with the alpha release package you mentioned earlier). It appeared to completely address the issue! Thanks so much for your help with this!

I'll upgrade to 1.2.4 now!

@sadasant
Contributor

Thank you for letting us know, @ewh . You’re welcome here anytime!

@github-actions github-actions bot locked and limited conversation to collaborators Apr 12, 2023