kedro azureml run does not work on an AzureML compute instance #45

Closed
j0rd1smit opened this issue Feb 2, 2023 · 5 comments

@j0rd1smit
Contributor

I cannot get the getting started guide in the docs to work on an AzureML compute instance.

I have authenticated using: az login --use-device-code.
When I run the command kedro azureml run, I get the following error:

DefaultAzureCredential failed to retrieve a token from the included credentials. (chained.py:100)
Attempted credentials:
EnvironmentCredential: EnvironmentCredential authentication is unavailable. Environment variables are not fully configured.

It then tries to redirect me to an interactive login, but this does not work on a compute instance because the redirection port is not accessible.

@marrrcin
Contributor

marrrcin commented Feb 6, 2023

We're falling back to InteractiveBrowserCredential here:

credential = InteractiveBrowserCredential()

We'll be happy to accept a PR that adds an optional fallback to DeviceCodeCredential: https://learn.microsoft.com/en-us/python/api/azure-identity/azure.identity.devicecodecredential?view=azure-python

As per Microsoft's documentation, our default, DefaultAzureCredential (https://learn.microsoft.com/en-us/python/api/azure-identity/azure.identity.defaultazurecredential?view=azure-python), should fall back to the identity used with the Azure CLI, so maybe there's an issue on their side:

The identity it uses depends on the environment. When an access token is needed, it requests one using these identities in turn, stopping when one provides a token:

  1. A service principal configured by environment variables. See EnvironmentCredential for more details.

  2. An Azure managed identity. See ManagedIdentityCredential for more details.

  3. On Windows only: a user who has signed in with a Microsoft application, such as Visual Studio. If multiple identities are in the cache, then the value of the environment variable AZURE_USERNAME is used to select which identity to use. See SharedTokenCacheCredential for more details.

  4. The user currently signed in to Visual Studio Code.

  5. The identity currently logged in to the Azure CLI.

  6. The identity currently logged in to Azure PowerShell.

@j0rd1smit
Contributor Author

Thanks for the response. I might have some time to play around with this later this week. I will check whether this fixes the issue, and if so, I will make a PR.

@marrrcin
Contributor

marrrcin commented Feb 8, 2023

Looking forward to it! :)

@j0rd1smit
Contributor Author

OK, I did a deep dive into the problem and found the cause. AzureML compute instances set the MSI_ENDPOINT and MSI_SECRET environment variables by default, even if the associated managed identity has no rights. The problem is that DefaultAzureCredential prioritizes managed identities over CLI logins. This would be fine if DefaultAzureCredential validated that its selected strategy actually works. Instead, it greedily picks the first one that could work, and if that one fails, it does not try the others. In this case, it finds the MSI_ENDPOINT environment variable and goes for the AzureMLCredential; when that fails to obtain a token, it no longer checks the other options. So in this code:

    from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential

    try:
        credential = DefaultAzureCredential()
        # Check that the credential can actually obtain a token.
        credential.get_token("https://management.azure.com/.default")
    except Exception:
        # Fall back to InteractiveBrowserCredential if DefaultAzureCredential does not work.
        credential = InteractiveBrowserCredential()

We will always fall back to the InteractiveBrowserCredential.
Sadly, the InteractiveBrowserCredential does not work either, since Azure compute instances are headless.
We could replace InteractiveBrowserCredential with DeviceCodeCredential. This works, but an annoying side effect is that we need to log in again every time we submit a job.
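For illustration, the DeviceCodeCredential variant could look roughly like the sketch below. The helper names first_working_credential and get_headless_credential are hypothetical and not part of the plugin; only DefaultAzureCredential, DeviceCodeCredential, and get_token come from azure-identity. It tries each credential in order and keeps the first one that can actually obtain a token.

```python
from typing import Any, Callable, Iterable

# Scope used in the snippet above to validate a credential.
MANAGEMENT_SCOPE = "https://management.azure.com/.default"


def first_working_credential(
    factories: Iterable[Callable[[], Any]], scope: str = MANAGEMENT_SCOPE
) -> Any:
    """Return the first credential whose get_token(scope) call succeeds.

    Hypothetical helper: `factories` are zero-argument callables that build
    a credential object exposing get_token().
    """
    errors = []
    for factory in factories:
        try:
            credential = factory()
            credential.get_token(scope)  # raises if no token can be obtained
            return credential
        except Exception as exc:  # deliberately broad, mirroring the original code
            errors.append(exc)
    raise RuntimeError(f"No credential could obtain a token: {errors}")


def get_headless_credential() -> Any:
    """Try DefaultAzureCredential first, then fall back to a device-code login."""
    # Imported lazily so the helper above stays usable without azure-identity.
    from azure.identity import DefaultAzureCredential, DeviceCodeCredential

    return first_working_credential([DefaultAzureCredential, DeviceCodeCredential])
```

As noted above, this still asks for a device-code login on every run in which DefaultAzureCredential fails, so it works around the headless problem without removing the repeated login.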

So, I believe the best course of action would be to check whether we are on an AzureML compute instance and, if so, tell DefaultAzureCredential to ignore the managed identity credential. (ManagedIdentityCredential itself detects this case by checking for the presence of the MSI_ENDPOINT environment variable.)
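A minimal sketch of that idea, assuming the presence of MSI_ENDPOINT is a good enough signal for "running on an AzureML compute instance". The function names default_credential_kwargs and get_credential are hypothetical; exclude_managed_identity_credential is an existing keyword argument of DefaultAzureCredential. This is not necessarily how the PR implements it.

```python
import os
from typing import Any, Dict, Mapping


def default_credential_kwargs(environ: Mapping[str, str] = os.environ) -> Dict[str, bool]:
    """Build keyword arguments for DefaultAzureCredential.

    Assumption: MSI_ENDPOINT being set means we are on an AzureML compute
    instance, where the managed identity may have no rights.
    """
    if "MSI_ENDPOINT" in environ:
        # Skip the managed identity so the chain can reach the Azure CLI login.
        return {"exclude_managed_identity_credential": True}
    return {}


def get_credential() -> Any:
    """Create a DefaultAzureCredential that ignores the managed identity when needed."""
    # Imported lazily so the helper above can be used without azure-identity.
    from azure.identity import DefaultAzureCredential

    return DefaultAzureCredential(**default_credential_kwargs())
```

With the managed identity excluded, the credential chain can reach the Azure CLI login created with az login, so no extra login should be needed per job submission.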

What do you think, @marrrcin? I have created a PR with the proposed solution.

@j0rd1smit
Contributor Author

fixed by: #47
