Provisioning state failed from private ACR with User managed identity #1233
Comments
I'm facing the exact same issue. Incidentally, I've also tried setting the registry on the container app to my managed identity, but this also fails:
(I've blocked out the sub and rg intentionally - those are correctly populated with the expected sub and rg in the console output.) |
Thank you for raising this, @nextdarius and @redging-very-well. We've labeled this as Backlog. If this is of high priority, please go ahead and raise a support ticket and feel free to mention this issue in your ticket. |
I was trying this out in my lab and noticed the same issue. In my case, I have an Azure Firewall that blocks everything to the internet. I saw that it was blocking traffic to two different FQDNs for login (the normal one and a region-specific one). After opening that traffic it started working for me. These two were needed for me:
|
I've figured out that you can set the registry if you specify the e.g.
|
Thanks @redging-very-well, I can confirm as well that this does the trick! We're using Terraform to create the resources and discovered in the meantime that adding the registry block solves it as well. However, it's still very hard to troubleshoot a deployment failure in such a case because, from what I experienced, there's no information at all. Also, the fact that az cli does not throw an error in case of a failure is not ideal for CI/CD. |
@nextdarius glad that helped! I totally agree - the container app deployment experience isn't great. It would be good if there was a way to wait for a deployment to succeed, as is possible with tools like helm. |
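One way to approximate a "wait for the deployment" step in CI/CD is to poll the provisioning state after `az containerapp update` and fail the job explicitly. The following is only a minimal sketch, not an official feature: the function names, the timeout defaults, and the choice of `failed`/`canceled` as terminal failure states are assumptions, and it assumes the app-level `properties.provisioningState` returned by `az containerapp show` is the state you care about.

```python
import subprocess
import time
from typing import Callable


def az_provisioning_state(app: str, resource_group: str) -> str:
    """Read properties.provisioningState with `az containerapp show`."""
    result = subprocess.run(
        ["az", "containerapp", "show",
         "--name", app, "--resource-group", resource_group,
         "--query", "properties.provisioningState", "--output", "tsv"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def wait_for_provisioning(get_state: Callable[[], str],
                          timeout_s: float = 600,
                          interval_s: float = 10,
                          sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll get_state() until it reports success, failure, or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        # Compare case-insensitively; the CLI output in this thread shows "failed".
        state = get_state().lower()
        if state == "succeeded":
            return state
        if state in ("failed", "canceled"):  # assumed terminal failure states
            raise RuntimeError(f"provisioning ended in state {state!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {state!r} after {timeout_s} seconds")
        sleep(interval_s)
```

In a pipeline this could be called as `wait_for_provisioning(lambda: az_provisioning_state("my-app", "my-rg"))` (both names are placeholders), so that an uncaught exception gives the job a non-zero exit code.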
Hi @nextdarius |
Hi @redging-very-well
|
I will give a more friendly error message for command: |
Similar issue here:
Currently my only solution is to destroy it and create a new one. Not great. |
I can confirm this sometimes happens, also somehow in conjunction with Terraform. The only reliable solution I've found is to destroy and redeploy. The Container App is rendered completely unusable and refuses to locate the managed identity in any way. |
For this issue, before step 5, we need to assign the user identity to the registry:
then we can perform an For a more easy way to execute For extension install: |
Hi @FilippTrigub @jonathan-vogel-siemens Thanks! |
I'm managing the app with Terraform 1.9.8 and az cli 2.67.0. I'm fairly certain the problem described above occurs equally when trying to deploy a new revision manually via the UI. The app can't handle deprovisioning of revisions whose images are no longer available in the ACR. |
What is |
Azure support suggested authenticating to ACR using admin credentials to avoid encountering this bug. |
@Greedygre I don't have one at hand, unfortunately. I mean that this bug occurs automatically if the image of the active revision of the container app has been deleted from the ACR. It is of course obvious that the app should crash if the image cannot be pulled. The problem, however, is that the app becomes locked in the ProvisioningState failed state and cannot simply be redeployed with a new image. |
Hi @FilippTrigub
|
@Greedygre did you make sure the container is scaled to zero before deleting the image from ACR? Then after it is deleted, the revision should try to activate again. Also not sure, but the container apps service might cache images for some time. |
As @jonathan-vogel-siemens points out, you have to scale the app to 0, then delete the underlying image in the ACR, then restart the app. The container will attempt pulling the image, will not be able to, and move into the locked state.
|
For "container app now refuses to deploy new image, cannot be stopped, cannot be modified in any way" But if I update the containerapp with image C, (the main point is that image C should be different from the image that you can get with command This is because if we update a containerapp with an image that doesn't change the template (the image is always the same as the image that you can get with command To make sure we are talking about the same case, may I ask: did you try with an image C at that time? Or did you always try the same image that you can get with command |
My apps are updated via CI/CD, so yes, I am fairly certain the image was new. I tried to deploy a new revision with a new image with Terraform and manually, without success. I encountered the issue yesterday when updating an old frontend deploy for my prod. The error produced by the workflow was ERROR: Failed to provision revision for container app 'frontend-app-production'. Error details: The following field(s) are either invalid or missing. Field 'template.containers.frontend-app-production.image' is invalid with details: 'Invalid value: "snaacr.azurecr.io/frontend-app:2025-01-05-20-19-57-5fd403db-prod": GET https:: MANIFEST_UNKNOWN: manifest tagged by "2025-01-05-20-19-57-5fd403db-prod" is not found; map[Tag:2025-01-05-20-19-57-5fd403db-prod] This occurred on 16.01.25 at 11:22 AM GMT+1. I am using azure/container-apps-deploy-action@v2 to deploy. Happy to provide you with more details. Please indicate what you would need. In the meantime I have rewritten our purge scripts so that this doesn't occur anymore. |
Thanks for your help! This error occurs because we cannot find the image used to update the containerapp in the ACR. I think the image with tag Also I found the error happened before at 2024-12-09, with image tag Regarding this error, please check your task logic for cleaning up image tags and make sure the image exists when you use it to update the containerapp. As for not being able to stop the containerapp, this should be an issue. I can repro it, and we are investigating. Thanks! |
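A pre-deployment check along the lines suggested above ("make sure the image exists when you use it to update the containerapp") could look roughly like this. It is a hedged sketch, not part of the deploy action: the helper names are made up for illustration, it only handles `registry/repository:tag` references (not digests), and it assumes the ACR name is the first label of the login server, which it passes to `az acr repository show-tags`.

```python
import json
import subprocess
from typing import Callable, List, Tuple


def split_image_ref(image: str) -> Tuple[str, str, str]:
    """Split 'registry/repository:tag' into (registry, repository, tag)."""
    if "@" in image:
        raise ValueError(f"digest references are not handled: {image!r}")
    registry, _, remainder = image.partition("/")
    repository, sep, tag = remainder.rpartition(":")
    if not sep or not repository or not tag:
        raise ValueError(f"expected registry/repository:tag, got {image!r}")
    return registry, repository, tag


def az_list_tags(registry: str, repository: str) -> List[str]:
    """List tags with `az acr repository show-tags`."""
    # Assumption: the ACR name is the first label of the login server.
    acr_name = registry.split(".")[0]
    result = subprocess.run(
        ["az", "acr", "repository", "show-tags",
         "--name", acr_name, "--repository", repository, "--output", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def image_exists(image: str,
                 list_tags: Callable[[str, str], List[str]] = az_list_tags) -> bool:
    """Return True if the tag of `image` is present in its repository."""
    registry, repository, tag = split_image_ref(image)
    return tag in list_tags(registry, repository)
```

A pipeline could call `image_exists(...)` with the full image reference right before the update step and abort if it returns `False`, which would have surfaced the `MANIFEST_UNKNOWN` case before the app was touched.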
This is exactly what happened. Glad to hear it can be reproduced. |
+1. When deploying the app using VS, some fields are set, and even after deleting and recreating the app I can't add user-managed identities or custom domains, because it's targeting another repo that does not exist. |
According to an Azure support case we had open, the underlying issue with managed identity authentication to ACR is fixed and these scenarios should no longer occur. There is no easy way to fix apps that are already stuck in the failed state, though. |
This issue is a: (mark with an x)
Issue description
I have a private ACR without admin access enabled, from which I pull images for my Azure Container App. I've created a user-managed identity, granted it AcrPull, and assigned it to my Azure Container App. I try to update the revision of my container app using AZ CLI (OIDC login) but I simply receive "provisioningState": "failed" without any additional information. I tried to check both ContainerAppSystemLogs_CL and ContainerAppConsoleLogs_CL but could not find anything.
As soon as I enable admin access on the ACR, everything works normally and I can see logs (creating new revision, deprovisioning of old one etc.)
Doing this from Portal with the same user managed identity is OK as well.
Steps to reproduce
az containerapp update
with an image from the ACR
Expected behavior
A new revision to be created
Actual behavior
Provisioning state failed without any information or logs in any tables.