Provisioning state failed from private ACR with User managed identity #1233

nextdarius · 2024-07-15T11:37:29Z

This issue is a: (mark with an x)

bug report -> please search issues before submitting
documentation issue or request
regression (a behavior that used to work and stopped in a new release)

Issue description

I have a private ACR, without admin access enabled, from which I pull images for my Azure Container App. I've created a user-managed identity, granted it AcrPull, and assigned it to the Container App. When I try to update the revision of my container app using the AZ CLI (OIDC login), I simply receive "provisioningState": "failed" without any additional information. I checked both ContainerAppSystemLogs_CL and ContainerAppConsoleLogs_CL but could not find anything.

As soon as I enable admin access on the ACR, everything works normally and I can see logs (creation of the new revision, deprovisioning of the old one, etc.).

Doing this from Portal with the same user managed identity is OK as well.

Steps to reproduce

Use Az CLI with OIDC authentication
Prepare an ACR without admin access
Prepare a User Identity with AcrPull for previous ACR created
Assign the user identity to the container app
Perform an az containerapp update with an image from the ACR
Receive provisioning state failed

Expected behavior
A new revision to be created

Actual behavior
Provisioning state failed without any information or logs in any tables.

redging-very-well · 2024-07-16T23:11:30Z

I'm facing the exact same issue.

Incidentally, I've also tried setting the registry on the container app to my managed identity, but this also fails:

az containerapp registry set -n example -g $RG --server $ACR.azurecr.io --identity $ID_NAME
User identity /subscriptions/<subid>/resourcegroups/<rg>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/mi-acr-puller is already assigned to containerapp
- Running ..Failed to provision revision for container app 'example'. Error details: The following field(s) are either invalid or missing. Field 'configuration.Registries<acr>.azurecr.io.Identity' is invalid with details: 'Invalid value: "mic-acr-puller": Managed Identity does not exist';..

(I've blocked out the sub and rg intentionally; those are correctly populated with the expected sub and rg in the console output.)

simonjj · 2024-07-17T21:22:27Z

Thank you for raising this, @nextdarius and @redging-very-well. We've labeled this as Backlog. If this is of high priority, please go ahead and raise a support ticket, and feel free to mention this issue in your ticket.

simongottschlag · 2024-07-19T00:04:57Z

I was trying this out in my lab and noticed the same issue. In my case, I have an Azure Firewall that blocks everything to the internet. I saw that it was blocking traffic to two different FQDNs for login (the normal one and a region specific).

After opening that traffic it started working for me.

These two were needed for me:

login.microsoftonline.com
swedencentral.login.microsoft.com

redging-very-well · 2024-08-09T00:59:31Z

I've figured out that you can set the registry if you specify the --identity parameter as a fully qualified ID.

e.g.

FQID=$(az identity show -n ${identityName} -g ${RG} --query id --output tsv)
az containerapp registry set -n example -g $RG --server $ACR.azurecr.io --identity $FQID
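A minimal sketch of a pre-flight check in the same spirit: it rejects any value that is neither 'system' nor shaped like a user-assigned identity resource ID before that value reaches az containerapp registry set. The function name is_valid_registry_identity is hypothetical, and the pattern only approximates the resource ID format seen in the CLI output earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 if the value is 'system' or looks like a
# user-assigned identity resource ID, 1 otherwise. The match is
# case-insensitive because the CLI output prints 'resourcegroups' in lowercase.
is_valid_registry_identity() {
  ivri_lowered="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  case "$ivri_lowered" in
    system) return 0 ;;
    /subscriptions/*/resourcegroups/*/providers/microsoft.managedidentity/userassignedidentities/?*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, a pipeline could run is_valid_registry_identity "$FQID" and stop with its own message when the check fails, instead of waiting for the "Managed Identity does not exist" error.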

nextdarius · 2024-08-13T10:46:55Z

Thanks @redging-very-well, I confirm as well that this does the trick!

We're using terraform to create the resources and discovered in the meantime that adding the registry block solves it as well.

However, it's still very hard to troubleshoot a deployment failure in such a case because, from what I experienced, there's no information at all. Also, the fact that the az CLI does not throw an error on failure is not ideal for CI/CD.

redging-very-well · 2024-08-14T02:53:08Z

@nextdarius glad that helped!

I totally agree - the container app deployment experience isn't great. It would be good if there was a way to wait for a deployment to succeed, as is possible with tools like helm.
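Until something like that exists, one workaround is to poll the app yourself and fail the pipeline explicitly. The following is only a sketch: wait_for_provisioning and get_provisioning_state are hypothetical helpers, the default attempt count and delay are arbitrary, and it assumes that the state is exposed at properties.provisioningState in the output of az containerapp show and that Succeeded, Failed and Canceled are the terminal values.

```shell
#!/bin/sh
# Reads the current provisioning state of a container app.
# Assumes the state lives at properties.provisioningState.
get_provisioning_state() {
  az containerapp show -n "$1" -g "$2" \
    --query properties.provisioningState -o tsv
}

# Hypothetical helper: polls until the app reaches a terminal state.
# Returns 0 on Succeeded, 1 on Failed/Canceled, 2 on timeout.
wait_for_provisioning() {
  wfp_app="$1"
  wfp_rg="$2"
  wfp_attempts="${3:-30}"   # arbitrary default: 30 polls
  wfp_delay="${4:-10}"      # arbitrary default: 10 seconds between polls
  wfp_i=0
  while [ "$wfp_i" -lt "$wfp_attempts" ]; do
    # Lowercase the state so "failed" and "Failed" are treated alike.
    wfp_state="$(get_provisioning_state "$wfp_app" "$wfp_rg" | tr '[:upper:]' '[:lower:]')"
    case "$wfp_state" in
      succeeded) return 0 ;;
      failed|canceled)
        echo "provisioning ended in state: $wfp_state" >&2
        return 1 ;;
    esac
    wfp_i=$((wfp_i + 1))
    sleep "$wfp_delay"
  done
  echo "timed out waiting for provisioning" >&2
  return 2
}
```

Calling wait_for_provisioning example "$RG" right after az containerapp update would give a CI/CD job a non-zero exit code when the revision does not come up.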

Greedygre · 2024-08-19T09:31:14Z

Hi @nextdarius
What error did you get for the step "Perform an az containerapp update with an image from the ACR"?
Based on the steps listed, you didn't run the command that assigns the user identity to the registry, as follows:
az containerapp registry set -n example -g $RG --server $ACR.azurecr.io --identity $IDENTITY_RESOURCE_ID

Greedygre · 2024-08-19T09:33:30Z

Hi @redging-very-well
The --identity parameter needs the resource ID of a user-defined identity. Did you get this error when you passed the identity's name? Thanks.


az containerapp registry set -h

Command
    az containerapp registry set : Add or update a container registry's details.

Arguments
    --identity          : The managed identity with which to authenticate to the Azure Container
                          Registry (instead of username/password). Use 'system' for a system-defined
                          identity or a resource id for a user-defined identity. The managed
                          identity should have been assigned acrpull permissions on the ACR before
                          deployment (use 'az role assignment create --role acrpull ...').

Greedygre · 2024-08-19T13:02:08Z

I will add a friendlier error message to az containerapp registry set for the case where the --identity value is neither 'system' nor the resource ID of a user-defined identity.

FilippTrigub · 2024-12-09T18:02:36Z

Similar issue here:

container app deployed with image A
image A was automatically deleted by a task
container app now refuses to deploy new image, cannot be stopped, cannot be modified in any way

Currently my only solution is to destroy it and create a new one. Not great.

jonathan-vogel-siemens · 2025-01-13T22:42:12Z

I can confirm this sometimes happens, also somehow in conjunction with Terraform. The only reliable solution I found is to destroy and redeploy. The Container App is rendered completely unusable and refuses to locate the managed identity in any way.

Greedygre · 2025-01-14T05:44:39Z

Steps to reproduce

Use Az CLI with OIDC authentication

Prepare an ACR without admin access

Prepare a User Identity with AcrPull for previous ACR created

Assign the user identity to the container app

Perform an az containerapp update with an image from the ACR

Receive provisioning state failed

For this issue, before step 5, we need to assign the user identity to the registry:

az containerapp registry set -n example -g $RG --server $ACR.azurecr.io --identity $IDENTITY_RESOURCE_ID

then we can perform an az containerapp update with an image from the ACR.

For an easier way to run az containerapp registry set and az containerapp update in one command, you can try the following with containerapp extension version >= 1.0.0b4:
az containerapp up --image {} --registry-identity {identity-resource-id}

To install or upgrade the extension:
az extension add -n containerapp --upgrade
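The order described above (resolve the identity's resource ID, set the registry, then update) can be sketched as one wrapper. This is an illustration rather than an official workflow: deploy_with_identity is a hypothetical name, and the arguments are assumed to be the app name, resource group, ACR name, identity name, and image as repo:tag.

```shell
#!/bin/sh
# Hypothetical wrapper: resolve the identity's resource ID, attach it to the
# registry entry on the app, then update the image. Stops at the first failure.
deploy_with_identity() {
  dwi_app="$1"; dwi_rg="$2"; dwi_acr="$3"; dwi_identity="$4"; dwi_image="$5"

  # Step 1: look up the fully qualified resource ID of the identity.
  dwi_id="$(az identity show -n "$dwi_identity" -g "$dwi_rg" --query id -o tsv)" || return 1
  [ -n "$dwi_id" ] || return 1

  # Step 2: assign the identity to the registry on the container app.
  az containerapp registry set -n "$dwi_app" -g "$dwi_rg" \
    --server "$dwi_acr.azurecr.io" --identity "$dwi_id" || return 1

  # Step 3: only now roll out the new image.
  az containerapp update -n "$dwi_app" -g "$dwi_rg" \
    --image "$dwi_acr.azurecr.io/$dwi_image"
}
```

A call would look like deploy_with_identity example "$RG" "$ACR" mi-acr-puller k8se/quickstart:latest, with the identity assumed to live in the same resource group as the app.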

Greedygre · 2025-01-14T05:46:53Z

Similar issue here:

container app deployed with image A

image A was automatically deleted by a task

container app now refuses to deploy new image, cannot be stopped, cannot be modified in any way

Currently my only solution is to destroy it and create a new one. Not great.

Hi @FilippTrigub @jonathan-vogel-siemens
Which version are you using? Could you share the output of az version?
There was an issue where "provisioning state failed" was returned without any reason; it has been fixed as of azure-cli version 2.66.0.

Thanks!

FilippTrigub · 2025-01-14T11:48:19Z

I'm managing the app with Terraform 1.9.8 and az cli 2.67.0.

I'm fairly certain the problem described above occurs equally when trying to deploy a new revision manually via the UI. The app can't handle deprovisioning revisions whose images are no longer available in the ACR.

Greedygre · 2025-01-14T12:48:05Z

Similar issue here:

container app deployed with image A

image A was automatically deleted by a task

container app now refuses to deploy new image, cannot be stopped, cannot be modified in any way

Currently my only solution is to destroy it and create a new one. Not great.

What do you mean by "image A was automatically deleted by a task"? Do you mean that image A existed in the ACR but was deleted by design? Can you give me the name of your containerapp, the timestamp of the error, and the region where the containerapp is deployed? Thanks.

AurimasNav · 2025-01-14T19:35:33Z

Azure support suggested authenticating to ACR using admin credentials to avoid encountering this bug.

FilippTrigub · 2025-01-14T22:44:48Z

@Greedygre I don't have one at hand, unfortunately. I mean that this bug occurs automatically if the image of the active revision of the container app has been deleted from the ACR.

It is of course obvious that the app should crash if the image cannot be pulled. The problem is however that the app becomes locked in the ProvisioningState failed state and cannot be simply redeployed with a new image.

Greedygre · 2025-01-15T03:38:11Z

@Greedygre I don't have one at hand, unfortunately. I mean that this bug occurs automatically if the image of the active revision of the container app has been deleted from the ACR.

It is of course obvious that the app should crash if the image cannot be pulled. The problem is however that the app becomes locked in the ProvisioningState failed state and cannot be simply redeployed with a new image.

Hi @FilippTrigub
I cannot repro this issue with revision mode Single with following steps.

container app deployed with image A
my step: az containerapp create -n {} -g {} --environment {} --registry-server {my-acr}.azurecr.io --image {my-acr}.azurecr.io/k8se/quickstart:testversion2
image A was automatically deleted by a task
container app now refuses to deploy new image, cannot be stopped, cannot be modified in any way:
my step: az containerapp update -n {} -g {} --image {my-acr}.azurecr.io/k8se/quickstart:latest

Could you tell me more about the containerapp's revision mode and the environment type it uses?
Even if you have deleted the container, can you give me the container name, the environment name, the region, and the date you reproduced this issue? With that I can search for more detail to help reproduce it.
Did your private ACR enable admin access?
Did you use User Identity with AcrPull for previous ACR created?
Thanks!

jonathan-vogel-siemens · 2025-01-15T08:14:36Z

@Greedygre did you make sure the container is scaled to zero before deleting the image from ACR? Then after it is deleted, the revision should try to activate again. Also not sure, but the container apps service might cache images for some time.

FilippTrigub · 2025-01-15T08:47:35Z

@Greedygre

As @jonathan-vogel-siemens points out, you have to scale the app to 0, then delete the underlying image in the ACR, then restart the app. The container will attempt pulling the image, will not be able to, and move into the locked state.

Containerapp is in Single mode, Env type does not matter (happens in consumption and D4).
I really can't, because I just destroy the whole app every time that happens, and it only happens if I don't deploy daily. It probably happened over the holidays last time, but I do not know whether it was recorded in the system logs and, if so, how.
Yes.
Yes, it's still on user identity. Need to update that.

Greedygre · 2025-01-16T15:40:34Z

@Greedygre

As @jonathan-vogel-siemens points out, you have to scale the app to 0, then delete the underlying image in the ACR, then restart the app. The container will attempt pulling the image, will not be able to, and move into the locked state.

Containerapp is in Single mode, Env type does not matter (happens in consumption and D4).

I really can't, because I just destroy the whole app every time that happens, and it only happens if I don't deploy daily. It probably happened over the holidays last time, but I do not know whether it was recorded in the system logs and, if so, how.

Yes.

Yes, it's still on user identity. Need to update that.

Hi @FilippTrigub

For "container app now refuses to deploy new image, cannot be stopped, cannot be modified in any way", I tried the following steps:
1. Create a containerapp with ACR image A, using a user-assigned identity with AcrPull (single mode).
2. Wait for the containerapp to scale to 0.
3. Delete image A.
4. Try to stop the containerapp. I got this error: The Container App failed to stop.: Failed to stop container app 'xxxx'. Error details: The following field(s) are either invalid or missing. Field 'template.containers.xxxx.image' is invalid with details: 'Invalid value: "xxxx.azurecr.io/k8se3/quickstart:testversion3": GET https:: MANIFEST_UNKNOWN: manifest tagged by "testversion3" is not found; map[Tag:testversion3]';..

This should be an issue; we are investigating.

But if I update the containerapp with an image C, it updates successfully (the main point is that image C must be different from the image returned by az containerapp show).

This is because updating a containerapp with an image that doesn't change the template (that is, the same image that az containerapp show returns) does not create a new revision, so it looks stuck and nothing happens.
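That no-op case can be caught before calling update. A sketch, assuming a single-container app whose current image is readable at properties.template.containers[0].image in the output of az containerapp show; current_image and image_changed are hypothetical helper names.

```shell
#!/bin/sh
# Reads the image of the first container in the app's template.
# Assumes a single-container app.
current_image() {
  az containerapp show -n "$1" -g "$2" \
    --query "properties.template.containers[0].image" -o tsv
}

# Hypothetical helper: returns 0 if the requested image differs from the
# one currently in the template, 1 if an update would change nothing.
image_changed() {
  ic_current="$(current_image "$1" "$2")"
  if [ "$ic_current" = "$3" ]; then
    echo "image unchanged ($3); update would not create a new revision" >&2
    return 1
  fi
  return 0
}
```

Running image_changed example "$RG" "$NEW_IMAGE" before the update makes the "nothing happens" case visible in the pipeline log; NEW_IMAGE is a placeholder for the full image reference.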

To make sure we are talking about the same case: did you try with a different image C at that time, or did you always use the same image that az containerapp show returns when the issue happens?
What exactly happens when it "refuses to deploy new image" and "cannot be modified in any way"?
Thank you!

FilippTrigub · 2025-01-16T15:51:30Z

@Greedygre

My apps are updated via CICD, so yes, I am fairly certain the image was new.

I tried to deploy a new revision with a new image with terraform and manually, without success.

I encountered the issue yesterday when updating an old frontend deploy for my prod. The error produced by the workflow was

ERROR: Failed to provision revision for container app 'frontend-app-production'. Error details: The following field(s) are either invalid or missing. Field 'template.containers.frontend-app-production.image' is invalid with details: 'Invalid value: "snaacr.azurecr.io/frontend-app:2025-01-05-20-19-57-5fd403db-prod": GET https:: MANIFEST_UNKNOWN: manifest tagged by "2025-01-05-20-19-57-5fd403db-prod" is not found; map[Tag:2025-01-05-20-19-57-5fd403db-prod]

This occurred on 16.01.25 at 11:22 AM GMT+1.

I am using azure/container-apps-deploy-action@v2 to deploy.

Happy to provide you with more details. Please let me know what you need.

In the meantime I have rewritten our purge scripts so that this doesn't occur anymore.

Greedygre · 2025-01-17T03:24:38Z

@Greedygre

My apps are updated via CICD, so yes, I am fairly certain the image was new.

ERROR: Failed to provision revision for container app 'frontend-app-production'. Error details: The following field(s) are either invalid or missing. Field 'template.containers.frontend-app-production.image' is invalid with details: 'Invalid value: "snaacr.azurecr.io/frontend-app:2025-01-05-20-19-57-5fd403db-prod": GET https:: MANIFEST_UNKNOWN: manifest tagged by "2025-01-05-20-19-57-5fd403db-prod" is not found; map[Tag:2025-01-05-20-19-57-5fd403db-prod]

Hi @FilippTrigub

Thanks for your help!

This error occurs because we cannot find, in the ACR, the image used to update the containerapp. I think the image with tag 2025-01-05-20-19-57-5fd403db-prod had probably been deleted by that time. (The error occurred at 2025-01-15 10:21:53.2176640, and the image tag is from 10 days earlier.)

I also found that the same error happened on 2024-12-09 with image tag 2024-11-29-10-15-25-e41cb531-prod, which was also 10 days old. (I guess that image tag had been deleted too.)

Please check the logic of the task that cleans up image tags, and make sure the image exists when you use it to update the containerapp.
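One way to make sure the image exists is to ask the registry right before the update. The following is a rough sketch: acr_name_of, repo_tag_of and image_exists are hypothetical helpers, the image reference is assumed to have the form name.azurecr.io/repo:tag, and the caller is assumed to have permission to read repository metadata with az acr repository show.

```shell
#!/bin/sh
# Hypothetical helpers: split "<name>.azurecr.io/<repo>:<tag>" into parts.
acr_name_of() { printf '%s\n' "${1%%.azurecr.io/*}"; }
repo_tag_of() { printf '%s\n' "${1#*.azurecr.io/}"; }

# Returns 0 if the tag can be resolved in the registry, non-zero otherwise.
image_exists() {
  az acr repository show --name "$(acr_name_of "$1")" \
    --image "$(repo_tag_of "$1")" >/dev/null 2>&1
}
```

A deployment step could then run image_exists "$IMAGE" and abort with a clear message when it fails, instead of surfacing MANIFEST_UNKNOWN from the revision provisioning; IMAGE is a placeholder for the full image reference.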

As for not being able to stop the containerapp: this should be an issue. I can repro it, and we are investigating.

Thanks!

FilippTrigub · 2025-01-17T08:40:50Z

This is exactly what happened. Glad to hear it can be reproduced.

arielmoraes · 2025-01-20T16:29:50Z

ERROR: Failed to provision revision for container app 'frontend-app-production'. Error details: The following field(s) are either invalid or missing. Field 'template.containers.frontend-app-production.image' is invalid with details: 'Invalid value: "snaacr.azurecr.io/frontend-app:2025-01-05-20-19-57-5fd403db-prod": GET https:: MANIFEST_UNKNOWN: manifest tagged by "2025-01-05-20-19-57-5fd403db-prod" is not found; map[Tag:2025-01-05-20-19-57-5fd403db-prod]

+1. When deploying the app using VS, some fields are set, and even after deleting and recreating the app I can't add user-managed identities or custom domains, because it's targeting another repo that does not exist.

AurimasNav · 2025-01-30T06:29:18Z

According to the Azure support case we had open, the underlying issue with managed identity authentication to ACR is fixed and these scenarios should no longer occur. There is no easy way to fix apps that are already stuck in the failed state, though.

microsoft-github-policy-service bot added the Needs: triage 🔍 Pending a first pass to read, tag, and assign label Jul 15, 2024

anthonychu added bug Something isn't working CLI Related to CLI labels Jul 15, 2024

simonjj added Backlog Issue has been validated and logged in our backlog for future work and removed Needs: triage 🔍 Pending a first pass to read, tag, and assign labels Jul 17, 2024

redging-very-well mentioned this issue Aug 14, 2024

Container App - Registry.Identity can't find identity hashicorp/terraform-provider-azurerm#20675

Open

1 task

Greedygre mentioned this issue Aug 20, 2024

[Containerapp] az containerapp registry set: Throw ValidationError if --identity input value is invalid Azure/azure-cli#29738

Closed

3 tasks

andliang mentioned this issue Dec 29, 2024

Intermittent failure with az containerapp registry set and azure/container-apps-deploy-action@v1 #1372

Closed

2 tasks
