[ECS] [Container image resolution]: Allow feature to be disabled (or make it opt-in) #2393

Closed
jakauppila opened this issue Jul 18, 2024 · 18 comments
Assignees
Labels
ECS Amazon Elastic Container Service Shipped This feature request was delivered.

Comments

@jakauppila

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Tell us about your request
On 7/11/2024 it was announced that, for any Amazon ECS service created or updated after June 25, 2024, container image tags are resolved to the image digest, and that digest is used going forward to ensure software version consistency.

This change in behavior was not communicated, was not opt-in, and was not even gated behind a new Fargate platform version.

We relied on the previous behavior by pointing application-defined Task Definitions to centralized managed sidecar images that leveraged mutable tags so that when a new version is pushed, any consuming task definitions will immediately start using it without requiring a deployment by hundreds or thousands of applications.

Which service(s) is this request for?
ECS

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
We were leveraging the previous ability to point at mutable container image tags to roll out centrally managed sidecars without any action needed from our application developer customers.

Are you currently working around this issue?
To resolve the problem of failing applications, we had to restore the old container images to ECR with the SHAs that the tags had previously been resolved to; historically we have purged the old image when we push the new one.

Additional context
What's New: https://aws.amazon.com/about-aws/whats-new/2024/07/amazon-ecs-software-version-consistency-containerized-applications/
Blog Post: https://aws.amazon.com/blogs/containers/announcing-software-version-consistency-for-amazon-ecs-services/
Documentation: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/deployment-type-ecs.html#deployment-container-image-stability

@jakauppila jakauppila added the Proposed Community submitted issue label Jul 18, 2024
@danielferraz-git

Hi,

I'd like to emphasize the importance of the requested feature to disable the new ECS image tag resolution behavior. This change has disrupted our deployment strategy, which relies on using the latest tag for blue-green and rolling updates.

The flexibility of using mutable tags allowed us to manage deployments without extra steps. This ECS change has increased our operational overhead, requiring additional deployment steps for every update.

I'd like to request an option to disable this new functionality at the service, cluster, or account level, allowing us to maintain our current deployment process.

@danielferraz-git

For now, I believe the suggested workaround should be officially documented: #2402.

@DevAssis

Great! It's good to know that.


@jenmlinaws jenmlinaws added the ECS Amazon Elastic Container Service label Jul 29, 2024
@pmcevoy

pmcevoy commented Aug 7, 2024

Got caught by this today: one of our tasks needed to restart due to memory overload and was eventually killed because the restart was unable to download a Datadog sidecar image that we reference in the TaskDefinition by floating tag. The tag had moved to a new version and the old image had been purged (we host copies of Datadog in our own ECR).
I hate this new feature. I'm competent enough to use unique, build-server-assigned tags for containers that count, but when I decide to use a floating tag (e.g. based on SemVer), I understand and accept that I may have small internal inconsistencies.
At the very least, allow us to override this new default...

@vibhav-ag

Cross-posting the message I posted on issue #2394. Sorry for the late response on this thread. We're aware of the impact this change has had and apologize for the churn this rollout has created. We've been actively working through the set of issues that have been highlighted on this thread and have 2 updates to share:

1/ For customers who've been impacted by the lack of ability to see image tag information, we're working on a change that will bring back image tag information in the describe-tasks response, in the same format as was available prior to the release of version consistency (i.e. image:tag). An important thing to keep in mind here is that if you run docker ps on the host, you will see the image in the format image:tag, but docker inspect will return image:tag@digest.

2/ We're also working on adding a configuration in the container definition that will allow you to opt out of digest resolution for specific containers within your task. This should address both customers who want to completely opt out of digest resolution and customers who want to disable resolution for specific sidecar containers.

I'll be using this issue to share updates on the change to disable digest resolution for specific containers and issue #2394 for updates on the change to bring back image tag information. We're tracking both changes at high priority.
 
Once again, we regret the churn this change has caused you all. While we still believe version consistency is the right behavior for the vast majority of applications hosted on ECS, we fully acknowledge that we could have done a better job socializing these changes and addressing these issues before, rather than after making the change.
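
Since the comment above notes that the same image can show up as image:tag in one place and image:tag@digest in another, tooling that compares the two needs to normalize them first. The following is a minimal sketch of one way to do that in Python. The names `ImageRef`, `parse_image_ref` and `same_tag` are hypothetical helpers made up for illustration, not part of any AWS or Docker API, and the parsing only assumes the common `repository[:tag][@digest]` layout.

```python
from typing import NamedTuple, Optional


class ImageRef(NamedTuple):
    """Parts of a container image reference (hypothetical helper type)."""
    repository: str
    tag: Optional[str]
    digest: Optional[str]


def parse_image_ref(ref: str) -> ImageRef:
    """Split 'repository[:tag][@digest]' into its parts."""
    # The digest, if present, follows the first '@' in the reference.
    name, _, digest = ref.partition("@")
    # A ':' separates the tag only if it comes after the last '/';
    # otherwise it belongs to a registry host:port prefix.
    last_slash = name.rfind("/")
    last_colon = name.rfind(":")
    if last_colon > last_slash:
        repository, tag = name[:last_colon], name[last_colon + 1:]
    else:
        repository, tag = name, None
    return ImageRef(repository, tag, digest or None)


def same_tag(a: str, b: str) -> bool:
    """Compare two references by repository and tag, ignoring any digest."""
    ra, rb = parse_image_ref(a), parse_image_ref(b)
    return (ra.repository, ra.tag) == (rb.repository, rb.tag)
```

With this, a reference reported as `sidecar:1.2` and one reported as `sidecar:1.2@sha256:...` compare as the same tag, while the digest stays available for anyone who wants to check which build is actually running.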

@matdelong

Could you please provide an estimate for when this work will be complete? I echo the feelings voiced in #2394 that the "software version consistency" feature wasn't rolled out properly, and should be reverted until this new opt-in process is in place.

@acdha

acdha commented Sep 4, 2024

For anyone else who's been suffering downtime thanks to the ECS service regression described in this ticket & #2394: I tried to have support disable it for our accounts but found that did not work. SVC (software version consistency) is still pushing services into SERVICE_TASK_START_IMPAIRED if they use things like Amazon X-Ray, CloudWatch, etc.

I ended up deploying a little bit of EventBridge + Lambda to avoid ECS-triggered downtime. This uses an EventBridge rule to trigger a Lambda for ECR push events on the repositories in question and that Lambda calls ecs:UpdateService for each service using that container to force a new deployment which will resolve the tag to the current digest value. With the various work to manage IAM entities, least-privilege policies, etc. this seems like an unnecessary amount of work simply to get back to the level of reliability which ECS had from its launch until June.
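
The EventBridge + Lambda workaround described above could look roughly like the sketch below. This is not the commenter's actual code. It assumes the Lambda is triggered by an EventBridge rule matching ECR "ECR Image Action" push events, and it assumes a hypothetical `SERVICE_MAP` environment variable holding a JSON map from repository name to the cluster/service pairs that use it (the thread does not say how the services were looked up). The only AWS call made is `update_service` with `forceNewDeployment=True` on a boto3 ECS client.

```python
import json
import os


def redeploy_for_push(event: dict, ecs_client, mapping: dict) -> list:
    """Force a new deployment of every service mapped to the pushed repository.

    `mapping` is a hypothetical lookup table of the form
    {"repo-name": [{"cluster": "...", "service": "..."}, ...]}.
    """
    detail = event.get("detail", {})
    # Only react to successful image pushes; ignore deletes and failures.
    if detail.get("action-type") != "PUSH" or detail.get("result") != "SUCCESS":
        return []
    redeployed = []
    for target in mapping.get(detail.get("repository-name"), []):
        # A forced deployment makes ECS resolve the tag to its current digest.
        ecs_client.update_service(
            cluster=target["cluster"],
            service=target["service"],
            forceNewDeployment=True,
        )
        redeployed.append((target["cluster"], target["service"]))
    return redeployed


def lambda_handler(event, context):
    """Lambda entry point; boto3 is provided by the Lambda Python runtime."""
    import boto3  # imported lazily so the logic above can be tested without it

    mapping = json.loads(os.environ.get("SERVICE_MAP", "{}"))
    return redeploy_for_push(event, boto3.client("ecs"), mapping)
```

Keeping the ECS client as a parameter means the redeploy logic can be exercised with a stub before it is granted `ecs:UpdateService` on real services, which matters given the least-privilege concerns mentioned above.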

@rafaljanicki

I find it incredibly concerning that a forced change that impacts production systems has not been rolled back for 3 months already. Not to mention that this change was released without any notice, and there is no workaround (I do not consider "Force new deployment" a workaround, as it's fine for a new deployment but not for the case mentioned over here: #2394 (comment)).

@github-project-automation github-project-automation bot moved this to Researching in containers-roadmap Oct 21, 2024
@vibhav-ag vibhav-ag moved this from Researching to Coming Soon in containers-roadmap Oct 21, 2024
@jenmlinaws jenmlinaws removed the Proposed Community submitted issue label Oct 23, 2024
@rs-garrick

I just got hit by this feature breaking our deployment of a particular service.

This app is not deployed with --force-new-deployment because of other issues with ECS deploys related to long-running processes that take days to exit. Instead, all nodes are marked DRAINING so that new nodes are created with the updated container image. Because the service revision is never updated with the new sha, the new nodes pull down an old container image.

Oddly, there's no way to update the service revision with the new sha without triggering an actual deploy.

I need to opt-out ASAP please.

@vhadianto

This significant change has been forced on us without much communication. It should have been an opt-in change rather than opt-out, or at the very least people should have been allowed to opt out of this new behaviour. We have been waiting a few months and would like to know the plan for remediation.

@vibhav-ag

Update 2: you now have the ability to disable consistency for specific containers in your task by configuring the new versionConsistency field for each container in the task definition. Any changes to this property are applied after a deployment. Once again, we regret the churn this change has caused you all.

What’s New Post

@github-project-automation github-project-automation bot moved this from Coming Soon to Shipped in containers-roadmap Nov 19, 2024
@vibhav-ag vibhav-ag added Shipped This feature request was delivered. and removed Coming Soon labels Nov 19, 2024
@rafaljanicki

So now I have to go over dozens of task definitions to revert your changes that were enforced on us? Eh, not great

@felicienveldema

Within the AWS console when making a task definition revision with JSON, the versionConsistency option is not yet available.
When will I be able to update it?

@jakauppila
Author

Could we get an account (or org) level configuration to set the default value of that option? Then users could decide to disable it by default and opt-in instead of forcing everyone to opt-out.

@pallymore
Member

Hi @felicienveldema -

We are currently in the process of updating the JSON schema used in the console's editor.
For now, you can safely ignore the warning and submit the updated JSON directly.

Version Consistency can be turned off for each container by setting its value to disabled like so:

{
    "family": "task-def-name",
    "containerDefinitions": [
        {
             "name": "container-name",
             "image": "image-uri",
+            "versionConsistency": "disabled"
        }
    ]
}

The warning in the editor does not prevent the JSON from being submitted to the API. We are actively working on providing spellcheck and auto-complete support for this new field.

Thanks,
Yurui
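
For anyone who has to apply the setting above across many task definitions, the edit can be scripted. The sketch below is a hypothetical helper, not an official tool: `disable_version_consistency` is a made-up name, and the function only manipulates a task definition held as a plain Python dict in the same shape as the JSON shown above. The `versionConsistency` field and its `disabled` value are taken from this thread.

```python
import copy


def disable_version_consistency(task_def: dict, container_names=None) -> dict:
    """Return a copy of `task_def` with versionConsistency set to "disabled".

    If `container_names` is None, every container is changed; otherwise only
    the containers whose "name" is in `container_names` (e.g. sidecars).
    """
    updated = copy.deepcopy(task_def)  # leave the caller's dict untouched
    for container in updated.get("containerDefinitions", []):
        if container_names is None or container["name"] in container_names:
            container["versionConsistency"] = "disabled"
    return updated


# Example: opt only the sidecar out of digest resolution.
task_def = {
    "family": "task-def-name",
    "containerDefinitions": [
        {"name": "app", "image": "app-image-uri"},
        {"name": "sidecar", "image": "sidecar-image-uri"},
    ],
}
patched = disable_version_consistency(task_def, {"sidecar"})
```

The patched dict could then be registered as a new task definition revision. As noted earlier in the thread, the change only takes effect after a deployment of the service.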

@pallymore
Member

Hi @felicienveldema -

The ECS Console has been updated with the latest schema - you should be able to use the JSON editor language features to configure this field now.

Thanks!

@acdha

acdha commented Dec 13, 2024

> Update 2: you now have the ability to disable consistency for specific containers in your task by configuring the new versionConsistency field for each container in the task definition. Any changes to this property are applied after a deployment. Once again, we regret the churn this change has caused you all.

Thank you for this - ECS is back to being reliable again, which is a relief after the outages caused by the release of the version consistency feature. However, to echo @jakauppila, it would be useful to have an account-wide way to disable this so we won't have future outages if anyone forgets to disable it in a new task definition.

Popular AWS services like CloudWatch and X-Ray encourage deployments that software version consistency will turn into outages, so this is an ever-present risk. There's also no harm in disabling it, since the version consistency feature doesn't add any capabilities that weren't already available.
