Document best practices for how to manage releases in production #1786
Comments
and service-configs for systemd, that thing will take over. I thought of a command to generate these configs. |
I would like some recommendations on how to deploy config files through docker/compose/swarm. |
Putting the configs into a "config" image that just exposes a volume seems like a reasonable way to do it. It'd be great to hear more about the cases where it doesn't work, either here or in a new issue. |
@dnephin If you go down the road of one single configuration container and use --volumes-from, you sacrifice some security (every container sees all configs), but it looks easy to set up and nice: it's immutable and does not use any host fs path. Once you operate outside localhost and do not recreate everything at each run, you start learning the subtleties of --volumes-from and the compose recreation policies: the config image is updated and restarted, but client containers still mount their current volumes unless they are recreated as well for independent reasons. This took a while to notice and left us with a workaround of deleting the old config container whenever the config changes. Another solution would seem to be giving up immutability and changing the data inside the same volumes by running a cp or similar. At this point it would just be easier to pull the configs from a git repo and skip the image build altogether... which was the stateful solution I originally had in mind. If you want no fixed host path, you need a config data-only container and a config copier. I am not 100% happy with any of the solutions. Either I am not seeing a better way or maybe some feature is still missing (like some crazy image multiple inheritance or some smarter detection of dependencies when using --volumes-from that I can't figure out). |
I have a suggested solution for supporting the zero-downtime deployment a lot of us want. Why not simply add a new option to docker-compose.yml like "zero_downtime:" that would work as follows: web: I run separate containers for nginx, web(rails), postgres and cache(memcached). However, it's the application code in the web container that changes and the only one I need zero downtime on. $ docker-compose up -d web During "up" processing that creates the new "web" container, if the zero_downtime option is specified, start up the new container first exactly like scale web=2 would. Then stop and remove sbgc_web_1 like it currently does. Then rename sbgc_web_2 to sbgc_web_1. If a delay was specified (as in the 50 milliseconds example above) it would delay 50 milliseconds to give the new container time to come up before stopping the old one. If there were 10 web containers already running it would start from the end and work backwards. This is how I do zero downtime deploys today. Clunky but works: [updated] Update: we need a way to rename the sbgc_web_2 container to sbgc_web_1. Thought we could just use 'docker rename sbgc_web_2 sbgc_web_1' which works but then running 'docker-compose scale web=2' will produce sbgc_web_3 instead of sbgc_web_2 as expected. |
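The manual sequence described in the comment above (start a second container, optionally wait, then stop and remove the first) can be sketched as a list of CLI invocations. This is only an illustration: the `zero_downtime` option is a proposal and does not exist in docker-compose, the function name `rollout_commands` is hypothetical, and the fractional `sleep` assumes a GNU-style `sleep`. The function builds the commands without running them, so they can be reviewed first.

```python
from typing import List


def rollout_commands(service: str, old_container: str, delay_ms: int = 0) -> List[List[str]]:
    """Build the manual sequence as argv lists; nothing is executed here."""
    commands = [
        # Start a second container next to the running one.
        ["docker-compose", "scale", f"{service}=2"],
    ]
    if delay_ms > 0:
        # Fixed pause so the new container has time to come up
        # (assumption: `sleep` accepts fractional seconds).
        commands.append(["sleep", f"{delay_ms / 1000:g}"])
    commands += [
        # Only then stop and remove the old container.
        ["docker", "stop", old_container],
        ["docker", "rm", old_container],
    ]
    return commands
```

Each entry could be passed to `subprocess.run(cmd, check=True)`. The renaming step is left out on purpose, since the thread reports that `docker rename` confuses the numbering docker-compose uses afterwards.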
What happens to links if you do that? I guess you need a load balancer container linked to the ones you launch and remove and you can't restart it (?) |
The links between containers are fine in the scenario above. Adding a load balancer in front would work but seems like overkill if we just need to replace a running web container with a new version. I can accomplish that manually by scaling up and stopping the old container but it leaves the new container numbered at 2. If the internals of docker-compose were changed to accommodate starting the new one first, stopping the old one and renumbering the new one I think this would be a pretty good solution. |
In a real use case you want to wait for the second (newer) service to be ready before considering it healthy. This may include connecting to dbs and performing other startup work. It's very application specific. Then you want to wait for connection draining on the older copy before closing it. Again, connection draining and timeouts are application specific too. It could be a bit overkill to add support for all of that to docker-compose. |
Right, the 2nd container would need time to start up which could take some time depending on the application. That is why I proposed adding a delay: Basically my proposal is just to start the new container first, give it time to come up if needed and then stop and remove the old container. This can be accomplished manually with the docker command line today. The only remaining piece would be to rename the new container. Also possible to do manually today except that docker compose doesn't change the internal number of the container. |
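The two positions above (a fixed delay versus an application-specific readiness check) can be combined by polling a check until it passes or a timeout expires. The following is a minimal sketch under stated assumptions: `wait_until_ready` and its parameters are hypothetical names, and `check` stands for whatever the application considers "ready" (an HTTP probe, a database ping, and so on). Nothing here is part of docker-compose.

```python
import time
from typing import Callable


def wait_until_ready(check: Callable[[], bool],
                     timeout_s: float = 30.0,
                     interval_s: float = 0.5) -> bool:
    """Poll `check` until it returns True or `timeout_s` has elapsed."""
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True  # new container is ready, old one may be stopped
        if time.monotonic() >= deadline:
            return False  # give up and keep the old container running
        time.sleep(interval_s)
```

A deploy script would stop the old container only when this returns `True`, which replaces guessing a delay with an upper bound on how long to wait.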
Hey folks, I was facing the need for a zero-downtime deployment for a web service today, and tried to take the scaling approach which didn't work well for me before I realized I could do it by extending my app into 2 identical services (named https://github.com/vincetse/docker-compose-zero-downtime-deployment |
It does not work for me. I have added an issue on your repo. |
I just came across this ticket while deciding on whether or not to use compose with flocker and docker swarm, or whether to use ECS for scaling/deployment jobs, using the docker cli only for certain ad-hoc cluster management tasks. I've decided to go with compose to keep things native. I'm not fond of the AWS API, and I think most developers, like me, would rather not mess about with ridiculously nested JSON objects and so on. I then came across DevOps Toolkit by Viktor Farcic, and he uses a pretty elegant solution to implement blue-green deployments with compose and Jenkins (if you guys use Jenkins). Having tested it in staging, I found it pretty effective. Otherwise it would seem @vincetse has a pretty good solution that doesn't involve much complexity. |
A very good implementation of rolling upgrades already exists in Rancher. |
Now that docker swarm will be native, with no need for haproxy/nginx for load balancing and with native health check arguments, is there a simpler solution? |
If anyone wonders why, that's because of a label that docker-compose adds on container:
Sadly, it's not yet possible to update labels on running containers. (also, to save people a bit of time: trying to break docker-compose by using its |
Ok, I managed to automate zero downtime deploy, thanks @prcorcoran for the guidelines. I'll give here a more detailed description of how to perform it when using nginx.
useful commands
To find container ids after scaling up, I use:
This can be used to find new container IP and to stop and remove old container.
To find container creation date:
To find new container IP:
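The commands from the original comment did not survive in this copy of the thread, and the sketch below is not a reconstruction of them. It shows one way to get the same three pieces of information, assuming the JSON printed by `docker inspect <id> ...` (an array of objects with `Id`, `Created` and `NetworkSettings.Networks.<name>.IPAddress`) and container ids listed with something like `docker-compose ps -q web`. The function name `summarize_inspect` is hypothetical.

```python
import json
from typing import Dict, List


def summarize_inspect(inspect_json: str) -> List[Dict[str, object]]:
    """Extract id, creation time and per-network IPs from `docker inspect` JSON."""
    summary = []
    for container in json.loads(inspect_json):
        # Networks can be missing or null for containers without networking.
        networks = container.get("NetworkSettings", {}).get("Networks") or {}
        summary.append({
            "id": container["Id"],
            "created": container["Created"],
            "ips": {name: net.get("IPAddress", "") for name, net in networks.items()},
        })
    return summary
```

The input would typically come from `subprocess.run(["docker", "inspect", *ids], capture_output=True, text=True).stdout`.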
a few more considerations
As mentioned in previous comments, the number in the container name will keep incrementing: e.g. app_web_1, then app_web_2, then app_web_3, etc. I didn't find that to be a problem (if there's ever a hard limit on this number, a cold restart of the app resets it). I also didn't have to rename containers manually to keep the newest container up; we just have to stop the old container manually. You can't specify a port mapping in your docker-compose file, because then you can't have two containers running at the same time (they would try to bind to the same port). Instead, you need to specify the port in the nginx upstream configuration, which means you have to decide on it outside of the docker-compose configuration. The described method works when you only want a single container per service. That being said, it shouldn't be too hard to look at how many containers are running, scale to double that number, then stop/rm that number of old containers. Obviously, the more services you have to rotate, the more complicated it gets. |
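The multi-container variant mentioned at the end of that comment (double the count, then stop the old half) needs a way to pick the old containers. A minimal sketch, assuming each container is described by an `(id, created)` pair where `created` is the UTC timestamp string reported by `docker inspect` (ending in `Z`, with a fractional part of varying length). The names `containers_to_retire` and `_sort_key` are hypothetical.

```python
from typing import List, Tuple


def _sort_key(created: str) -> Tuple[str, str]:
    # Pad the fractional seconds so timestamps of different precision
    # still compare correctly as strings.
    seconds, _, fraction = created.rstrip("Z").partition(".")
    return seconds, fraction.ljust(9, "0")


def containers_to_retire(containers: List[Tuple[str, str]], keep: int) -> List[str]:
    """Return the ids of the oldest containers, leaving the `keep` newest running."""
    ordered = sorted(containers, key=lambda c: _sort_key(c[1]))
    surplus = max(len(ordered) - keep, 0)
    return [container_id for container_id, _ in ordered[:surplus]]
```

With N containers running before the deploy, one would scale to 2N, wait for the new ones, then stop and remove the ids returned for `keep=N`.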
@oelmekki The
|
@oelmekki also, if the
If you have a setup that utilizes |
That's why I explicitly mention not to do it :) In my previous comment:
-- You can't bind those ports on the host, but you can bind them on containers, which each have their own IP. So the job is to find the IP of the new container and replace the old container's IP with it in the nginx upstream configuration. If you don't mind reading golang code, you can see an implementation example here. |
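The upstream replacement step can be sketched as regenerating the whole `upstream` block from the current container addresses, rather than editing the old IP in place. This is an illustration only and not the golang implementation linked above: the function name `render_upstream`, the upstream name and the address in the usage note are hypothetical.

```python
from typing import List


def render_upstream(name: str, servers: List[str]) -> str:
    """Render an nginx `upstream` block for the given `ip:port` servers."""
    if not servers:
        # An empty upstream block is rejected by nginx, so fail early.
        raise ValueError("an upstream block needs at least one server")
    lines = [f"upstream {name} {{"]
    lines += [f"    server {server};" for server in servers]
    lines.append("}")
    return "\n".join(lines) + "\n"
```

For example, `render_upstream("app", ["172.18.0.5:3000"])` could be written to a file that the main nginx configuration includes, followed by `nginx -s reload`. Doing the reload before the old container is stopped should avoid pointing nginx at an address that no longer answers.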
@oelmekki oops! That part of your post didn't register in my brain, I guess. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
This issue has been automatically marked as not stale anymore due to the recent activity. |
This is one of the main focuses of https://github.com/docker/app, and considering how long this issue has been open without any concrete answer, I think it's better to just close it. |
@ndeloof That may be the main focus of docker/app, but there is still no way to do zero downtime or rolling updates, so I think only one portion of this bug has been solved by app. In fact, the above listed methods don't work now. If the image is updated and |
You're perfectly right, but the purpose of this issue has been to document those deployment practices, and obviously there's no standard way to achieve this, especially considering the various platforms compose can be used for (single engine, swarm, kubernetes, ecs ...) |
Thanks @oelmekki for your insights. They have been very useful and encouraging when there is so little info on rolling updates with docker-compose. I ended up writing the following script
And here's my
And my
Hope it can be useful to some. I wish it would be integrated into |
Thanks a lot, but since you have bound the app port to a static 4000, only one container, new or old, can bind to it, so the new and old containers can't run simultaneously. I still need to test this, since I may be wrong. |
App is gone (docker/roadmap#209). Still wondering if we'll ever have zero downtime deploys in compose? Especially since the future of swarm is still very unclear (docker/roadmap#175). |
When the old container is killed, there is considerable downtime before nginx is reloaded. Not sure if this has changed by now, but I tried many times and it was always the case: the service is unreachable when the old container is killed and comes back up after a while. |
We currently have some lightweight documentation about how to use Compose in production, but this could do with improvements:
Resources