deployment shouldn't use a static temp directory #45

Open
tkellen opened this issue Sep 22, 2015 · 15 comments

Comments

@tkellen

tkellen commented Sep 22, 2015

If two deploys are happening simultaneously for some reason, they'll clobber each other.

@cowboy
Contributor

cowboy commented Sep 22, 2015

But if we change the temp dir name, then when two deploys are happening simultaneously, only one will actually deploy.

The real solution is probably to fail if a deployment is in progress. Just checking for the existence of the temp dir won't be adequate, however, because it could be leftover if a deploy fails partway through.

Ideas?

@tkellen
Author

tkellen commented Sep 22, 2015

I think it's okay if two deploys happen at the same time and whichever ends last wins. This relates to the work @brianloveswords is doing right now on http://github.com/brianloveswords/deployer, though, so I'm curious what he thinks. @MattSurabian too!

@tkellen
Author

tkellen commented Sep 22, 2015

@cowboy
Contributor

cowboy commented Sep 23, 2015

Do we need to consider that people might be running playbooks other than deploy at the same time, like configure?

@tkellen
Author

tkellen commented Sep 23, 2015

Multiple people running configure on a box in our infra seems incredibly unlikely to me. Trying to prevent something like that feels like an upstream concern (if we need to prevent concurrent runs, period, that should be something supported by Ansible itself, not by us here).

The only thing I want to address in this issue is handling concurrent deploys gracefully so we don't wind up with a site in production that is borked. I will admit the situation seems almost as rare as the configure one, but given how easy it is to mitigate, it seems worthwhile to put in the effort.

A temp directory suffixed with a hash is easy to implement and would mean any number of deploys could run simultaneously; whichever finished last would be the one that stayed live.
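To make that concrete, here's a minimal sketch of what hash/timestamp-suffixed temp directories might look like as Ansible tasks (deploy_tmp_dir and the /tmp/deploy- prefix are made-up names for illustration, not anything in this repo):

# Build a unique directory name per run; requires gathered facts for
# ansible_date_time. The random suffix guards against same-second runs.
- name: Generate a unique temp directory name for this deploy run
  set_fact:
    deploy_tmp_dir: "/tmp/deploy-{{ ansible_date_time.epoch }}-{{ 1000000 | random }}"

- name: Create the per-run temp directory
  file:
    path: "{{ deploy_tmp_dir }}"
    state: directory

Each deploy would then work in its own directory, and presumably whatever final step swaps the new build into place decides which finished deploy stays live.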

The only (sane) alternative that comes to mind is checking for the existence of a fixed temp directory and bailing out, telling the user to force. This is bad, though, because a temp directory could easily be left hanging around if a deploy was cut off mid-process, and it wouldn't work well in situations where we are automating things (re: @brianloveswords's work on deployer).

Finally, in addition to doing this, we should probably address this concern with simple communication. At previous gigs (where lots of people were competent in how to deploy), we all agreed to announce in chat when a deploy was kicked off. The deployment scripts also posted messages in chat as the deploy rolled out across our infra.

@cowboy
Contributor

cowboy commented Sep 23, 2015

Sounds like a good idea!

@brianloveswords

deployer is being designed such that each playbook gets a separate queue and only one instance of a given playbook can run at a time; any requests-to-deploy that happen while a run is in progress get queued, so the last one in wins [1].

Beyond that, deployer doesn't know or care what's in the playbooks, but I believe they should be designed to be as fault-tolerant as possible, even if in the automated world concurrent runs of the same playbook should be disallowed.


1. "Last one" meaning currently "last request-to-deploy message received". In an unlikely edge case where two messages come fairly close to each other and one gets delayed for some reason or needs to retry, the temporally earlier deploy message might end up getting received after the later message. We can mitigate this programmatically by adding a delay to the queue processor and sorting the messages by time sent so we process them in the order-of-intent rather than order-of-receipt. We can also mitigate this socially by telling people to avoid triggering two deploys in quick succession.

@cowboy
Contributor

cowboy commented Sep 23, 2015

I was really surprised that when I searched Google for ansible locking, I found just one Stack Overflow thread and nothing official. Ansible doesn't seem to care about this problem.

@cowboy
Contributor

cowboy commented Sep 23, 2015

I'm wondering if there's a way to write a task that would go at the top of each playbook where this matters, to check whether that playbook is also being run in another process.

Of course, I'm seeing stuff like this, which isn't exactly helpful at first glance.

$ ps aux | grep python
cowboy    7424  0.0  0.0   4440   652 pts/1    Ss+  16:23   0:00 /bin/sh -c sudo -k && sudo -H -S -p "[sudo via ansible, key=lgozyxysjhhceuqljyfxkvansecqpnou] password: " -u root /bin/sh -c 'echo BECOME-SUCCESS-lgozyxysjhhceuqljyfxkvansecqpnou; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python /home/cowboy/.ansible/tmp/ansible-tmp-1443025429.21-51224197571946/npm; rm -rf /home/cowboy/.ansible/tmp/ansible-tmp-1443025429.21-51224197571946/ >/dev/null 2>&1'
root      7426  0.0  0.2  63668  2088 pts/1    S+   16:23   0:00 sudo -H -S -p [sudo via ansible, key=lgozyxysjhhceuqljyfxkvansecqpnou] password:  -u root /bin/sh -c echo BECOME-SUCCESS-lgozyxysjhhceuqljyfxkvansecqpnou; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python /home/cowboy/.ansible/tmp/ansible-tmp-1443025429.21-51224197571946/npm; rm -rf /home/cowboy/.ansible/tmp/ansible-tmp-1443025429.21-51224197571946/ >/dev/null 2>&1
root      7427  0.0  0.0   4440   656 pts/1    S+   16:23   0:00 /bin/sh -c echo BECOME-SUCCESS-lgozyxysjhhceuqljyfxkvansecqpnou; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python /home/cowboy/.ansible/tmp/ansible-tmp-1443025429.21-51224197571946/npm; rm -rf /home/cowboy/.ansible/tmp/ansible-tmp-1443025429.21-51224197571946/ >/dev/null 2>&1
root      7428  0.0  0.8  34052  8888 pts/1    S+   16:23   0:00 /usr/bin/python /home/cowboy/.ansible/tmp/ansible-tmp-1443025429.21-51224197571946/npm
cowboy    7491  0.0  0.0  10464   936 pts/0    S+   16:24   0:00 grep --color=auto python

@cowboy
Contributor

cowboy commented Sep 23, 2015

At the very least, we could have a playbook fail if it detects another ansible process. Thoughts?
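A very rough sketch of that idea, for what it's worth; the pattern and threshold are guesses, and it only catches a run that happens to be executing a module at that moment:

# Count distinct ansible-tmp-<timestamp> paths visible in running
# processes; the current run contributes exactly one, so more than one
# suggests another playbook run is executing a task on this host.
- name: Look for other in-flight Ansible module executions
  shell: ps -eo args | grep -o 'ansible-tmp-[0-9][0-9.]*' | sort -u | wc -l
  register: ansible_run_count
  changed_when: false

- name: Fail if another Ansible run appears to be active on this host
  fail:
    msg: "Another Ansible run appears to be in progress on this host."
  when: ansible_run_count.stdout|int > 1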

@jugglinmike
Member

@cowboy what about touching a file in $HOME/.ansible?
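Something like this, maybe? (The deploy.lock path and task wording are just illustrative.)

# Refuse to start if a lock file is present, then take the lock.
- name: Check for an existing deploy lock file
  stat:
    path: "{{ ansible_env.HOME }}/.ansible/deploy.lock"
  register: deploy_lock

- name: Fail if another deploy appears to hold the lock
  fail:
    msg: "Lock file present; another deploy may already be running."
  when: deploy_lock.stat.exists

- name: Create the lock file for this run
  file:
    path: "{{ ansible_env.HOME }}/.ansible/deploy.lock"
    state: touch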

@jugglinmike
Member

D'oh — as @cowboy pointed out above:

Just checking for the existence of the temp dir won't be adequate, however, because it could be leftover if a deploy fails partway through.

@jugglinmike
Member

Maybe blocks would help with lock cleanup?

The always section runs no matter what previous error did or did not occur in the block and rescue sections.
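For example, a hypothetical shape of that, reusing the made-up lock path from above (this needs the block syntax from Ansible 2.0):

# Whatever runs inside the block, the always section removes the lock,
# so a failed task doesn't leave a stale lock behind. (A hard-killed
# process would still leave it, as noted below.)
- block:
    - name: Deploy steps that must not leave a stale lock go here
      command: /bin/true   # placeholder for the real deploy tasks
  always:
    - name: Remove the lock file even if a task above failed
      file:
        path: "{{ ansible_env.HOME }}/.ansible/deploy.lock"
        state: absent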

But I have no context for this stuff. I'm going to leave you all alone now!

@tkellen
Author

tkellen commented Sep 23, 2015

@jugglinmike That also wouldn't work in the case of someone killing the Ansible process midway.

I really think this is a problem for Ansible itself, not for userland to resolve (except in the case of @brianloveswords's work, where a series of commits could kick off a ton of deploys and we can't guarantee that every one will use whatever special hack we put in place).

For the purposes of this issue, I believe our problem would be resolved with the temp-<hash> folder name. Broadly speaking, it would be awesome if someone opened an issue over on the Ansible tracker. I feel sure they've encountered this question before and have some good reason for why things are the way they are.

@tkellen
Author

tkellen commented Sep 25, 2015

Just leaving a note to myself, or whoever implements this...

These lines:
https://github.com/bocoup/deployment-workflow/blob/master/deploy/ansible/roles/deploy/tasks/checkout.yml#L5-L6
...should not delete all temp directories wholesale; they should only remove ones whose creation date is older than, say, an hour. This will prevent one deploy from clobbering a concurrent deploy's folder, and will also clean up any directories left lingering by aborted deploys.
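For the record, here's roughly what that could look like with the find module (Ansible 2.0+); the /tmp path and deploy-* pattern are placeholders for whatever naming scheme ends up being used:

# Collect only temp directories older than an hour, then delete just
# those, leaving any in-flight deploy's directory untouched.
- name: Find leftover deploy temp directories older than an hour
  find:
    paths: /tmp
    patterns: "deploy-*"
    file_type: directory
    age: 1h
  register: stale_deploy_dirs

- name: Remove only the stale temp directories
  file:
    path: "{{ item.path }}"
    state: absent
  with_items: "{{ stale_deploy_dirs.files }}"

On pre-2.0 Ansible, the same effect could be had with a shell task running find with -mmin +60 over the temp directory pattern.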
