Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shutdown behavior #503

Merged
merged 1 commit into from
Sep 29, 2015
Merged

Shutdown behavior #503

merged 1 commit into from
Sep 29, 2015

Conversation

peterbourgon
Copy link
Contributor

Resolves #424.

@peterbourgon peterbourgon changed the title [WIP] Shutdown [WIP] Shutdown behavior Sep 21, 2015
@peterbourgon peterbourgon changed the title [WIP] Shutdown behavior Shutdown behavior Sep 21, 2015
@2opremio 2opremio self-assigned this Sep 21, 2015
if dockerReporter != nil {
reporters = append(reporters, dockerReporter)
}
if dockerRegistry != nil {

This comment was marked as abuse.

This comment was marked as abuse.

This comment was marked as abuse.

@tomwilkie
Copy link
Contributor

Tests I've done:

  1. run ./docker/scope-probe and ctrl-c; I see the probe exit cleanly (bye!)
  2. run it under docker with the -d removed from docker run in the scope script (so it launches in the foregraound), then ctrl-c; I do not see anything exit cleanly
  3. run it under docker normally, send sighup (see http://smarden.org/runit/runsvdir.8.html) to the runsv process; I see it exit cleanly.
  4. run it normally, use docker stop; I do not see it exit cleanly

As it looks like the only way to gracefully shutdown runsvdir is with sighup, we need to make ./scope stop send sighup for docker stop:

diff --git a/scope b/scope
index a973f23..9a724c6 100755
--- a/scope
+++ b/scope
@@ -189,7 +189,7 @@ case "$COMMAND" in

     stop)
         [ $# -eq 0 ] || usage
-        if ! docker stop $SCOPE_CONTAINER_NAME >/dev/null 2>&1 ; then
+        if ! docker kill -s HUP $SCOPE_CONTAINER_NAME >/dev/null 2>&1 ; then
             echo "Weave Scope is not running." >&2
         fi
         docker rm -f $SCOPE_CONTAINER_NAME >/dev/null 2>&1 || true

@@ -32,20 +32,19 @@ func main() {
return
}

defer log.Print("bye!")

This comment was marked as abuse.

This comment was marked as abuse.

@tomwilkie
Copy link
Contributor

I think its probably time we broke the probe main() function up into smaller pieces; perhaps have a probe struct to contain the reporters, taggers & tickers, make add{Reporter, Probe}, loop() and quit() functions on that?

@peterbourgon
Copy link
Contributor Author

I think its probably time we broke the probe main() function up into smaller pieces;

I prototyped that first, and this was cleaner.

@peterbourgon
Copy link
Contributor Author

As it looks like the only way to gracefully shutdown runsvdir is with sighup, we need to make ./scope stop send sighup for docker stop:

I will add this patch to the PR, thanks.

@tomwilkie
Copy link
Contributor

Also need to consider how we add a test to ensure we never break this
again; seem like the kinda of thing it would be easy to regress on and not
notice.

On Tue, Sep 22, 2015 at 3:46 PM, Peter Bourgon [email protected]
wrote:

As it looks like the only way to gracefully shutdown runsvdir is with
sighup, we need to make ./scope stop send sighup for docker stop:

I will add this patch to the PR, thanks.


Reply to this email directly or view it on GitHub
#503 (comment).

@tomwilkie
Copy link
Contributor

Test added; Note when I run it locally I don't see the app or probe shutdown cleanly, so I'm expecting the test to fail.

@2opremio 2opremio removed their assignment Sep 22, 2015
@peterbourgon
Copy link
Contributor Author

>>> Running ./110_shutdown_test.sh on [host1-1382-0.us-central1-a.scope-integration-tests]
>>> Test ./110_shutdown_test.sh finished with success after 3.2 secs

So, LGTM?

@tomwilkie
Copy link
Contributor

I fixed the test - it fails now (probe doesn't exit cleanly).

@peterbourgon
Copy link
Contributor Author

Update. Good news: docker stop and docker kill now reliably terminate the container. Bad news: they don't reliably wait for the probe and app processes to return. Sending SIGHUP to runsvdir causes it to send SIGTERM to all supervised processes (good), but then immediately exit with 111 (bad). Rarely, I will see "probe exiting" or "app exiting" before Docker destroys the container.

Other Docker+runit solutions appear to solve this with their own init script, which explicitly waits for services to exit before returning. I guess that can work, though it feels stupid to wrap a supervisor with a supervisor. Still researching...

@peterbourgon
Copy link
Contributor Author

This definitely works, now:

$ ./scope launch
$ ./scope stop
$ docker logs weavescope 2>&1 | grep 'exiting'
<app> 2015/09/24 12:07:01 app exiting
<probe> 2015/09/24 12:07:01 probe exiting

@tomwilkie any ideas here?

@tomwilkie
Copy link
Contributor

I'll take a look at the test tomorrow to see why its failing.

@tomwilkie
Copy link
Contributor

Hmm when I run it, it works:

vagrant@ubuntu-14:~/src/github.com/weaveworks/scope/integration$ ./110_shutdown_test.sh 
Check scope exits cleanly within 5 seconds
Weave Scope is reachable at the following URL(s):
  * http://10.240.201.151:4040/
host1.us-central1-a.positive-cocoa-90213> a5ee473e10dfae5690c10c4ae7b683227ab02016e66d786540cbc0a84303c8e6
all 3 tests passed in 28.188s.
vagrant@ubuntu-14:~/src/github.com/weaveworks/scope/integration$ ./110_shutdown_test.sh 
Check scope exits cleanly within 5 seconds
Weave Scope is reachable at the following URL(s):
  * http://10.240.201.151:4040/
host1.us-central1-a.positive-cocoa-90213> eacbbb01b4e3411a1a429ad215818dab8f7f87e3b9b6cb1a3bbe9f184e391fb5
all 3 tests passed in 28.526s.
vagrant@ubuntu-14:~/src/github.com/weaveworks/scope/integration$ ./110_shutdown_test.sh 
Check scope exits cleanly within 5 seconds
Weave Scope is reachable at the following URL(s):
  * http://10.240.201.151:4040/
host1.us-central1-a.positive-cocoa-90213> 5476200975d3539bb08fb527545c528b4d8caac858db613e7bba2db18c0aba4f
all 3 tests passed in 28.332s.

@tomwilkie
Copy link
Contributor

Test fixed. I'm blown away by the signal handling in the entrypoint.sh script, and I'm concerned we're leaving ourselves open to bugs like this https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/ (which is why we we're exec'ing runit to begin with).

Lets get this change in (so LGTM from me) and then give it some more thought. Perhaps a different supervisor? Perhaps we should merge the app and probe into a single process? That would remove a bunch of code and problems - and we could do it by effectively leaving them as completely separate, but just run in the same process?

@tomwilkie tomwilkie assigned peterbourgon and unassigned tomwilkie Sep 25, 2015
@peterbourgon
Copy link
Contributor Author

I'm blown away by the signal handling in the entrypoint.sh script, and I'm concerned we're leaving ourselves open to bugs like this

You're right. We should do it properly. I wouldn't like to switch to another supervisor. I would propose we adopt the phusion my_init script. What do you think?

edit: it was observed that my_init depends on python3, which we'd like to avoid. Whipping up a quick equivalent to entrypoint.sh in Go, which does proper reaping.

@peterbourgon
Copy link
Contributor Author

@tomwilkie Is this acceptable to you? See peterbourgon/runsvinit.

@tomwilkie
Copy link
Contributor

  • peterbourgon/runsvinit needs test to show it actually reaps orphaned children.
  • I'd prefer it it wasn't included as a binary, but built each time (I'm concern about it bit rotting).
  • Also it needs the usual CI integrations, and should probably live under the weaveworks github account, and needs a thorough code review.

Also, needs a rebase so we can see how this has affected coverage. I've got a feeling its going to have gone down, as probe/main.go is basically untested.

@peterbourgon
Copy link
Contributor Author

peterbourgon/runsvinit needs test to show it actually reaps orphaned children.

OK, will add.

I'd prefer it it wasn't included as a binary, but built each time (I'm concern about it bit rotting).

OK, will change.

Also it needs the usual CI integrations, and should probably live under the weaveworks github account, and needs a thorough code review.

I'll add CI. I'm ambivalent on moving it to Weaveworks; it has nothing to do with our core mission. Once I add the reaping tests, I'll ask you to review it.

Also, needs a rebase

OK, will rebase.

@peterbourgon
Copy link
Contributor Author

  • I've added the test, see here.
  • I've added CI integration, see here.
  • I've rebased this repo.
  • I've manually vendored peterbourgon/runsvinit, see here.

I hope that's enough to earn your LGTM.

@@ -17,6 +17,8 @@ SCOPE_VERSION=$(shell git rev-parse --short HEAD)
DOCKER_VERSION=1.3.1
DOCKER_DISTRIB=docker/docker-$(DOCKER_VERSION).tgz
DOCKER_DISTRIB_URL=https://get.docker.com/builds/Linux/x86_64/docker-$(DOCKER_VERSION).tgz
RUNSVINIT=docker/runsvinit
RUNSVINIT_TGZ=docker/runsvinit-linux-amd64.tgz

This comment was marked as abuse.

@tomwilkie
Copy link
Contributor

Some minor comments. Should also squash this down to 1 changeset. And I think you should get someone like @dpw to review the supervisor.

Other than that, LGTM.

- Restructure main funcs for clean defer-stack-unwinds
- Fix Docker container to handle signals properly
- Introduce runsvinit for container init process
- Integration test
peterbourgon added a commit that referenced this pull request Sep 29, 2015
@peterbourgon peterbourgon merged commit 7d52685 into master Sep 29, 2015
@peterbourgon peterbourgon deleted the shutdown branch September 29, 2015 10:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants