
sl-runctl stop returns success before the supervisor is stopped #137

Open
danielwhite opened this issue Jun 16, 2015 · 6 comments
@danielwhite

The root cause of issue #136 was calling stop and start in quick succession: the supervisor from the first run was still shutting down, so start saw a running supervisor and reported success.

It would be easier to script against if the stop CLI command returned only once the master had actually stopped.

To reproduce (assuming a running process):

$ sl-runctl stop && sl-runctl status

If the problem exists, then the supervisor information is returned. For example:

master pid: 28943
worker count: 1
worker id 6: { pid: 28964, uptime: 1445, startTime: 1434419524881 }

If the problem is resolved, then there should be an indication that the master is stopped. For example:

Communication error (connect ECONNREFUSED), check master is listening

A fix didn't seem trivial, and I might not get a chance to work on this in the near future, so suggestions would be welcome.

@rmg
Member

rmg commented Jun 16, 2015

@sam-github thoughts on enhancements to strong-supervisor?

@sam-github
Contributor

I feel your pain: synchronous CLI commands are handy for scripting. But it's not worth enhancing supervisor; it is increasingly just a component of strong-pm, and I don't even know how long the runctl channel will continue to be supported (we're moving to websockets). I suggest you look at http://strong-pm.io!

That said, pm has the same issue of "when should a command return".

The problem is that stop can take a while: a soft-stop can take up to 5 minutes (https://github.com/strongloop/strong-cluster-control/blob/master/lib/master.js#L34) while it waits for any open connections to that worker to go away, and in the meantime the CLI would show no sign of progress. That would call for some kind of explicit option, --block-til-complete perhaps, to enable blocking behaviour.

For start, it's not even clear what start means:

  • when supervisor is running?
  • when first worker starts?
  • when all workers have started?
  • when all workers have listened on a TCP port?

I'd suggest that the low-friction way to do this is to write a loop around slc ctl status, waiting until its state is the expected one.
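
For example, a minimal sketch of that loop in shell, using the commands from the original report and assuming the "Communication error" message shown above is what a stopped master produces:

$ sl-runctl stop
$ until sl-runctl status 2>&1 | grep -q 'Communication error'; do sleep 1; done

You would probably want to bound that loop with a timeout as well, since (as noted above) a soft-stop can take several minutes.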

If you wanted to modify https://github.com/strongloop/strong-mesh-models/blob/master/bin/sl-meshctl.js to support a REST API poller that could wait for various states, that would be interesting, and possibly easier than writing your own loop. Maybe something like: slc ctl --control http://production.example.com await --service car-app --size 3, for example.

@danielwhite
Author

The problem with strong-pm is that it adds a lot of deployment complexity (i.e. creating another package just to host loopback applications) when all I need to do is host a single loopback application. My deployment must use standard package management (i.e. RPMs). We're deploying to environments that don't necessarily have outbound network access, and installing compilers (i.e. npm plus build dependencies) is strongly frowned on from a security point of view.

At least in the meantime, a status check loop will probably suffice.

I'll have to give it some more thought, since it's nice to get these problems fixed at the root.

@sam-github
Contributor

Do you need an RPM per app, or can you put strong-pm in an RPM? The latter should be easy. The former is an interesting use case; I'd like to know if that's a requirement, because we can address it.

Also, note that pm deals with a lot of deployment complexity: it accepts git pushes, or you can push updates with slc deploy. slc deploy works well with slc build, which can pack up an app and its deps into either an npm tarball or a git deploy branch (not master). Those deps can be pure JavaScript, or can be pre-built to include compiled addons; the latter is a good choice for deploying to an environment that doesn't have compilers or outbound network access.
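
As a rough sketch of that build-then-deploy flow (the pm host and port here are placeholders, and the exact flags are my assumption from the docs; check slc build --help and slc deploy --help):

$ slc build --install --pack                       # install deps (pre-building any addons) and pack into a tarball
$ slc deploy http://pm-host:8701 my-app-1.0.0.tgz  # push the tarball slc build produced to a running strong-pm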

pm also allows remote control and debugging using arc, and supports password-based auth, as well as tunneling its control channel over ssh.

But in summary, it sounds like the two features that would help you are:

  • synchronous start/stop/restart CLIs: for ease of scripting
  • slc pm <app.js>: a one-shot run of an app under strong-pm (similar to how slc run works)

Is this it?

@danielwhite
Author

I think you're probably right. Packaging and deploying strong-pm would be relatively easy. The missing piece for me would then be how to add the application to pm without having a .tgz file. To keep things discoverable, the RPM really needs to own the files.

One-shot runs might also work, though they'd be less of a requirement if there were more ways to add applications to the process manager.

@sam-github
Contributor

If you want the app and its runner packaged together, you'd need something like slc pm server/server.js - that's what I mean by one-shot.

You have a heavily RPM-based deploy and need the app to be packaged in the RPM, so using slc deploy is not an option?

@sam-github sam-github self-assigned this Jul 22, 2015