Skip to content
This repository has been archived by the owner on Nov 8, 2024. It is now read-only.

Wait for process to be killed before temporary directory gets removed #65

Merged
merged 1 commit into from
Jun 6, 2019

Conversation

honzajavorek
Copy link
Contributor

@honzajavorek honzajavorek commented Jun 5, 2019

No description provided.

});

// remove the temporary directory
return fs.remove(this.dir);
});
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fs.remove() returns a promise, which was previously ignored... the other issues were that:

  • tree-kill didn't kill the processes very well
  • the directory was removed before the process got killed (caused problems with Ruby hooks)

}
pids.forEach((pid) => {
try {
process.kill(pid, 'SIGKILL');
Copy link
Member

@kylef kylef Jun 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SIGKILL doesn't give the subprocess anytime for their own cleanup / time for gracefully shutting down.

Would it be better to do SIGTERM and then SIGKILL after some timeout (5-10 seconds?) That gives the processes some time to do their own cleanup before force killed.

Maybe in this case if a process doesn't gracefully exit from SIGTERM that should be a test failure too (instead of having to SIGKILL it).

If I understand correctly, these processes are child processes. If you use child_process.spawn to create them, then we can observe when they exit and use the APIs on them to observe when exitted. For example (pseudo code and probably maybe python in areas :)) -- architecture can be changed a lot but this is to show the rough idea:

const assert = require('assert');

const processes = [];
let timeoutTimer;
let completion;

function spawn() {
  const process = ...;

  processes.push(process);

  process.on('exit', () => {
    process.remove(process)

    if (processes.length === 0) {
      clearTimeout(timeoutTimer);
      if (completion) { completion() }
    }
  });
}

function shutdown(done) {
  if (timeoutTimer) {
    assert.fail('shutting down while already shutting down');
  }

  completion = done;
  const timeoutInterval = 2000;
  timeoutTimer = setTimeout(() => {
    // bad processes didn't exit in time
    processes.forEach(process => process.kill('SIGKILL'));
  }, timeoutInterval);

  processes.forEach(process => process.kill());
}

Can probably come up with something better if you can use promise.. Each process lifecycle can be wrapped in promise, kill can return collection of promises and then if timeout happens first they get force-killed.

Copy link
Contributor Author

@honzajavorek honzajavorek Jun 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to keep this as minimal as possible - it is an after-test tear down cleanup. I know SIGKILL is brutal, but I'd say tests tear down is allowed to use brutal means to cleanup after tests on a few lines of code. There are three situations atm when killing can happen:

  1. Dredd malfunction. Dredd isn't able to kill it's children and they get stuck as running processes. I wonder whether this would even track the children down in case Dredd, the parent process, is already finished 🤔 But it's the best I have right now... in any way, that would be malfunction, and those can cause other malfunctions in chain reaction (in this case malfunction of tests cleanup), that's kind of expected.
  2. Dredd itself gets stuck in limbo because of some unexpected malfunction. This should definitely kill it and all its children with no mercy and I think that's fine.
  3. There are some tests running the hooks handler as a standalone process, sending TCP messages its way and inspecting what it returns. Then they rely on this cleanup to tear down the process (in user environment one would Ctrl+C to SIGINT it).

The test suite template could add to the contract that the hooks handler process should be able to handle SIGTERM, but I'd see that as a separate feature, an addition to what this test suite imposes on hooks handlers implementation. I filed #67 to discuss that. Also, now it could make some of the hooks handlers not passing the test suite.

However, I think we could use your comment (and perhaps the additional code you sent me over Slack?) as a suggestion to improve how Dredd's subprocess management works. Currently it lives here: https://github.com/apiaryio/dredd/blob/72220703a644234436c53de42849ab7fa045653a/lib/childProcess.js Sometimes it feels flaky in the CI and it still has bugs, even though it's much more reliable than it was previously (before I revamped it and put into childProcess.js). Because there, in Dredd, that's IMHO the place where care should be taken and where exactly what you write happens - nicely trying SIGTERM first, waiting, then killing with SIGKILL, etc.

@honzajavorek
Copy link
Contributor Author

🙏

@honzajavorek honzajavorek merged commit 3e63d0a into master Jun 6, 2019
@kylef kylef deleted the honzajavorek/fixes branch June 6, 2019 10:42
@ApiaryBot
Copy link
Collaborator

🎉 This PR is included in version 1.1.5 🎉

The release is available on:

Your semantic-release bot 📦🚀

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants