Skip to content
This repository has been archived by the owner on Oct 18, 2022. It is now read-only.

Terminating invest subprocesses | Automated browser tests #50

Merged
merged 82 commits into from
Oct 20, 2020

Conversation

davemfish
Copy link
Collaborator

@davemfish davemfish commented Oct 8, 2020

This PR adds a feature to gracefully terminate an invest run during execution, with a cancel button. It also adds an automated browser test, using puppeteer to drive the browser, to test this functionality on the final built application.

I'm especially interested in review and advice on spawning and terminating subprocesses from node on different operating systems. This comes up in a few places:

  • src/InvestJob.jsx:investExecute method. -- the invest model subprocess
  • src/main_helpers:createPythonFlaskProcess function -- launching the flask app
  • tests/binary_tests/puppet.test.js -- launching the built electron app as a spawned process

I believe the builds should all be successful, but puppet.test.js fails/errors in CI for various reasons:

  • Windows: the test fails because on Windows we resort to force-killing the invest subprocess, which gives the app no time to handle the request to "Cancel Run" and display the proper message.
  • MacOS: the test errors because: Unable to create basic Accelerated OpenGL renderer. Core Image is now using the software OpenGL renderer. This will be slow. I think this only comes up in the context of puppeteer, not when installing and launching the electron app. Would be great if @phargogh or @emlys could try running the app and this puppet test locally.
  • Ubuntu: this was passing, but now seems unreliable, failing for various reasons. In CI, this test requires a virtual frame buffer provided by https://github.com/marketplace/actions/gabrielbb-xvfb-action. I'm not sure if that's a source of unreliability. The test is reliable outside of CI.

This PR also fixes #47, fixes #46, fixes #36

…rgs dicts and datastacks - specifically hidden args and n_workers.
Comment on lines 204 to 212
this.investRun.terminate = () => {
if (this.state.jobStatus === 'running') {
// process.kill(-this.investRun.pid, 'SIGTERM'); // does not kill
// this.investRun.kill(); // does not kill
// This kills, but no chance to handle it and show 'Run Cancelled'
exec(`taskkill /pid ${this.investRun.pid} /t /f`);
// exec(`taskkill /pid ${this.investRun.pid} /t`) // does not kill
}
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a little bit of looking into this and I probably found what you did; a lot of nodejs issues about how process.kill doesn't function correctly on Windows. It also seems like Windows doesn't use Signals in the same way.

The threads I was on:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, looks like only POSIX systems implement signals, but even Windows should handle interrupts of some sort. One SO post I found mentioned Win32 messages, which makes me wonder if there's just an alternate, windows-specific way to handle events. If that's the case, then how would we do this in node? All of the child_process docs talk about processes like they're expecting POSIX signals, which sounds like it would work great until signals don't exist on Windows.

.toBeInTheDocument();
const cancelButton = await findByText('Cancel Run');
fireEvent.click(cancelButton);
expect(await findByText('Run Canceled')).toBeInTheDocument();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So on Windows this will error because we can't properly kill a process?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Tehcnically it fails because that "Run Canceled" message only shows when the subprocess has time to listen and run code on an exit event. Right now on Windows we're resorting to a force kill and there's no such time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh interesting! So then the callback is executed by the child process rather than the parent?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@phargogh Maybe I mistated that. The callback is executed by the parent. Node (the parent) has a reference to the child process (this.investRun) that is listening for events from the child. So maybe the problem here is that the child never has the chance to emit the event?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So now that I'm actually testing this on Windows, my last two statements are both wrong. The exit code on Windows after the force-kill is 1.

// In the CI the flask app takes more than 10x as long to startup.
// Especially so on macos.
// So, allowing many retries, especially because the error
// that is thrown if all retries fail is swallowed by jest
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

graphic!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🦈

@emlys
Copy link
Member

emlys commented Oct 13, 2020

Looks like it's not working on my mac because of the meddlesome space character in ~/Library/Application Support! The full command ends up as: invest -vvv run seasonal_water_yield --headless -d /Users/emily/Library/Application Support/invest-workbench/tmp/data-EYud8J/datastack.json. Hence, an invest: error: unrecognized arguments: Support/invest-workbench/tmp/data-EYud8J/datastack.json.

@davemfish
Copy link
Collaborator Author

Looks like it's not working on my mac because of the meddlesome space character in ~/Library/Application Support! The full command ends up as: invest -vvv run seasonal_water_yield --headless -d /Users/emily/Library/Application Support/invest-workbench/tmp/data-EYud8J/datastack.json. Hence, an invest: error: unrecognized arguments: Support/invest-workbench/tmp/data-EYud8J/datastack.json.

Aha! Great catch, @emlys ! Some quotes should do that job. I'll patch that up.

# add rtree dependency dynamic libraries from conda environment
invest_a.binaries += [
(os.path.basename(name), name, 'BINARY') for name in
glob.glob(os.path.join(conda_env, 'lib/libspatialindex*'))]
elif is_win:
else:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we've made the switch to conda for the invest build workflows, could we add:

# add rtree dependency dynamic libraries from conda environment
      invest_a.binaries += [
          (os.path.basename(name), name, 'BINARY') for name in
          glob.glob(os.path.join(conda_env, 'Library/bin/spatialindex*.dll'))]

This would also mean a change on line 9,10 to set conda_env for windows as well.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @dcdenu4 , I'm on the fence about touching this stuff here in this PR vs. moving the whole python side of this project into invest. If we changed this PyInstaller script here, we'd also have to change the Actions workflow, because it's not using conda for Windows right now.

Now that you have all that set up for invest in the release branch, it could be time to make a new branch like release/workbench-alpha on natcap/invest and move all of the python & PyInstaller stuff out of the workbench project.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Hi, Rich)

Copy link
Member

@phargogh phargogh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me @davemfish ! That node behavior on process exit is pretty surprising and I do wonder about what alternatives/workarounds we might be able to implement, like we talked about on the coffee call this morning. Maybe monitoring the process for cancellation would be best? Or maybe we can just trust that cancellation will always go through? I suppose if all else fails, there's always task manager.

Comment on lines 208 to 210
// This kills, but no chance to handle it and show 'Run Cancelled'
exec(`taskkill /pid ${this.investRun.pid} /t /f`);
// exec(`taskkill /pid ${this.investRun.pid} /t`) // does not kill
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that we have to run a program to kill the process on windows, this looks right. So then it looks like the this.investRun.on('exit', (code) ... part down below is the one that won't execute once this.investRun exits?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's how I understand it as well.

Comment on lines 204 to 212
this.investRun.terminate = () => {
if (this.state.jobStatus === 'running') {
// process.kill(-this.investRun.pid, 'SIGTERM'); // does not kill
// this.investRun.kill(); // does not kill
// This kills, but no chance to handle it and show 'Run Cancelled'
exec(`taskkill /pid ${this.investRun.pid} /t /f`);
// exec(`taskkill /pid ${this.investRun.pid} /t`) // does not kill
}
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, looks like only POSIX systems implement signals, but even Windows should handle interrupts of some sort. One SO post I found mentioned Win32 messages, which makes me wonder if there's just an alternate, windows-specific way to handle events. If that's the case, then how would we do this in node? All of the child_process docs talk about processes like they're expecting POSIX signals, which sounds like it would work great until signals don't exist on Windows.

.toBeInTheDocument();
const cancelButton = await findByText('Cancel Run');
fireEvent.click(cancelButton);
expect(await findByText('Run Canceled')).toBeInTheDocument();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh interesting! So then the callback is executed by the child process rather than the parent?

// In the CI the flask app takes more than 10x as long to startup.
// Especially so on macos.
// So, allowing many retries, especially because the error
// that is thrown if all retries fail is swallowed by jest
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🦈

@emlys
Copy link
Member

emlys commented Oct 16, 2020

puppet.test.js still isn't working for me :( It's still failing on expect(vals.includes('active')).toBeTruthy();. vals is simply nav-link disabled. Everything works fine when I run it manually, and I can click on the Log tab right after clicking Execute. Maybe the assertion is happening too soon and it takes a moment for the Log tab to become active?

@davemfish
Copy link
Collaborator Author

puppet.test.js still isn't working for me :( It's still failing on expect(vals.includes('active')).toBeTruthy();. vals is simply nav-link disabled. Everything works fine when I run it manually, and I can click on the Log tab right after clicking Execute. Maybe the assertion is happening too soon and it takes a moment for the Log tab to become active?

Thanks for continuing to test this @emlys . That's a good thought because the Log tab does not become active immediately, but I just double-checked and the test should be waiting and retrying that expect because it is wrapped in waitFor like this:

await waitFor(async () => {
    const prop = await logTab.getProperty('className');
    const vals = await prop.jsonValue();
    expect(vals.includes('active')).toBeTruthy();
  })

That same Application Support/invest-workbench folder should have logfiles that might reveal what's happening after the Execute button is clicked. Sometimes I tail -f that log and watch it while running the puppet test.

@emlys
Copy link
Member

emlys commented Oct 16, 2020

Oh 🤦‍♀️ I needed to npm run dist again for the changes to take effect. The test is passing now!

@davemfish
Copy link
Collaborator Author

Thanks for reviewing this, everyone. I pushed some changes to address the platform-dependent logic around killing the subprocess. I was partially mistaken when describing the problem earlier. The callback that listens for the exit event fires the same way for all OS. On Windows, the force kill yields an exit code of 1, whereas on other systems the code was null. And that was the only real reason for the different behavior.

The only thing outstanding now is the Mac Build Action. I've made an issue for that (#52 ). Mac builds are good, it's just the puppeteer test that's not running properly in Actions.I don't want that to stop us from merging this PR though.

@davemfish davemfish merged commit 65070af into natcap:main Oct 20, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
4 participants