Graceful shutdown when a CTRL-C (or similar activity) is entered on the controller #278

Open
k-rister opened this issue Jan 10, 2023 · 2 comments · Fixed by #303

Comments

@k-rister
Contributor

No description provided.

@k-rister k-rister converted this from a draft issue Jan 10, 2023
@k-rister
Contributor Author

I've been thinking about this, and I think we need to resolve a few things to make this work the way we want. I believe the way to achieve this is for the controller to catch the CTRL-C signal (or similar activity) and then send an abort via roadblock so that all the participants in the run know to clean up and shut down. Two problems immediately come to mind:

  1. On the controller, we need to ensure that rickshaw is properly handling the CTRL-C signal. If roadblock is currently running, then this probably means that rickshaw-run needs to ignore the signal and roadblock itself needs to catch the signal and send the abort. If roadblock is not currently running, then rickshaw-run needs to catch the signal and instruct the next roadblock to send an abort.
  2. On the participant side (endpoints and engines), we are going to need to move as much of the code execution as possible into scripts that can be called using roadblock's wait-for functionality. This is because distributing the abort signal via roadblock is not going to cancel the run in a timely manner unless roadblock is running and in control of the execution flow. In many/most situations today, roadblock is only executed when something has completed (such as a benchmark run) and the next synchronization point is reached. In this mode it could take a very long time (i.e. up to the entire length of a benchmark run) for the abort to be properly received by all participants.
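
To make the controller-side behavior in item 1 concrete, here is a minimal Python sketch, not rickshaw's actual code. The `AbortTracker` class, its method names, and the `abort_args` parameter are all hypothetical; in particular, `abort_args` stands in for whatever mechanism roadblock would really use to be told to send an abort. The handler does nothing while a synchronization child is running (a terminal CTRL-C also reaches that child, which is expected to react), and otherwise records the request so the next synchronization can carry it.

```python
import signal
import subprocess
import sys


class AbortTracker:
    """Remembers a CTRL-C so the next synchronization can carry the abort."""

    def __init__(self):
        self.abort_requested = False
        self.sync_running = False

    def handle_sigint(self, signum, frame):
        # While a synchronization child is running, leave the signal to it.
        if self.sync_running:
            return
        # Otherwise record the request instead of exiting immediately.
        self.abort_requested = True

    def run_sync(self, base_cmd, abort_args):
        """Run one synchronization command, adding abort_args if CTRL-C was seen."""
        cmd = list(base_cmd)
        if self.abort_requested:
            # Hypothetical extra arguments; not a real roadblock option.
            cmd += list(abort_args)
        self.sync_running = True
        try:
            return subprocess.run(cmd).returncode
        finally:
            self.sync_running = False
```

A controller following this sketch would create one `AbortTracker`, register it with `signal.signal(signal.SIGINT, tracker.handle_sigint)`, and route every synchronization through `run_sync`.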

I feel like the whole point of this feature is to be able to properly clean up a canceled run as quickly as possible. If it's not being done quickly, then what is the point of canceling?

Unfortunately, moving various code execution blocks to the roadblock wait-for functionality is probably not going to be a trivial exercise. I imagine that in some situations it will be pretty easy and in others it will be quite complicated. Right now there are a lot of global environment variables in use in some places that are not going to properly traverse the <program>-><roadblock>-><wait-for execution> exec+fork flow, so we will need to clean that up and use config files, parameters, etc. to get that information to where it needs to be at runtime.
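
As a rough illustration of the "config files, parameters" approach, the Python sketch below writes the settings to a JSON file and hands only the file path to the child as an argument, so nothing depends on variables surviving the intermediate processes. The file name `run-config.json`, the function names, and the inline child script are all made up for this example; they are not taken from rickshaw or roadblock.

```python
import json
import os
import subprocess
import sys

# Stand-in for a script launched at the end of the exec+fork chain.
# It learns everything it needs from the file named in its first argument.
CHILD_SCRIPT = (
    "import json, sys\n"
    "with open(sys.argv[1]) as f:\n"
    "    print(json.dumps(json.load(f)))\n"
)


def write_run_config(settings, directory):
    """Persist the settings and return the path to pass down the chain."""
    path = os.path.join(directory, "run-config.json")
    with open(path, "w") as f:
        json.dump(settings, f)
    return path


def read_in_child(config_path):
    """Launch the child with the config path as a parameter and return what it saw."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SCRIPT, config_path],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)
```

The same idea works for any number of intermediate processes, as long as each one forwards the path argument unchanged.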

Another issue, albeit a more minor one, is that roadblock itself needs to change the way it handles a CTRL-C. Today it catches the signal and shuts down the program. It needs to instead catch the signal and then change the program flow from whatever it currently is to that of an abort path. This is simple in concept, but it may cause some issues in the state machine that is currently implemented. I don't think this is a huge problem, but it may take some time to figure out the modifications required in the state machine.
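
The shape of that change could look something like the Python sketch below. It is a toy state machine with invented states (`WAITING`, `RUNNING`, `ABORTING`, `DONE`), not roadblock's real one: the signal handler only sets a flag, and the next step of the machine diverts to the abort path instead of the process exiting on the spot.

```python
import signal
from enum import Enum, auto


class State(Enum):
    WAITING = auto()
    RUNNING = auto()
    ABORTING = auto()
    DONE = auto()


class SyncMachine:
    """Toy state machine that reroutes to an abort path on CTRL-C."""

    def __init__(self):
        self.state = State.WAITING
        self.abort_pending = False
        self.history = [self.state]

    def on_sigint(self, signum, frame):
        # Do not exit here; just note that the abort path should be taken.
        self.abort_pending = True

    def step(self):
        """Advance one transition, preferring the abort path when requested."""
        if self.abort_pending and self.state in (State.WAITING, State.RUNNING):
            self.state = State.ABORTING
        elif self.state is State.WAITING:
            self.state = State.RUNNING
        elif self.state in (State.RUNNING, State.ABORTING):
            self.state = State.DONE
        self.history.append(self.state)
        return self.state
```

Registering the handler with `signal.signal(signal.SIGINT, machine.on_sigint)` keeps all of the decision making inside `step`, which is where the real state machine would need its new transitions.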

@atheurer thoughts?

@k-rister k-rister self-assigned this Feb 28, 2023
@k-rister k-rister added the bug Something isn't working label Mar 8, 2023
@k-rister k-rister moved this from Todo to In Progress in Crucible Tracking Mar 12, 2023
@k-rister
Contributor Author

@k-rister k-rister linked a pull request Apr 21, 2023 that will close this issue
@k-rister k-rister reopened this Apr 21, 2023
@k-rister k-rister moved this from In Progress to Todo in Crucible Tracking Apr 21, 2023