Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fast-reboot] Stop services after killing containers to prevent automatic restart #572

Merged
merged 1 commit into from
Jul 16, 2019
Merged

[fast-reboot] Stop services after killing containers to prevent automatic restart #572

merged 1 commit into from
Jul 16, 2019

Conversation

jleveque
Copy link
Contributor

@jleveque jleveque commented Jul 4, 2019

- What I did

Explicitly stop corresponding services after killing Docker containers to prevent the service from automatically restarting the container during fast-/warm-reboot.

We are beginning to configure the services which control Docker containers to automatically restart the service and all dependent services if the container should ever stop. In the case of fast-/warm-reboot, stopping the service could require up to 10 seconds to stop the container (due to the behavior of the underlying docker stop command). This could take too long, so in this script we call docker kill, which will work faster. However, after killing the container, the service which controls the container (once configured to restart) will restart the container after the specified interval, but we do not want the behavior under these fast-/warm-reboot circumstances. The services will be restarted after completing the reboot. Therefore, we stop the service immediately after killing the corresponding container to ensure the service cannot automatically restart the container.

@jleveque jleveque requested review from lguohan and yxieca July 4, 2019 01:50
@jleveque jleveque self-assigned this Jul 4, 2019
@lguohan
Copy link
Contributor

lguohan commented Jul 8, 2019

I can see similiar commands in sonic_installer, can you check them as well?

        run_command("docker exec -i bgp pkill -9 zebra")
        run_command("docker exec -i bgp pkill -9 bgpd")

Copy link
Contributor

@lguohan lguohan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as comments

@jleveque
Copy link
Contributor Author

jleveque commented Jul 8, 2019

sonic_installer does have calls to pkill -9 ..., but no calls to docker kill ..., which is what this PR addresses. For the calls to pkill -9 ..., we will need to address these when adding the auto-restart mechanism to these containers.

@pavel-shirshov
Copy link
Contributor

what if the service is going to be restarted before it was stopped?
With this patch we introduce the race condition here.

@qiluo-msft
Copy link
Contributor

Retest this please

@jleveque
Copy link
Contributor Author

@pavel-shirshov: If someone or something had called systemctl restart ... on one of these services and this script called systemctl stop ... while the restart operation was still in progress, whichever command was run last (in this case, systemctl stop ...) would override the pending systemctl restart .... Do you have some other concern?

@pavel-shirshov
Copy link
Contributor

@jleveque
Current behavior is following:

  1. We kill a process/processes inside of the docker container
  2. We stop the container itself because we afraid the process would be restarted by supervisorctl inside of the container (fix me if I'm wrong here)

So I think if we don't fast enough to stop the container, but supervisorctl restarted just killed processes already?

@jleveque
Copy link
Contributor Author

These processes are not set to be restarted by supervisor inside the containers. The restart mechanism uses a supervisor event listener to listen for processes exiting, which then causes supervisor to exit and subsequently the container itself. Then, systemd will restart the entire container. Here we're killing the container, then stopping the systemd service to prevent it from restarting the container.

@jleveque
Copy link
Contributor Author

Retest this please

1 similar comment
@jleveque
Copy link
Contributor Author

Retest this please

@pavel-shirshov
Copy link
Contributor

@jleveque Got it
Thanks

@jleveque jleveque merged commit ee56d54 into sonic-net:master Jul 16, 2019
@jleveque jleveque deleted the fast_reboot_prevent_service_restart_on_kill branch July 16, 2019 18:10
yxieca pushed a commit that referenced this pull request Jul 16, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants