Support Graceful Pod Shutdown #3380

t83714 · 2022-07-21T02:06:13Z

Support Graceful Pod Shutdown

The dev cluster uses preemptible nodes to save cost. As a result, most pods are terminated at least once every 24 hours.

When k8s terminates a node, each pod container receives a SIGTERM signal and is given 30s (the default grace period) to exit gracefully. After that, a SIGKILL is sent to the container to force an immediate exit.

Our microservices should support the Graceful Pod Shutdown by recognising the SIGTERM signal and exiting voluntarily.

Failing to do so has no functional impact. However, it might leave behind pods showing a confusing "Error" status.

For Node.js-based microservices, we can:

process.on('SIGTERM', () => {
  debug('SIGTERM signal received: closing HTTP server')
  // Stop accepting new connections; the callback runs once open connections have ended
  server.close(() => {
    debug('HTTP server closed')
  })
})
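One thing to watch with the handler above: `server.close()` only stops accepting new connections and calls back once the existing ones have ended, so long-running requests (and, on older Node.js versions, idle keep-alive connections) can delay the callback until SIGKILL arrives. Below is a minimal sketch of one way to bound the wait. The helper name `installGracefulShutdown`, the injectable `exit` option and the 10-second default timeout are assumptions for illustration, not part of the original code.

```javascript
// Hypothetical helper (not from the original issue): closes `server` on
// SIGTERM and forces an exit if closing takes longer than `timeoutMs`.
// `exit` is injectable so the logic can be exercised without ending the process.
function installGracefulShutdown(
    server,
    { timeoutMs = 10000, exit = (code) => process.exit(code) } = {}
) {
    let shuttingDown = false;

    function shutdown() {
        if (shuttingDown) return; // ignore repeated signals
        shuttingDown = true;

        // Fallback: stop waiting and exit with a non-zero code after timeoutMs.
        const timer = setTimeout(() => exit(1), timeoutMs);
        timer.unref(); // the timer alone should not keep the process alive

        // Stop accepting new connections; the callback runs once open ones end.
        server.close(() => {
            clearTimeout(timer);
            exit(0);
        });
    }

    process.on("SIGTERM", shutdown);
    return shutdown;
}

// Usage sketch (the port is illustrative):
// const server = require("node:http")
//     .createServer((req, res) => res.end("ok"))
//     .listen(8080);
// installGracefulShutdown(server, { timeoutMs: 10000 });
```

The timeout should stay below the grace period the pod actually gets, so that the process exits on its own terms rather than being killed.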

For Scala-based (Akka) microservices, see the graceful termination section of the Akka docs.


t83714 · 2022-08-08T03:19:46Z

For Node.js-based pods, use http-terminator:

import { createHttpTerminator } from "http-terminator";
const server = app.listen(argv.listenPort);
const httpTerminator = createHttpTerminator({
    server
});
process.on("SIGTERM", () => {
    console.log("SIGTERM signal received: closing HTTP server");
    httpTerminator.terminate().then(() => {
        console.log("HTTP server closed");
        process.exit(0);
    });
});

For Scala / Akka services, use Akka Coordinated Shutdown.
Instead of using the default Akka HTTP behaviour introduced by addToCoordinatedShutdown, we will adopt the solution below to make sure chunked responses have a chance to complete properly (as we always wait for at least 10 seconds).

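import akka.Done
import akka.actor.CoordinatedShutdown
import akka.http.scaladsl.Http
import scala.concurrent.duration._

// Assumes an implicit ActorSystem `system` (plus materializer and ExecutionContext) in scope
val shutdown = CoordinatedShutdown(system)
val binding = Http().bindAndHandle(routes, "127.0.0.1", 8080)
shutdown.addTask(CoordinatedShutdown.PhaseServiceUnbind, "http-unbind") { () =>
  binding.flatMap(_.unbind()).map(_ => Done)
}
// Wait 10 seconds for in-flight responses, then terminate with a 15-second hard deadline
shutdown.addTask(CoordinatedShutdown.PhaseServiceRequestsDone, "http-graceful-termination") { () =>
  binding.flatMap(b =>
    akka.pattern.after(10.seconds, system.scheduler)(b.terminate(15.seconds))
  ).map(_ => Done)
}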

The default Akka CoordinatedShutdown settings are here

t83714 · 2022-08-10T05:47:23Z

closed via PR: #3392

t83714 · 2022-08-11T00:31:49Z

More on this:

The default terminationGracePeriodSeconds value is 30 seconds. However, when a node that uses preemptible VMs is gracefully terminated, its pods get at most 25 seconds. See the following from the GKE doc:

On a best-effort basis, the kubelet grants non-system Pods 25 seconds to gracefully terminate, after which system Pods (with the system-cluster-critical or system-node-critical priority classes) have five seconds to gracefully terminate.
Note: Adjusting the value of terminationGracePeriodSeconds to more than 25 in your Pod spec has no effect on the graceful termination of nodes that use preemptible VMs.

In practice (note that the 25s is only on a best-effort basis), a pod is more likely to get a grace period of only 15s. See the log below:

"I0810 22:31:38.342374 1606 kuberuntime_container.go:723] "Killing container with a grace period" pod="xxxxx-xxx/xxxxxxx" podUID=4bd29ba8-5007-4939-8b83-3b922feb86b9 containerName="xxx-xxx-xxx" containerID="docker://ab8c2b83d2a62f209498fedbef870d7186803e4bf9a56dbb17a47874f0fffb86" gracePeriod=15"
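To check what grace period a pod actually received, the `gracePeriod=` field can be pulled out of kubelet log lines like the one above. Here is a small sketch, assuming the line format shown in this comment. The function name `parseGracePeriod` is hypothetical, and the sample line is a shortened copy of the log above.

```javascript
// Hypothetical helper: extracts the grace period (in seconds) from a kubelet
// "Killing container with a grace period" log line. Returns null when the
// line has no gracePeriod field. Assumes the format shown in the log above.
function parseGracePeriod(logLine) {
    const match = /\bgracePeriod=(\d+)/.exec(logLine);
    return match ? Number(match[1]) : null;
}

// Shortened copy of the log line from this comment.
const sample =
    'kuberuntime_container.go:723] "Killing container with a grace period" ' +
    'pod="xxxxx-xxx/xxxxxxx" containerName="xxx-xxx-xxx" gracePeriod=15';

console.log(parseGracePeriod(sample)); // 15
```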

We will create a new ticket to make sure all pods (especially the Scala ones, as they wait a few seconds for possible chunked responses) can be gracefully terminated within 15 seconds.
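The concern can be made concrete with a little arithmetic. In the Scala snippet from the earlier comment, the task waits 10 seconds and then calls `terminate` with a 15-second hard deadline, so it can nominally take up to 10 + 15 = 25 seconds (subject to the phase timeouts configured for CoordinatedShutdown). That is longer than the 15-second grace period seen in the log. A minimal sketch of that check follows. The function and field names are hypothetical, and the one-second safety margin is an assumption.

```javascript
// Hypothetical helper: checks whether a shutdown plan fits in a grace period.
// All durations are in seconds. `marginSeconds` is an assumed safety buffer.
function fitsGracePeriod(
    { delaySeconds, hardDeadlineSeconds },
    gracePeriodSeconds,
    marginSeconds = 1
) {
    const worstCase = delaySeconds + hardDeadlineSeconds;
    return { worstCase, fits: worstCase + marginSeconds <= gracePeriodSeconds };
}

// Values from this thread: wait 10s, then terminate with a 15s hard deadline.
const plan = { delaySeconds: 10, hardDeadlineSeconds: 15 };

console.log(fitsGracePeriod(plan, 30)); // { worstCase: 25, fits: true }
console.log(fitsGracePeriod(plan, 15)); // { worstCase: 25, fits: false }
```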

t83714 · 2022-08-11T00:40:21Z

Created a new ticket here: #3394

t83714 added the feature request label Jul 21, 2022

t83714 added this to the Next (v2.0.0) milestone Aug 8, 2022

t83714 added a commit that referenced this issue Aug 8, 2022

#3380 Support Graceful Pod Shutdown (Scala services)

adbb0cb

t83714 mentioned this issue Aug 9, 2022

Issue/3380 Support Graceful Pod Shutdown #3392

Merged

2 tasks

t83714 closed this as completed Aug 10, 2022

t83714 mentioned this issue Aug 11, 2022

Make sure scala pods can be gracefully terminated within 15 seconds #3394

Closed

mkurz mentioned this issue Apr 14, 2023

akka-http and netty backend use PhaseServiceRequestsDone to drain requests playframework/playframework#11758

Merged
