From 26d75a1e50e543123c6a692e6551ddf95bed870f Mon Sep 17 00:00:00 2001
From: Feike Steenbergen
Date: Fri, 20 Mar 2020 13:09:28 +0100
Subject: [PATCH] Remove stale pidfile if it exists

The postmaster will refuse to start if the pid of the pidfile is currently
in use by the same OS user. This protection mechanism however is not strict
enough in a container environment, as we only have the pids in our own namespace.
The Volume containing the data directory could accidentally be mounted
inside multiple containers, so relying on visibility of the pid is not enough.

There is only 1 way for us to communicate to the other postmaster (in another container?)
on the same $PGDATA: by removing the pidfile.

The other postmaster will shutdown immediately as soon as it determines that its
pidfile has been removed. This is a Very Good Thing: it prevents multiple postmasters
on the same directory, even in a container environment.
See also https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=7e2a18)

The downside of this change is that it will delay the startup of a crashed container;
as we're dealing with data, we'll choose correctness over uptime in this instance.
---
 CHANGELOG.md              |  1 +
 timescaledb_entrypoint.sh | 23 +++++++++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index b6ed46c2..f26a0370 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -9,6 +9,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 These are changes that will probably be included in the next release.
 
 ### Added
+ * Remove stale pidfile if it exists
 ### Changed
 ### Removed
 ### Fixed
diff --git a/timescaledb_entrypoint.sh b/timescaledb_entrypoint.sh
index d10daf58..f04f2e4a 100755
--- a/timescaledb_entrypoint.sh
+++ b/timescaledb_entrypoint.sh
@@ -24,4 +24,27 @@ install -m 0700 -d "${PGDATA}"
     python3 /scripts/augment_patroni_configuration.py /home/postgres/postgres.yml
 }
 
+if [ -f "${PGDATA}/postmaster.pid" ]; then
+    # the postmaster will refuse to start if the pid of the pidfile is currently
+    # in use by the same OS user. This protection mechanism however is not strict
+    # enough in a container environment, as we only have the pids in our own namespace.
+    # The Volume containing the data directory could accidentally be mounted
+    # inside multiple containers, so relying on visibility of the pid is not enough.
+    #
+    # There is only 1 way for us to communicate to the other postmaster (in another container?)
+    # on the same $PGDATA: by removing the pidfile.
+    #
+    # The other postmaster will shutdown immediately as soon as it determines that its
+    # pidfile has been removed. This is a Very Good Thing: it prevents multiple postmasters
+    # on the same directory, even in a container environment.
+    # See also https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=7e2a18)
+    #
+    # The downside of this change is that it will delay the startup of a crashed container;
+    # as we're dealing with data, we'll choose correctness over uptime in this instance.
+    log "Removing stale pidfile ..."
+    rm "${PGDATA}/postmaster.pid"
+    log "Sleeping a little to ensure no other postmaster is running anymore"
+    sleep 65
+fi
+
 exec patroni /home/postgres/postgres.yml
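
Review note: the remove-then-wait step in this patch could also be written as a small function, so the 65-second delay can be shortened when exercising the entrypoint in tests. The following is only a sketch and not part of the patch. The function name `remove_stale_pidfile` and its two parameters are hypothetical, and the `log` function below is a stand-in for the entrypoint's own helper, which is defined outside this diff.

```shell
# Stand-in for the entrypoint's log helper (assumption: the real one is
# defined elsewhere in timescaledb_entrypoint.sh and may format differently).
log() {
    printf '%s - %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >&2
}

# Hypothetical helper: remove a leftover postmaster.pid, then wait so that a
# postmaster in another container has time to notice and shut down.
#   $1: data directory
#   $2: seconds to wait after removal (defaults to 65, the value in the patch)
remove_stale_pidfile() {
    pgdata="$1"
    wait_seconds="${2:-65}"
    pidfile="${pgdata}/postmaster.pid"

    # No pidfile means there is nothing to clean up and no reason to wait.
    [ -f "${pidfile}" ] || return 0

    log "Removing stale pidfile ..."
    rm -f "${pidfile}" || return 1
    log "Sleeping ${wait_seconds}s to ensure no other postmaster is running anymore"
    sleep "${wait_seconds}"
}
```

In the entrypoint this would be called as `remove_stale_pidfile "${PGDATA}"` just before `exec patroni`, which keeps the behaviour of the patch unchanged while letting a test pass `0` as the second argument.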