Make Patroni Kubernetes native #500
Conversation
Member status is stored in the pods' `metadata.annotations`. Other structures have changed slightly: `leader` and `optime/leader` are merged into the `cluster-name-leader` ConfigMap, `initialize` and `config` are merged into the `cluster-name-config` ConfigMap, and `failover` and `sync` stay as they are. Unfortunately, Kubernetes doesn't provide an API for atomic deletes, therefore we just empty the metadata instead of deleting objects.
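For illustration only (not code from this PR), a minimal sketch of reading member status back from pod annotations with the `kubernetes` Python client; the namespace, label selector, and annotation key are assumptions:

```python
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when running inside a pod
v1 = client.CoreV1Api()

# Assumed namespace and labels; real values depend on how the cluster was deployed.
pods = v1.list_namespaced_pod('default', label_selector='application=spilo,version=patronidemo')
for pod in pods.items:
    annotations = pod.metadata.annotations or {}
    # The member's state lives in the pod annotations; the key name 'status'
    # is an assumption made for this example.
    print(pod.metadata.name, annotations.get('status'))
```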
@jberkus, I think you should really try it. It uses the Kubernetes API and ConfigMaps to store cluster state. Here you can find a Dockerfile and a Kubernetes manifest to deploy it: https://github.com/zalando/patroni/tree/feature/k8s/kubernetes
Will test! FWIW, it's possible Kube will add leader elections in the future.
This is not a critical bug, because the `attempt_to_acquire_leader` method was still returning False in this case.
In addition to that, implement additional checks around manual failover and recover when synchronous_mode is enabled.
* possibility to specify client certs and cacert
* possibility to specify a token
* compatibility with python-consul-0.7.1
And set the correct postgres state in pause mode
The latest one has some problems with None values received instead of empty lists: kubernetes-client/python#376
Indeed, this is a problem with the kubernetes 4.0.0 module. I've updated requirements.txt and pinned a working version there: kubernetes==3.0.0
I built a new Spilo image as explained above. I hope this was the right thing to do. The result is this:
Let me know if this is not the right place to further discuss this. I don't want to abuse this PR. Would you be willing to further assist me? I could create an issue or you can find me on Kubernetes Slack.
Oh, I told you to build it with "--build-arg DEMO=true": that flag builds an image with postgres 10 only and without a lot of heavy stuff. It is good enough to try with minikube, for example, but it is not for production because there is no wal-e inside. And you hit a bug where $PATH wasn't propagated to Patroni and it failed to run pg_ctl initdb. The normal Spilo image contains postgres 9.3, 9.4, 9.5, 9.6 and 10.
I built the new image without demo and tried it with Postgres 10 and 9.6. Everything now starts up fine with both.
I think I know what the problem is. At some point we started creating Services with a "named" port:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: &cluster_name patronidemo
  labels:
    application: spilo
    version: *cluster_name
spec:
  type: ClusterIP
  ports:
  - port: 5432
    targetPort: 5432
    name: postgresql
```

This commit explains why: zalando/spilo@2be341a#diff-8c54fa1e5677a832585d18f396619701. If the port name in the Service and in the Endpoints object doesn't match, the service will not work. I think in your case you've created a Service with no name assigned for port=5432.
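As an illustration of the fix (not from the PR), here is a minimal sketch using the `kubernetes` Python client that creates an Endpoints object whose port carries the same name as the Service port; the namespace and IP address are placeholders:

```python
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

# The port name ('postgresql') must match the name used in the Service spec,
# otherwise the Service will not route traffic to these endpoints.
endpoints = client.V1Endpoints(
    metadata=client.V1ObjectMeta(name='patronidemo'),
    subsets=[client.V1EndpointSubset(
        addresses=[client.V1EndpointAddress(ip='10.2.3.4')],  # placeholder pod IP
        ports=[client.V1EndpointPort(name='postgresql', port=5432)],
    )],
)
v1.create_namespaced_endpoints('default', endpoints)
```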
Thanks, that was the missing piece. It works for both modes.
I also experimented with parallel pod management and rolling updates. Do you see a problem with this?
Endpoints. Otherwise there is a race condition: #536
I don't think that it will "form correctly". Patroni will notice that there is a
Database clusters are stateful and usually you don't delete and recreate them all the time. There is not much we can improve.
I've never played with rolling upgrades so far, but I think it might be dangerous. It is very important not to terminate the next pod until the previous one becomes healthy enough (it has started streaming from the master and the replication lag is close to 0).
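For example, a readiness-style check between pod rotations could look roughly like the sketch below. This is not part of the PR; it assumes psycopg2, a monitoring connection to the master, PostgreSQL 10 function names, and a hypothetical application_name per replica:

```python
import psycopg2

def replica_caught_up(master_dsn, application_name, max_lag_bytes=1024 * 1024):
    """Return True if the named replica is streaming and its replay lag is small."""
    with psycopg2.connect(master_dsn) as conn:
        with conn.cursor() as cur:
            cur.execute(
                # PostgreSQL 10 naming; 9.6 uses pg_current_xlog_location()/replay_location.
                "SELECT state, pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)"
                " FROM pg_stat_replication WHERE application_name = %s",
                (application_name,),
            )
            row = cur.fetchone()
    return (row is not None and row[0] == 'streaming'
            and row[1] is not None and row[1] <= max_lag_bytes)

# Hypothetical usage before rotating the next pod in a rolling update:
# if replica_caught_up('host=patronidemo user=postgres', 'patronidemo-1'):
#     proceed_with_next_pod()
```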
I've seen the operator project and was wondering which way to go. I'm looking for a solution without an additional DCS like Etcd. Is this also possible with the operator?
Postgres-operator doesn't care what DCS is used by Patroni. It just passes some environment variables to Spilo.
So, some testing questions:
The latest Spilo image supporting kube-native is
In order to enable the Kubernetes API for leader election you should set
OK, so KUBERNETES_USE_CONFIGMAP is in spilo but not in upstream patroni?
Yes. In Patroni the kubernetes configs are different:
OK, trying use_endpoints in OpenShift, will report back. Clearly I need to write a config doc for this.
This is AFAIK already running in production; can we merge this PR?
@jberkus - yes, I got a version of this working with OpenShift. I actually did use a heavily modified Spilo because I wanted the archiving and backup features from there. There are plenty of issues to resolve when building the image: for example, initdb will fail because getpwnam() doesn't work since the container user does not exist, everything needs to run as the fake root user, anything that will be modified from the container needs g+rw permissions set on it, and setuid will not work at all, so the cron daemon needs to be replaced. I used the ConfigMap-based approach; the race condition is an acceptable risk for now. I haven't tried to integrate the latest version, but that is just because I have been busy with other tasks.
👍 |
To date, this branch has done well in all of my testing. I have yet to hit a specific bug with it. See follow-up issue ...
👍 |
In order to be able to find all objects related to our Patroni cluster we use `labels` and `labelSelector` (see `patroni.yaml`).

Unfortunately the Kubernetes API doesn't provide a way to expire objects, only compare-and-set functionality, therefore we had to implement leader election by periodically updating the annotations of the `<scope>-leader` ConfigMap object. The basic idea is taken from https://github.com/kubernetes/client-go/tree/master/tools/leaderelection. Every node in the cluster periodically checks the annotations of the `<scope>-leader` ConfigMap object: if the annotations were changed, that means we have a leader; if the annotations weren't changed during `ttl` seconds, the cluster doesn't have a leader.
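For illustration only (not code from this PR), a minimal sketch of such a compare-and-set update with the `kubernetes` Python client; it relies on the API server rejecting a replace whose `resourceVersion` is stale, and the annotation names and namespace are assumptions:

```python
import time
from kubernetes import client, config
from kubernetes.client.rest import ApiException

config.load_kube_config()
v1 = client.CoreV1Api()

def try_acquire_leader(name, namespace, member, ttl=30):
    """Attempt to take or renew leadership via annotations of the <scope>-leader ConfigMap."""
    cm = v1.read_namespaced_config_map(name, namespace)
    annotations = cm.metadata.annotations or {}
    last_renew = float(annotations.get('renewTime', 0))
    current_leader = annotations.get('leader')
    # Only take over if there is no leader, the lease expired, or we already hold it.
    if current_leader not in (None, member) and time.time() - last_renew < ttl:
        return False
    annotations.update({'leader': member, 'renewTime': str(time.time())})
    cm.metadata.annotations = annotations
    try:
        # replace_namespaced_config_map keeps metadata.resource_version from the read above,
        # so the API server rejects the write (409 Conflict) if someone updated it in between.
        v1.replace_namespaced_config_map(name, namespace, cm)
        return True
    except ApiException as e:
        if e.status == 409:  # lost the compare-and-set race
            return False
        raise

# Hypothetical usage: try_acquire_leader('patronidemo-leader', 'default', 'patronidemo-0')
```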
List of ConfigMaps Patroni works with:

* `initialize` and `config` keys are stored as annotations of the `<scope>-config` ConfigMap
* `leader` and `optime/leader` keys are stored as annotations of the `<scope>-leader` ConfigMap
* the `failover` key is stored as annotations of the `<scope>-failover` ConfigMap
* the `sync` key is stored as annotations of the `<scope>-sync` ConfigMap
Open questions: