
Make Patroni Kubernetes native #500

Merged · 60 commits · Dec 8, 2017

Conversation

@CyberDem0n (Member) commented Aug 9, 2017

In order to find all objects related to our Patroni cluster, we use labels and a labelSelector.

patroni.yaml

name: batman-0  # MUST match the name of the Pod where Patroni is running
scope: &scope batman  # Patroni cluster name
kubernetes:
  namespace: default  # Kubernetes namespace where Patroni Pods are running and where Patroni will create its ConfigMap objects
  labels:  # Set of labels that uniquely identify the Patroni cluster.
    application: patroni  # They will be used to find all Pods and ConfigMaps associated with the cluster.
    cluster-name: *scope  # Patroni will also set these labels on every ConfigMap object it creates.

Unfortunately the Kubernetes API doesn't provide a way to expire objects; it only offers compare-and-set functionality. Therefore we had to implement leader election by periodically updating annotations on the <scope>-leader ConfigMap object.
The basic idea is taken from https://github.com/kubernetes/client-go/tree/master/tools/leaderelection.
Every node in the cluster periodically checks the annotations of the <scope>-leader ConfigMap object. If the annotations have changed, the cluster has a leader; if they haven't changed within ttl seconds, the cluster doesn't have a leader.
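
For illustration, a minimal sketch of that pattern with the official kubernetes Python client; the ConfigMap name, TTL, annotation keys and member name below are assumptions for the example, not Patroni's actual implementation:

import time

from kubernetes import client, config
from kubernetes.client.rest import ApiException

config.load_incluster_config()  # assumes we run inside the cluster
v1 = client.CoreV1Api()

NAMESPACE = "default"
LEADER_CM = "batman-leader"   # <scope>-leader
TTL = 30                      # seconds after which the leader record is considered stale
MY_NAME = "batman-0"          # must match the Pod name

def try_acquire_or_renew():
    cm = v1.read_namespaced_config_map(LEADER_CM, NAMESPACE)
    annotations = cm.metadata.annotations or {}
    now = time.time()
    renew_time = float(annotations.get("renewTime", 0))

    # Another member holds a fresh lock: give up for this cycle.
    if annotations.get("leader") not in (None, MY_NAME) and now - renew_time < TTL:
        return False

    # Write our name and a fresh timestamp. Because the body carries the
    # resourceVersion we just read, the API server rejects the update with
    # 409 Conflict if anybody modified the object in between (compare-and-set).
    cm.metadata.annotations = dict(annotations, leader=MY_NAME, renewTime=str(now))
    try:
        v1.replace_namespaced_config_map(LEADER_CM, NAMESPACE, cm)
        return True
    except ApiException as e:
        if e.status == 409:
            return False
        raise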

List of ConfigMaps Patroni is working with:

  • initialize and config keys are stored as annotations of the <scope>-config ConfigMap
  • leader and optime/leader keys are stored as annotations of the <scope>-leader ConfigMap
  • the failover key is stored as an annotation of the <scope>-failover ConfigMap
  • the sync key is stored as an annotation of the <scope>-sync ConfigMap

Open questions:

  • Should Patroni also label Pods with the 'role'?
  • Should Patroni maintain the master Endpoint?
  • Or could we still use a callback for that?

Alexander Kukushkin added 9 commits August 7, 2017 11:08
member status is stored in the pod's metadata.annotations

The rest of the structure has changed slightly:
leader and optime/leader are merged into the configmap: cluster-name-leader
initialize and config are merged into the configmap: cluster-name-config
failover and sync stay as they are.

Unfortunately Kubernetes doesn't provide an API for atomic deletes,
therefore we just empty the metadata instead of deleting objects.
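
A minimal sketch of that approach, again assuming the official kubernetes Python client (object and namespace names are illustrative):

from kubernetes import client, config

config.load_incluster_config()
v1 = client.CoreV1Api()

# "Delete" the failover state by wiping the annotations instead of removing
# the object; passing back the object we just read keeps its resourceVersion,
# so the write acts as a compare-and-set.
cm = v1.read_namespaced_config_map("batman-failover", "default")
cm.metadata.annotations = {}
v1.replace_namespaced_config_map("batman-failover", "default", cm)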
@CyberDem0n (Member, Author):

@jberkus, I think you should really try it. It uses the Kubernetes API and ConfigMaps to store the cluster state.

Here you can find the Dockerfile and Kubernetes manifest to deploy it: https://github.com/zalando/patroni/tree/feature/k8s/kubernetes

@jberkus (Contributor) commented Aug 9, 2017

Will test!

FWIW, it's possible Kube will add leader elections in the future.

Alexander Kukushkin added 9 commits August 11, 2017 14:53
This is not a critical bug, because the `attempt_to_acquire_leader` method
was still returning False in this case.
In addition, implement extra checks around manual failover
and recovery when synchronous_mode is enabled.
* possibility to specify client certs and a cacert
* possibility to specify a token
* compatibility with python-consul-0.7.1
And set the correct postgres state in pause mode.
Alexander Kukushkin added 2 commits November 23, 2017 19:36
@CyberDem0n (Member, Author):

Indeed, this is a problem with the kubernetes 4.0.0 module.

I've updated requirements.txt and pinned the version there: kubernetes==3.0.0

@unguiculus (Contributor):

I built a new Spilo image as explained above. I hope this was the right thing to do. The result is this:

postgres-patroni-0 spilo 2017-11-23 19:39:18,615 - bootstrapping - INFO - Figuring out my environment (Google? AWS? Openstack? Local?)
postgres-patroni-0 spilo 2017-11-23 19:39:18,618 - bootstrapping - DEBUG - Starting new HTTP connection (1): 169.254.169.254
postgres-patroni-0 spilo 2017-11-23 19:39:18,621 - bootstrapping - DEBUG - http://169.254.169.254:80 "GET / HTTP/1.1" 200 22
postgres-patroni-0 spilo 2017-11-23 19:39:18,624 - bootstrapping - DEBUG - Starting new HTTP connection (1): metadata.google.internal
postgres-patroni-0 spilo 2017-11-23 19:39:18,633 - bootstrapping - DEBUG - http://metadata.google.internal:80 "GET /computeMetadata/v1/instance/zone HTTP/1.1" 200 42
postgres-patroni-0 spilo 2017-11-23 19:39:18,634 - bootstrapping - INFO - Looks like your running google
postgres-patroni-0 spilo 2017-11-23 19:39:18,649 - bootstrapping - INFO - Configuring patronictl
postgres-patroni-0 spilo 2017-11-23 19:39:18,651 - bootstrapping - INFO - Writing to file /home/postgres/.config/patroni/patronictl.yaml
postgres-patroni-0 spilo 2017-11-23 19:39:18,651 - bootstrapping - INFO - Configuring certificate
postgres-patroni-0 spilo 2017-11-23 19:39:18,651 - bootstrapping - INFO - Generating ssl certificate
postgres-patroni-0 spilo 2017-11-23 19:39:18,708 - bootstrapping - DEBUG - b"Generating a 2048 bit RSA private key\n...................+++\n...........+++\nwriting new private key to '/home/postgres/server.key'\n-----\n"
postgres-patroni-0 spilo 2017-11-23 19:39:18,708 - bootstrapping - INFO - Configuring patroni
postgres-patroni-0 spilo 2017-11-23 19:39:18,717 - bootstrapping - INFO - Writing to file /home/postgres/postgres.yml
postgres-patroni-0 spilo 2017-11-23 19:39:18,717 - bootstrapping - INFO - Configuring pam-oauth2
postgres-patroni-0 spilo 2017-11-23 19:39:18,717 - bootstrapping - INFO - No PAM_OAUTH2 configuration was specified, skipping
postgres-patroni-0 spilo 2017-11-23 19:39:19,133 INFO: Failed to import patroni.dcs.consul
postgres-patroni-0 spilo 2017-11-23 19:39:19,134 INFO: Failed to import patroni.dcs.etcd
postgres-patroni-0 spilo 2017-11-23 19:39:19,135 INFO: Failed to import patroni.dcs.exhibitor
postgres-patroni-0 spilo 2017-11-23 19:39:19,339 INFO: Lock owner: None; I am postgres-patroni-0
postgres-patroni-0 spilo 2017-11-23 19:39:19,354 INFO: trying to bootstrap a new cluster
postgres-patroni-0 spilo 2017-11-23 19:39:19,354 ERROR: Exception during execution of long running task bootstrap
postgres-patroni-0 spilo Traceback (most recent call last):
postgres-patroni-0 spilo   File "/usr/local/lib/python3.5/dist-packages/patroni/async_executor.py", line 86, in run
postgres-patroni-0 spilo     wakeup = func(*args) if args else func()
postgres-patroni-0 spilo   File "/usr/local/lib/python3.5/dist-packages/patroni/postgresql.py", line 1471, in bootstrap
postgres-patroni-0 spilo     return do_initialize(config) and self._configure_server_parameters() and self.start()
postgres-patroni-0 spilo   File "/usr/local/lib/python3.5/dist-packages/patroni/postgresql.py", line 523, in _initdb
postgres-patroni-0 spilo     ret = self.pg_ctl('initdb', *options)
postgres-patroni-0 spilo   File "/usr/local/lib/python3.5/dist-packages/patroni/postgresql.py", line 269, in pg_ctl
postgres-patroni-0 spilo     return subprocess.call(pg_ctl + ['-D', self._data_dir] + list(args), **kwargs) == 0
postgres-patroni-0 spilo   File "/usr/lib/python3.5/subprocess.py", line 557, in call
postgres-patroni-0 spilo     with Popen(*popenargs, **kwargs) as p:
postgres-patroni-0 spilo   File "/usr/lib/python3.5/subprocess.py", line 947, in __init__
postgres-patroni-0 spilo     restore_signals, start_new_session)
postgres-patroni-0 spilo   File "/usr/lib/python3.5/subprocess.py", line 1551, in _execute_child
postgres-patroni-0 spilo     raise child_exception_type(errno_num, err_msg)
postgres-patroni-0 spilo FileNotFoundError: [Errno 2] No such file or directory: 'pg_ctl'
postgres-patroni-0 spilo 2017-11-23 19:39:19,377 INFO: removing initialize key after failed attempt to bootstrap the cluster
postgres-patroni-0 spilo Traceback (most recent call last):
postgres-patroni-0 spilo   File "/usr/local/bin/patroni", line 11, in <module>
postgres-patroni-0 spilo     load_entry_point('patroni==1.3.6', 'console_scripts', 'patroni')()
postgres-patroni-0 spilo   File "/usr/local/lib/python3.5/dist-packages/patroni/__init__.py", line 174, in main
postgres-patroni-0 spilo     return patroni_main()
postgres-patroni-0 spilo   File "/usr/local/lib/python3.5/dist-packages/patroni/__init__.py", line 143, in patroni_main
postgres-patroni-0 spilo     patroni.run()
postgres-patroni-0 spilo   File "/usr/local/lib/python3.5/dist-packages/patroni/__init__.py", line 114, in run
postgres-patroni-0 spilo     logger.info(self.ha.run_cycle())
postgres-patroni-0 spilo   File "/usr/local/lib/python3.5/dist-packages/patroni/ha.py", line 1093, in run_cycle
postgres-patroni-0 spilo     info = self._run_cycle()
postgres-patroni-0 spilo   File "/usr/local/lib/python3.5/dist-packages/patroni/ha.py", line 1017, in _run_cycle
postgres-patroni-0 spilo     return self.post_bootstrap()
postgres-patroni-0 spilo   File "/usr/local/lib/python3.5/dist-packages/patroni/ha.py", line 922, in post_bootstrap
postgres-patroni-0 spilo     self.cancel_initialization()
postgres-patroni-0 spilo   File "/usr/local/lib/python3.5/dist-packages/patroni/ha.py", line 917, in cancel_initialization
postgres-patroni-0 spilo     raise PatroniException('Failed to bootstrap cluster')
postgres-patroni-0 spilo patroni.exceptions.PatroniException: 'Failed to bootstrap cluster'

Let me know if this is not the right place to further discuss this. I don't want to abuse this PR. Would you be willing to further assist me? I could create an issue or you can find me on Kubernetes Slack.

@CyberDem0n (Member, Author):

I built a new Spilo image as explained above. I hope this was the right thing to do. The result is this:
postgres-patroni-0 spilo FileNotFoundError: [Errno 2] No such file or directory: 'pg_ctl'

Oh, I've told you to build it with --build-arg DEMO=true :(

"--build-arg DEMO=true" is used to build an image with postgres 10 only and without a lot of heavy stuff. It is good enough to try it with minikube for example, but it is not for production, because there is wal-e inside.

And you hit a bug where $PATH wasn't propagated to Patroni, so it failed to run pg_ctl initdb.
Anyway, the bug in Spilo is fixed and it should work now.

The normal Spilo image contains Postgres 9.3, 9.4, 9.5, 9.6 and 10.
By default it will start Postgres 10, but you can change that with an environment variable: SPILO_CONFIGURATION='{postgresql: {bin_dir: /usr/lib/postgresql/9.6/bin}}'
Basically it is possible to supply any part of the Patroni configuration in SPILO_CONFIGURATION and it will be "merged" into the config generated by Spilo.
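
For illustration, a sketch of how that env variable might be set in a pod spec; the value is copied from the example above, and the rest of the manifest is omitted:

env:
- name: SPILO_CONFIGURATION
  value: '{postgresql: {bin_dir: /usr/lib/postgresql/9.6/bin}}'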

@unguiculus (Contributor):

I built the new image without the demo flag and tried it with Postgres 10 and 9.6. Everything now starts up fine with both KUBERNETES_USE_CONFIGMAPS=true and KUBERNETES_USE_CONFIGMAPS=false. Leader election works, but the service is still not updated with the endpoint.

@CyberDem0n (Member, Author):

Leader election works but the service is still not updated with the endpoint.

I think I know what the problem is. At some point we started creating Services with a "named" port:

apiVersion: v1
kind: Service
metadata:
  name: &cluster_name patronidemo
  labels:
    application: spilo
    version: *cluster_name
spec:
  type: ClusterIP
  ports:
  - port: 5432
    targetPort: 5432
    name: postgresql # XXX

This commit explains why: zalando/spilo@2be341a#diff-8c54fa1e5677a832585d18f396619701

If the port name in the Service and in the Endpoints object doesn't match, the service will not work.
This is configured in spilo: https://github.com/zalando/spilo/pull/198/files#diff-9ec119124414a288bc2f1edd468dcce5R377

I think in your case you've created the Service with no name assigned to port 5432.
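
For illustration, a sketch of the Endpoints object Patroni would maintain for the leader in this setup; the IP is a placeholder, and the point is that the port name matches the Service above:

apiVersion: v1
kind: Endpoints
metadata:
  name: patronidemo          # same name as the Service
  labels:
    application: spilo
subsets:
- addresses:
  - ip: 10.2.3.4             # pod IP of the current leader (placeholder)
  ports:
  - name: postgresql         # must match the port name in the Service
    port: 5432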

@unguiculus (Contributor):

Thanks, that was the missing piece. It works for both modes (KUBERNETES_USE_CONFIGMAPS=true|false). Which mode would be the preferred one? I noticed that it is important to clean up: if you install it and an old Endpoint or ConfigMap still exists, the cluster will form correctly, but the service doesn't get the endpoint. Maybe this can be improved. I updated my Helm chart; it now removes both ConfigMaps and Endpoints in a post-delete hook.

@unguiculus (Contributor):

I also experimented with parallel pod management and rolling updates. Do you see a problem with this?

unguiculus/charts@dd94767

@CyberDem0n (Member, Author):

Which mode would be the preferred one?

Endpoints. Otherwise there is a race condition: #536

I noticed that it is important to clean up. If you install it and an old endpoint or configmap still exists, the cluster will form correctly, but the service doesn't get the endpoint. Maybe this can be improved.

I don't think it will "form correctly". Patroni will notice that there is a <cluster-name>-config Endpoint or ConfigMap, assume that the cluster was already initialized, and will not run initdb; instead it will either try to restore from a backup (with wal-e), or all pods will wait until there is a leader to take a basebackup from.

Database clusters are stateful and usually you don't delete and recreate them all the time. There is not much we can improve here.

I also experimented with parallel pod management and rolling updates. Do you see a problem with this?

I haven't played with rolling upgrades so far, but I think it might be dangerous. It is very important not to terminate the next pod until the previous one has become healthy enough (started streaming from the master with replication lag close to 0).
Actually we are not really using Helm charts, but the postgres-operator for everything: cluster creation and removal, user management, PV management, rolling upgrades and so on.

@unguiculus (Contributor):

I've seen the operator project and was wondering which way to go. I'm looking for a solution without an additional DCS like Etcd. Is this also possible with the operator?

@CyberDem0n (Member, Author):

Postgres-operator doesn't care what DCS is used by Patroni. It just passes some environment variables to Spilo.

@jberkus (Contributor) commented Dec 5, 2017

So, some testing questions:

  1. Does the main Spilo image now support running as kube-native? If so, how do I turn this behavior on? If not, which image do I need to use?

  2. Where do I set KUBERNETES_USE_CONFIGMAP? Is that an env variable in the pod definition? If so, it doesn't seem to have an effect.

  3. @ants, did you get some version of this working on OpenShift?

@CyberDem0n (Member, Author):

The latest Spilo image supporting kube-native is registry.opensource.zalan.do/acid/spilo-10:1.3-p6.
It is built from this branch: zalando/spilo#198, which is not yet merged into master.

In order to enable the Kubernetes API for leader election you should set the DCS_ENABLE_KUBERNETES_API env variable. By default it uses Endpoints. This eliminates a lot of problems and race conditions, because the subsets of the leader Endpoint are set at the same time the leader lock is acquired/updated. If you want to use ConfigMaps for leader election, you have to set the KUBERNETES_USE_CONFIGMAP env variable. Basically all the "magic" of generating the Patroni config yaml is hidden here: https://github.com/zalando/spilo/pull/198/files#diff-9ec119124414a288bc2f1edd468dcce5R399 and here: https://github.com/zalando/spilo/pull/198/files#diff-9ec119124414a288bc2f1edd468dcce5R442
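
As an illustration only (not a complete manifest), the corresponding env entries in a Spilo StatefulSet might look like this:

env:
- name: DCS_ENABLE_KUBERNETES_API   # use the Kubernetes API as the DCS
  value: "true"
# optionally switch leader election from Endpoints to ConfigMaps:
# - name: KUBERNETES_USE_CONFIGMAP
#   value: "true"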

@jberkus (Contributor) commented Dec 6, 2017

OK, so KUBERNETES_USE_CONFIGMAP is in spilo but not in upstream patroni?

@CyberDem0n (Member, Author):

Yes, KUBERNETES_USE_CONFIGMAP is in Spilo. In Spilo the default is inverted compared to Patroni, because in Patroni I wanted to make the safe choice.

In Patroni the kubernetes config options are different (a minimal example follows the list):

  • kubernetes.use_endpoints in yaml or PATRONI_KUBERNETES_USE_ENDPOINTS in env
  • kubernetes.namespace in yaml or PATRONI_KUBERNETES_NAMESPACE in env -- the Kubernetes namespace where we are running
  • kubernetes.labels in yaml or PATRONI_KUBERNETES_LABELS in env -- these labels will be used to find existing objects (Endpoints|ConfigMaps + Pods) associated with the current cluster. Patroni will also set them on every object (Endpoint|ConfigMap) it creates/updates.
  • kubernetes.scope_label in yaml or PATRONI_KUBERNETES_SCOPE_LABEL in env -- name of the label containing the cluster name. Default value is cluster-name.
  • kubernetes.role_label in yaml or PATRONI_KUBERNETES_ROLE_LABEL in env -- name of the label containing the role (master or replica). Patroni will set this label on the pod it runs in. Default value is role.
  • kubernetes.pod_ip in yaml or PATRONI_KUBERNETES_POD_IP in env -- IP of the pod we are running in. It's only necessary when use_endpoints is enabled, in order to write this IP into the leader Endpoint subsets.
  • kubernetes.ports in yaml or PATRONI_KUBERNETES_PORTS in env -- if the Service object has a name for the port, the same name must appear in the Endpoints object, otherwise the service won't work. Example: {kind: Service, spec: {ports: [{name: postgresql, port: 5432, targetPort: 5432}]}}. In this case you have to define kubernetes.ports: [{"name": "postgresql", "port": 5432}] and Patroni will use it when updating the subsets of the leader Endpoint. This parameter is not used if use_endpoints is not set.
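
A minimal sketch of the corresponding kubernetes section in patroni.yaml, assuming use_endpoints mode; the values are illustrative:

kubernetes:
  namespace: default
  labels:
    application: patroni
    cluster-name: batman
  scope_label: cluster-name
  role_label: role
  use_endpoints: true
  pod_ip: 10.2.3.4          # normally injected via the downward API (status.podIP)
  ports:
  - name: postgresql
    port: 5432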

@jberkus (Contributor) commented Dec 6, 2017

OK, trying use_endpoints in OpenShift, will report back.

Clearly I need to write a config doc for this.

@hjacobs (Contributor) commented Dec 7, 2017

This is, AFAIK, already running in production. Can we merge this PR?

@ants (Collaborator) commented Dec 7, 2017

@jberkus - yes, I got a version of this working on OpenShift. I actually used a heavily modified Spilo because I wanted the archiving and backup features from there. There are plenty of issues to resolve when building the image: for example, initdb will fail because getpwnam() doesn't work since the container user does not exist, everything needs to run as the fake root user, anything that will be modified from the container needs g+rw permissions set on it, and setuid will not work at all, so the cron daemon needs to be replaced.

I used the ConfigMap-based approach; the race condition is an acceptable risk for now.

I haven't tried to integrate the latest version, but that is just because I have been busy with other tasks.

@alexeyklyukin (Contributor):

👍

@jberkus (Contributor) commented Dec 8, 2017

@CyberDem0n

To date, this branch has done well in all of my testing. I have yet to hit a specific bug with it.

@ants

See follow-up issue ...

@CyberDem0n (Member, Author):

👍
