Support OpenShift #303

Merged
merged 1 commit into from
Sep 9, 2020

ansd
Member

@ansd ansd commented Sep 1, 2020

Fixes #234

Changes:

  1. Allow rabbitmq-cluster-operator-role to update rabbitmqclusters/finalizers
  2. Do not set BlockOwnerDeletion for PVCs (as done in elastic/cloud-on-k8s#1891, "Set BlockOwnerDeletion to false on PVCs")
  3. Change group owner of /var/lib/rabbitmq/mnesia/ to 999

Docs:

https://github.com/ansd/rabbitmq-website/tree/k8s-openshift (need to create PR)

Testing on OpenShift:

If you don't have an OpenShift cluster available, the easiest way to set one up within a few minutes is CodeReady Containers.

  • Download CodeReady Containers
  • crc setup
  • crc config set memory 12000 (the default of 8192 MiB is too low to run the RabbitMQ operator and a RabbitMQ cluster instance since there isn't much memory left when starting crc)
  • crc start
  • eval $(crc oc-env)
  • oc login -u kubeadmin -p <password> https://api.crc.testing:6443
  • make deploy-dev
  • The operator won’t deploy because:
$ kubectl describe replicasets.apps
...
Events:
  Type     Reason        Age                From                   Message
  ----     ------        ----               ----                   -------
  Warning  FailedCreate  9s (x13 over 29s)  replicaset-controller  Error creating: pods "rabbitmq-cluster-rabbitmq-cluster-operator-6bb8fd7bf8-" is forbidden: unable to validate against any security context constraint: [fsGroup: Invalid value: []int64{1000}: 1000 is not an allowed group spec.containers[0].securityContext.securityContext.runAsUser: Invalid value: 1000: must be in the ranges: [1000610000, 1000619999]]
  • As explained here: run oc edit namespace rabbitmq-system and change the uid-range and supplemental-groups for the operator:
apiVersion: v1
kind: Namespace
metadata:
  annotations:
...
    openshift.io/sa.scc.supplemental-groups: 1000/1
    openshift.io/sa.scc.uid-range: 1000/1
  • Operator deploys successfully
  • Change the uid-range and supplemental-groups for RabbitMQ (here we assume that the RabbitMQ cluster gets deployed into the default namespace):
    oc edit namespace default
apiVersion: v1
kind: Namespace
metadata:
  annotations:
...
    openshift.io/sa.scc.supplemental-groups: 999/1
    openshift.io/sa.scc.uid-range: 999/1
  • Create a RabbitMQ cluster instance in the default namespace: kubectl rabbitmq create test
  • Before this PR, service creation fails because:
kubectl describe rabbitmqclusters.rabbitmq.com
...
Status:
  Conditions:
    Message:               services "test-rabbitmq-headless" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on: , <nil>
    Reason:                Error
    Status:                False
    Type:                  ReconcileSuccess

Fixed by change 1 above.
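
For reference, change 1 can be sketched as the RBAC rule below. The error above comes from the OwnerReferencesPermissionEnforcement admission check, which only lets a client set blockOwnerDeletion if it has update permission on the finalizers subresource of the owner. This is a sketch, not the exact diff from this PR: the API group is inferred from the rabbitmqclusters.rabbitmq.com resource name used above, and the surrounding role definition is omitted.

```yaml
# Sketch only: rule fragment for rabbitmq-cluster-operator-role.
# The apiGroup is inferred from "rabbitmqclusters.rabbitmq.com";
# the rule in the actual PR may list additional verbs.
rules:
- apiGroups:
  - rabbitmq.com
  resources:
  - rabbitmqclusters/finalizers
  verbs:
  - update
```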

  • Before this PR, pod creation fails because:
kubectl describe statefulsets.apps
...
Events:
  Type     Reason        Age                 From                    Message
  ----     ------        ----                ----                    -------
  Warning  FailedCreate  72s (x12 over 83s)  statefulset-controller  create Pod test-rabbitmq-server-0 in StatefulSet test-rabbitmq-server failed error: failed to create PVC persistence-test-rabbitmq-server-0: persistentvolumeclaims "persistence-test-rabbitmq-server-0" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on: , <nil>

Fixed by change 2 above.
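
As an illustration of change 2, the owner reference on the PVC would look roughly like the fragment below. This is a sketch under assumptions: the apiVersion and the uid are placeholders that do not come from this PR, and the template name persistence is inferred from the PVC name persistence-test-rabbitmq-server-0. With blockOwnerDeletion set to false, the reference no longer blocks a foreground deletion of the owner, but the PVC is still a dependent of the RabbitmqCluster.

```yaml
# Sketch only: PVC metadata after change 2.
# apiVersion and uid are assumptions/placeholders, not taken from the PR.
metadata:
  name: persistence
  ownerReferences:
  - apiVersion: rabbitmq.com/v1beta1   # assumed API version
    kind: RabbitmqCluster
    name: test
    uid: <uid-of-the-cluster-object>   # placeholder
    blockOwnerDeletion: false
```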

  • Before this PR, the pod fails because:
kubectl logs test-rabbitmq-server-0
10:05:14.190 [warning] Failed to write PID file "/var/lib/rabbitmq/mnesia/[email protected]": permission denied

Fixed by change 3 above.

  • RabbitMQ instance deploys successfully

I tested the above steps with the changes on this branch against

crc version
CodeReady Containers version: 1.15.0+e317bed
OpenShift version: 4.5.7 (embedded in binary)

@ChunyiLyu
Contributor

@ansd Two things that we've chatted about:

  1. Does setting BlockOwnerDeletion to false mean that PVCs do not get garbage collected? If you have tested the behavior and that's not the case, could you note that in the PR?

  2. I am not sure an additional initContainer just for changing the group owner of mnesia is worth it. InitContainers are run sequentially. If creating an alpine container takes 5 seconds, then for a 5-node cluster the start and restart time of the cluster would be 25 seconds longer. I don't think that's worth it for the security concerns that we've talked about in standup.

@ansd
Member Author

ansd commented Sep 8, 2020

@ChunyiLyu

  1. PVCs still get garbage collected even though BlockOwnerDeletion is set to false.
  2. The chgrp is now part of the 1st init container, which runs as the root user and drops all capabilities that are not needed.
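
A rough sketch of what point 2 describes, assuming the init container name copy-config mentioned in the review of this PR. The image placeholder, the command, the volume name, and the capability list are assumptions for illustration and are not copied from the PR; CHOWN is kept here because changing the group of a root-owned directory to 999 needs it.

```yaml
# Sketch only: image, command, volume name and capabilities are assumptions.
initContainers:
- name: copy-config
  image: <rabbitmq-image>            # hypothetical placeholder
  command:
  - sh
  - -c
  - chgrp 999 /var/lib/rabbitmq/mnesia/
  securityContext:
    runAsUser: 0                     # root, as stated above
    capabilities:
      drop:
      - ALL
      add:
      - CHOWN                        # assumed minimum needed for chgrp
  volumeMounts:
  - name: persistence                # inferred from the PVC name
    mountPath: /var/lib/rabbitmq/mnesia/
```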

Contributor

@ChunyiLyu ChunyiLyu left a comment

LGTM. I suggest squashing the commits when merging, since the commits don't add individual features.

One small thing, and it's up to you: the initContainer is called 'copy-config', but arguably it is doing far more than that, namely whatever is necessary to set up the rabbitmq-server container. We can rename it.

1. Allow rabbitmq-cluster-operator-role to update
rabbitmqclusters/finalizers

2. Do not BlockOwnerDeletion for PVCs

3. Change group owner of mnesia dir to 999
Otherwise, the RabbitMQ process can't write the pid file into the
/var/lib/rabbitmq/mnesia/ directory on OpenShift because permission
is denied.
Before this commit, mnesia dir was owned by user root and group root.
On OpenShift, mnesia does not have rwx bits for everyone due to stricter
security constraints:
drwxrwx---. 2 root     root       6 Aug 20 10:03 mnesia

Fixes #234
Successfully merging this pull request may close these issues.

Cannot create RMQ instance on Openshift
4 participants