Support Vault via vault-k8s #846

Merged
merged 62 commits into from
Sep 28, 2021

Conversation

ansd
Member

@ansd ansd commented Sep 17, 2021

This closes #822, #823, #824.

Before this PR, the cluster-operator always created the credentials of the default (admin) user and put them into a K8s Secret object. TLS was supported by the user providing a K8s Secret object containing the certificate and private key.

However, some users want to store credentials outside of K8s in external secret stores such as HashiCorp Vault, AWS Secrets Manager, Google Secrets Manager or Azure Key Vault. Benefits of doing so include:

  • all credentials in a single store
  • encrypt data at rest
  • auditing
  • third-party rotation features

As explained in this KubeCon talk, there are 4 different approaches in K8s to consume external secrets:

  1. Direct API
  2. Controller that mirrors secrets in K8s
  3. Sidecar + MutatingWebhookConfiguration
  4. Secrets Store CSI Driver

In this PR, we take the 3rd approach (Sidecar + MutatingWebhookConfiguration), integrating with Vault using vault-k8s. If spec.secretBackend.vault is configured in the RabbitmqCluster CRD, no K8s Secret will be created for the default user credentials. Instead, a Vault init container and sidecar fetch the credentials from Vault.
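To illustrate how the sidecar approach is driven by pod annotations, here is a minimal sketch of the kind of vault-k8s injector annotations that could ask the Vault agent to render the default user credentials into /etc/rabbitmq/conf.d/11-default_user.conf. The function name, the role value, the secret path, and the KV v2 template (`.Data.data.*`) are assumptions for illustration; this is not necessarily the exact set of annotations the cluster-operator emits.

```go
package main

import (
	"fmt"
	"sort"
)

// defaultUserAnnotations is a hypothetical helper. It returns vault-k8s
// injector annotations that render the secret stored at secretPath into
// /etc/rabbitmq/conf.d/11-default_user.conf.
// Assumption: the secret lives in a KV v2 engine with "username" and
// "password" keys, hence the .Data.data prefix in the template.
func defaultUserAnnotations(role, secretPath string) map[string]string {
	const file = "11-default_user.conf"
	tmpl := fmt.Sprintf(`{{- with secret "%s" -}}
default_user = {{ .Data.data.username }}
default_pass = {{ .Data.data.password }}
{{- end }}`, secretPath)

	return map[string]string{
		// Ask the mutating webhook to inject the Vault agent containers.
		"vault.hashicorp.com/agent-inject": "true",
		// Run the Vault init container before other init containers.
		"vault.hashicorp.com/agent-init-first": "true",
		// Vault role used for Kubernetes auth.
		"vault.hashicorp.com/role": role,
		// Where and how the rendered file is written.
		"vault.hashicorp.com/secret-volume-path-" + file:    "/etc/rabbitmq/conf.d",
		"vault.hashicorp.com/agent-inject-perms-" + file:    "0640",
		"vault.hashicorp.com/agent-inject-secret-" + file:   secretPath,
		"vault.hashicorp.com/agent-inject-template-" + file: tmpl,
	}
}

func main() {
	// Role and path are placeholder values.
	annotations := defaultUserAnnotations("rabbitmq", "secret/data/rabbitmq/config")

	// Print in a stable order, since Go map iteration order is random.
	keys := make([]string, 0, len(annotations))
	for k := range annotations {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s: %q\n", k, annotations[k])
	}
}
```

The 0640 file mode mirrors the tightened permissions on 11-default_user.conf shown in the commit log of this PR.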

In future, we might (no promises) support approach 2 (a controller that mirrors secrets in K8s), as requested for example in #840, and approach 4 (Secrets Store CSI Driver). The latter seems to be the future-proof, K8s-native way of fetching external secrets.

This PR adds support for 3 features:

  1. To create a cluster the default user's credentials should come from Vault's secret #822: When creating a RabbitmqCluster, the default user credentials are fetched from Vault. The assumption is that the credentials were already created in Vault (via some mechanism external to the cluster-operator). The cluster-operator will not create a K8s Secret for the default user. Instead, a Vault init container will fetch the credentials and mount them to be used in the RabbitMQ container.
  2. Update default user's password in RabbitMQ when it has changed in Vault #823: Password rotation without restarting Pods. When the default user password changes in Vault, the Vault sidecar updates the password in the /etc/rabbitmq/conf.d/11-default_user.conf file. However, RabbitMQ cannot pick up config file changes on the fly. Therefore, some component needs to update the password on the RabbitMQ server. The Vault sidecar container does not seem to contain the tools to HTTP PUT to the RabbitMQ Management API or to kubectl exec into the RabbitMQ container. Since we do not want to change the default Vault image, a 2nd sidecar can optionally be deployed. It contains a single Go binary that watches 11-default_user.conf for changes. If a new password is detected, it sends an HTTP PUT to the RabbitMQ Management API to update the password server side and copies the password into /var/lib/rabbitmq/.rabbitmqadmin.conf for use by the rabbitmqadmin CLI. Username rotation is not supported.
  3. To create a cluster with TLS enabled, the certificate and private key should come from Vault's secret #824: To enable TLS, RabbitMQ server certificates are provided on demand by the Vault PKI Engine instead of being stored in advance in a K8s Secret. The RabbitMQ pod uses a short-lived leaf certificate (aligning with Vault's philosophy of short-lived secrets). After each RabbitMQ pod start-up, a new certificate is issued by Vault. The private key is never stored in Vault. Before the short-lived server certificate expires, the Vault sidecar container requests a new certificate and puts it into /etc/rabbitmq-tls/, where it is picked up on the fly by the Erlang VM without the need to restart the pod.
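The password rotation step in feature 2 can be sketched roughly as follows: parse the rendered 11-default_user.conf, then build an HTTP PUT against the Management API's /api/users/<name> endpoint. This is a simplified illustration, not the actual sidecar code. The function names are hypothetical, the base URL is a placeholder, authenticating with the previous password and re-sending the administrator tag are assumptions, and the file watching and the .rabbitmqadmin.conf update are left out.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// parseDefaultUser extracts default_user and default_pass from the contents
// of a rabbitmq.conf-style file such as 11-default_user.conf.
func parseDefaultUser(conf string) (user, pass string, err error) {
	sc := bufio.NewScanner(strings.NewReader(conf))
	for sc.Scan() {
		// Split on the first '=' only, so passwords may contain '='.
		parts := strings.SplitN(sc.Text(), "=", 2)
		if len(parts) != 2 {
			continue
		}
		switch strings.TrimSpace(parts[0]) {
		case "default_user":
			user = strings.TrimSpace(parts[1])
		case "default_pass":
			pass = strings.TrimSpace(parts[1])
		}
	}
	if err := sc.Err(); err != nil {
		return "", "", err
	}
	if user == "" || pass == "" {
		return "", "", errors.New("default_user or default_pass missing")
	}
	return user, pass, nil
}

// newPasswordUpdateRequest builds (but does not send) the PUT request that
// sets newPass for user. Assumption: the request authenticates with the
// previous password and keeps the user tagged as administrator.
func newPasswordUpdateRequest(baseURL, user, oldPass, newPass string) (*http.Request, error) {
	body, err := json.Marshal(map[string]string{
		"password": newPass,
		"tags":     "administrator",
	})
	if err != nil {
		return nil, err
	}
	endpoint := strings.TrimRight(baseURL, "/") + "/api/users/" + url.PathEscape(user)
	req, err := http.NewRequest(http.MethodPut, endpoint, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.SetBasicAuth(user, oldPass)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	// Example file contents as the Vault sidecar might render them.
	conf := "default_user = admin\ndefault_pass = new-secret\n"

	user, newPass, err := parseDefaultUser(conf)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	req, err := newPasswordUpdateRequest("http://localhost:15672", user, "old-secret", newPass)
	if err != nil {
		fmt.Println("request error:", err)
		return
	}
	// A real updater would now call http.DefaultClient.Do(req) and check
	// the response status before persisting the new password.
	fmt.Println(req.Method, req.URL.String())
}
```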

No changes are done to the Erlang cookie: it will still be created by the cluster-operator and stored in a K8s Secret.

Tests:

  • Unit tests were added for the new admin-password-updater sidecar container.
  • Unit tests in the cluster-operator
  • End-to-end tests are provided in the docs/examples/vault-default-user and docs/examples/vault-tls directories covering the above 3 features and will be run in our Concourse pipeline. The setup.sh scripts give users a high-level idea of what needs to be set up on the Vault server side (although the examples use only Vault server dev mode).

Most of the work in this PR was done by @MarcialRosales 🙌

This PR is a draft because the following TODOs are left:

  • Get feedback: @twhite0 would you be interested in testing the changes?
  • Re-test everything.
  • Add docs: specifically READMEs in the docs/examples/vault-*/ directories.
  • Merge squashing all commits.
  • Merge Add Vault docs rabbitmq-website#1272

ChunyiLyu and others added 30 commits September 17, 2021 16:27
since it shouldn't be readable by everyone.

Before this commit:

ls -al /etc/rabbitmq/conf.d/
drwxrwsrwt 2 root     rabbitmq  100 Sep  7 14:52 .
drwxrwxrwx 1 rabbitmq rabbitmq 4096 Sep  7 14:30 ..
-rw-r--r-- 1 root     rabbitmq  604 Sep  7 14:29 10-operatorDefaults.conf
-rw-r--r-- 1 _apt     rabbitmq   45 Sep  7 14:52 11-default_user.conf
-rw-r--r-- 1 root     rabbitmq   51 Sep  7 14:29 90-userDefinedConfiguration.conf

After this commit:

ls -al /etc/rabbitmq/conf.d/
drwxrwsrwt 2 root     rabbitmq  100 Sep  8 07:19 .
drwxrwxrwx 1 rabbitmq rabbitmq 4096 Sep  8 07:19 ..
-rw-r--r-- 1 root     rabbitmq  604 Sep  8 07:18 10-operatorDefaults.conf
-rw-r----- 1 rabbitmq rabbitmq   45 Sep  8 07:19 11-default_user.conf
-rw-r--r-- 1 root     rabbitmq   51 Sep  8 07:18 90-userDefinedConfiguration.conf
because this is needed when inter-node TLS is enabled or when scraping
metrics via TLS.

Although each RabbitMQ pod will request its own certificate, we still
include all pod hostnames in every certificate because the pod index is
only known at runtime (not at cluster-operator deploy time) and we can't
use the K8s downward API here, since the labels must be set correctly
when the vault-agent init container runs before all other containers.
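
As a rough illustration of the commit message above, the list of pod hostnames for a certificate request could be computed from the replica count alone, without knowing which pod is asking. The function names are hypothetical and the `<cluster>-server-<index>.<cluster>-nodes.<namespace>` naming scheme is an assumption for this sketch; the comma-separated form matches what the Vault PKI issue endpoint accepts in its alt_names parameter.

```go
package main

import (
	"fmt"
	"strings"
)

// podHostnames returns one DNS name per replica.
// Assumption: pods are named <cluster>-server-<index> and are reachable
// through a headless service called <cluster>-nodes.
func podHostnames(clusterName, namespace string, replicas int) []string {
	names := make([]string, 0, replicas)
	for i := 0; i < replicas; i++ {
		names = append(names,
			fmt.Sprintf("%s-server-%d.%s-nodes.%s", clusterName, i, clusterName, namespace))
	}
	return names
}

// altNames joins the hostnames into the comma-separated form used for
// subject alternative names in a certificate request.
func altNames(clusterName, namespace string, replicas int) string {
	return strings.Join(podHostnames(clusterName, namespace, replicas), ",")
}

func main() {
	// Every pod would request a certificate covering all three hostnames.
	fmt.Println(altNames("myrabbit", "default", 3))
}
```

The trade-off is that each pod's certificate is valid for its peers' hostnames too, which is the price of not knowing the pod index at template time.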
Add unit tests for vault annotations

TODO: add unit tests for vault commands and mounts
if default user secret is stored in Vault
Before this commit:
2021-09-09T13:27:27.312Z [INFO] (runner) rendered "(dynamic)" => "/etc/rabbitmq-tls//tls.key"
2021-09-09T13:27:27.313Z [INFO] (runner) rendered "(dynamic)" => "/etc/rabbitmq-tls//tls.crt"
and fix mountpath
since it serves as a kind of docs for our users to give a high-level idea
of what needs to be set up on the Vault side.

Rename some fields to be more succinct.
Some K8s clusters (e.g. kind) require the issuer to be set.

Do not set the issuer on GKE.
@ansd
Member Author

ansd commented Sep 21, 2021

@twhite0 thank you for your followups, these are excellent questions.

What was the driver for adding in the SecurityContext work in the Statefulset.

We set a PodSecurityContext to run as user 999 and FSGroup 999:

			SecurityContext: &corev1.PodSecurityContext{
				FSGroup:   &rabbitmqUID,
				RunAsUser: &rabbitmqUID,
			},

We just removed the SecurityContext of the other containers via 6b529d9. They are indeed not needed because they only set RunAsUser: &rabbitmqUID, which is already set in the PodSecurityContext.

The driver to set the PodSecurityContext is to run processes as RabbitMQ user (as opposed to running as root user).
FSGroup ensures that mounted volumes belong to the RabbitMQ group.

For OpenShift, this is not needed since the runtime will assign arbitrary user IDs to the containers as documented in https://www.rabbitmq.com/kubernetes/operator/using-on-openshift.html#arbitrary-user-ids.

We found a couple more annotations which we needed to add into the Statefulset override section of our RabbitmqCluster definition to get things cooking.

You can now set arbitrary Vault annotations as documented in

# Optionally, set Vault annotations as listed in
# https://www.vaultproject.io/docs/platform/k8s/injector/annotations
annotations:
  vault.hashicorp.com/template-static-secret-render-interval: "15s"

User-set Vault annotations will override cluster-operator-set Vault annotations.
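
The override rule can be pictured as a simple map merge in which user-set values are applied last. This is a minimal sketch with a hypothetical function name and placeholder values, not the cluster-operator's actual implementation.

```go
package main

import "fmt"

// mergeAnnotations returns a new map containing the operator-set annotations
// with the user-set annotations applied on top, so that a user-set value
// wins whenever both maps define the same key.
func mergeAnnotations(operatorSet, userSet map[string]string) map[string]string {
	merged := make(map[string]string, len(operatorSet)+len(userSet))
	for k, v := range operatorSet {
		merged[k] = v
	}
	for k, v := range userSet {
		merged[k] = v // user-set value overrides the operator default
	}
	return merged
}

func main() {
	// Placeholder values for illustration only.
	operatorSet := map[string]string{
		"vault.hashicorp.com/agent-inject": "true",
		"vault.hashicorp.com/role":         "rabbitmq",
	}
	userSet := map[string]string{
		"vault.hashicorp.com/role": "custom-role",
		"vault.hashicorp.com/template-static-secret-render-interval": "15s",
	}
	merged := mergeAnnotations(operatorSet, userSet)
	fmt.Println(merged["vault.hashicorp.com/role"])
}
```

Returning a fresh map keeps both inputs unmodified, which matters if the operator-set defaults are reused across reconciliations.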

@twhite0

twhite0 commented Sep 21, 2021

@ansd: Thanks for the response and quick pivots as a result of my questions. We were able to pull the latest changes and see great results.

  • On OpenShift 4.6.26
  • Seeded our Vault environment with admin credentials + PKI backend and configured our vault injector.
  • Deployed a RMQ cluster pulling in both admin credentials and certs. ✔️
  • Deployed a RMQ cluster with the password rotation sidecar enabled. ✔️
  • We're still needing to work through how to automate the admin credential rotation process.
  • We're working through how to automate the Vault provisioning per cluster.

Let us know if there's anything else you'd like us to test or comment on.

@ansd
Member Author

ansd commented Sep 22, 2021

Thanks a lot @twhite0 for your feedback. That's perfect!

@MarcialRosales had some good refactoring suggestions yesterday, so he will rename some Vault fields slightly. After that, we should be ready to merge.

Vault attributes renaming:
PathDefaultUser -> DefaultUserPath
PathCertificate -> PKIIssuerPath

Password rotation enabled by default:
By default, the cluster operator deploys a sidecar container with a
default image name.
It is possible to override the image by setting `DefaultUserUpdaterImage`.

Removed SecretBackend `CredentialUpdaterImage` in favor of having a dedicated
image to control how to rotate passwords when Vault is enabled.
@ansd ansd marked this pull request as ready for review September 23, 2021 12:27
@twhite0

twhite0 commented Sep 23, 2021

@ansd:

We noticed the status is defaulting to supplying secretReference information.

credit: @ssheth1

…er resource

if default user credentials come from Vault
MarcialRosales and others added 4 commits September 24, 2021 17:15
to rabbitmq/default-user-credential-updater

to allow for easier and independent versioning from cluster-operator.

We do not want to have the admin-password-updater image the same version
as the cluster-operator image.
so that new repo name matches image name, container name, and
entrypoint.
@ansd
Member Author

ansd commented Sep 28, 2021

@ssheth1 @twhite0 very good spot, thank you :) This is fixed now.
