Auth design #87

Draft: wants to merge 1 commit into `main`.
`AUTH-DESIGN.md`: new file, 55 additions and 0 deletions.
# Authentication, authorization and secure communication

* Status: proposed
* Date: 2024-04-03

GitHub issue: https://github.com/aenix-io/etcd-operator/issues/76

## Security specification

```yaml
kind: EtcdCluster
spec:
  ...
  security:
    peer:
      caSecretName: peer-ca-tls-secret
      tlsSecretName: peer-server-tls-secret
    clientServer:
      caSecretName: client-server-ca-tls-secret
      tlsSecretName: client-server-server-tls-secret
    auth:
      tlsSecretName: client-server-client-tls-secret
  ...
```

Review thread on `security.peer`:

@Kirill-Garbar (Collaborator, Author), Apr 3, 2024:

As no separate enabler is proposed, there will be only two states: secrets provided by the customer, or etcd's auto-TLS options.

Will we implement creation of the secrets in the operator with cert-manager objects? If yes, we need a way to enable or disable it. Proposal: in the future, add a separate section, something like `operatorManagedCertificates`. The new section and the secret references cannot be defined together; this will be validated in a webhook.

If we agree that this is a good option, we will not extend the specification for now.

Review comment (author not shown):

> Will we implement creation of the secrets in the operator with cert-manager objects?

The certificate management offered by CloudNativePG seems very good to me. Essentially, if no values are provided, certificates are managed by the operator; if the user provides them, the operator looks them up and uses them.

@kvaps (Member), Apr 3, 2024:

Totally agree. I think we have to enable certificate generation if nothing is specified; I do not see a case where a user would need an etcd cluster without certificates.

@kvaps (Member), Apr 3, 2024:

Let's enable certificate generation by the operator by default for now, and extend the API spec to allow custom user certificates later.

What do you think, @serathius?

@Kirill-Garbar (Collaborator, Author), Apr 3, 2024:

I think that for many users (other than Kubernetes control planes), auto-TLS for transport encryption plus password authentication is enough.

Self-managed certificates are for the future. That is mature logic, and it is a bit early to design certificate rotation before the operator can operate etcd fully and reliably (scale, defrag, backup, restore). In the very beginning I also thought we needed to remove the cert-manager dependency as soon as possible, but on second thought:

* Cert-manager is used by (I assume) every mature platform.
* Even where it is not used, it is easy to install.
* The only problem is OpenShift, which has its own PKI infrastructure. This should be investigated; even there, cert-manager can be installed.

I assumed that the operator would create cert-manager objects in code and depend on an installed cert-manager operator with its CRDs.

Review comment (Member, name not shown):

I would base this on the fact that the primary users for now are hosted Kubernetes control planes. We're ready to adopt the operator as soon as it offers a stable spec and a feature for managing x509 certificates.

I have nothing against using cert-manager. I was just considering that implementing basic logic for generating certificates (even ones valid for 10 years) might be simpler than agreeing on and stabilizing the spec for external certificate management at this phase.

Secrets are expected to contain keys with fixed names: `tls.crt` and `tls.key` in the TLS secret (`tlsSecretName`), and `ca.crt` in the CA secret (`caSecretName`).

Review comment (Collaborator, Author):

Secret keys (file names) will be hardcoded to reduce configuration complexity.
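Because the key names are hardcoded, checking a referenced secret reduces to a fixed lookup. The Go sketch below assumes the secret's contents are available as a plain `map[string][]byte` (the shape of a Kubernetes Secret's `data` field); the `missingKeys` function and the `"tls"`/`"ca"` labels are hypothetical, not part of the proposed API.

```go
package main

import "fmt"

// requiredKeys lists the hardcoded key names from the design: TLS secrets
// carry tls.crt and tls.key, CA secrets carry ca.crt. The "tls"/"ca" labels
// are illustrative, not part of the proposed API.
var requiredKeys = map[string][]string{
	"tls": {"tls.crt", "tls.key"},
	"ca":  {"ca.crt"},
}

// missingKeys returns the required keys that are absent or empty in data.
// data has the same shape as the data field of a Kubernetes Secret.
func missingKeys(kind string, data map[string][]byte) ([]string, error) {
	keys, ok := requiredKeys[kind]
	if !ok {
		return nil, fmt.Errorf("unknown secret kind %q", kind)
	}
	var missing []string
	for _, k := range keys {
		if len(data[k]) == 0 {
			missing = append(missing, k)
		}
	}
	return missing, nil
}

func main() {
	// A TLS secret that is missing its private key.
	data := map[string][]byte{"tls.crt": []byte("-----BEGIN CERTIFICATE-----\n...")}
	missing, err := missingKeys("tls", data)
	fmt.Println(missing, err) // [tls.key] <nil>
}
```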


All fields are optional, but `caSecretName` and `tlsSecretName` form a pair: if one is defined, the other must be defined as well. This is validated in a webhook.
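The pairing rule amounts to "both names set, or neither". A minimal Go sketch of that webhook check follows; the `SecretPair` type and the `validatePair` helper are hypothetical Go mirrors of the YAML above, not the operator's actual API types.

```go
package main

import "fmt"

// SecretPair is a hypothetical Go type mirroring the caSecretName /
// tlsSecretName pair in spec.security.peer and spec.security.clientServer.
type SecretPair struct {
	CaSecretName  string
	TlsSecretName string
}

// validatePair enforces the webhook rule from the design: within a pair,
// either both names are set or neither is.
func validatePair(path string, p *SecretPair) error {
	if p == nil {
		return nil // the whole section is optional
	}
	if (p.CaSecretName == "") != (p.TlsSecretName == "") {
		return fmt.Errorf("%s: caSecretName and tlsSecretName must be set together", path)
	}
	return nil
}

func main() {
	err := validatePair("spec.security.peer", &SecretPair{CaSecretName: "peer-ca-tls-secret"})
	fmt.Println(err)
	// spec.security.peer: caSecretName and tlsSecretName must be set together
}
```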

## Peer communication
If peer secrets are not defined, etcd is started with the `--peer-auto-tls` option, so it generates self-signed certificates and peers still communicate over HTTPS.

The operator watches these secrets. If the peer certificate or key is reissued, the operator performs a rolling restart of the etcd cluster so that the members reread the secret.
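One common way to turn "a watched secret changed" into a rolling restart is to record a checksum of the secret data in a pod-template annotation: a new value changes the template, and the workload controller rolls the pods. The Go sketch below shows that technique under stated assumptions: the members run in a StatefulSet, and the annotation key `checksum/tls-secrets` is hypothetical. The design itself only says that the operator watches the secrets and restarts the cluster.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// checksumAnnotation is a hypothetical pod-template annotation key.
const checksumAnnotation = "checksum/tls-secrets"

// secretChecksum hashes a Secret's data deterministically: keys are visited
// in sorted order and every key and value is length-prefixed, so different
// contents cannot produce the same byte stream.
func secretChecksum(data map[string][]byte) string {
	keys := make([]string, 0, len(data))
	for k := range data {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	h := sha256.New()
	for _, k := range keys {
		fmt.Fprintf(h, "%d:%s%d:", len(k), k, len(data[k]))
		h.Write(data[k])
	}
	return hex.EncodeToString(h.Sum(nil))
}

// needsRollout combines the checksums of all watched secrets and compares
// the result with the value recorded on the pod template. Writing the new
// value into the template is what would make the workload controller
// (assumed to be a StatefulSet) perform a rolling restart.
func needsRollout(podAnnotations map[string]string, secrets ...map[string][]byte) (string, bool) {
	h := sha256.New()
	for _, s := range secrets {
		h.Write([]byte(secretChecksum(s)))
	}
	sum := hex.EncodeToString(h.Sum(nil))
	return sum, podAnnotations[checksumAnnotation] != sum
}

func main() {
	peer := map[string][]byte{"tls.crt": []byte("cert-v1"), "tls.key": []byte("key-v1")}
	sum, changed := needsRollout(nil, peer)
	fmt.Println(changed) // true: nothing recorded yet
	template := map[string]string{checksumAnnotation: sum}
	_, changed = needsRollout(template, peer)
	fmt.Println(changed) // false: secret unchanged
	peer["tls.crt"] = []byte("cert-v2") // certificate reissued
	_, changed = needsRollout(template, peer)
	fmt.Println(changed) // true: rollout needed
}
```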

One secret is used for all etcd nodes.

## Client-server communication
If client-server secrets are not defined, etcd is started with the `--auto-tls` option, so it generates a self-signed certificate and clients still communicate with it over HTTPS.

The operator watches these secrets. If the client-server certificate or key is reissued, the operator performs a rolling restart of the etcd cluster so that the members reread the secret.
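The peer and client-server fallback rules could translate into etcd command-line flags roughly as follows. This is a sketch: the `tlsMounts` type and the mount directories are hypothetical, while the flag names (`--peer-auto-tls`, `--auto-tls`, `--peer-cert-file`, `--cert-file`, `--trusted-ca-file`, `--client-cert-auth`, and so on) are standard etcd flags. Adding `--client-cert-auth` only when certificate-based user authentication is used is an assumption. For either mode to take effect, the peer and client URLs passed to etcd must also use the `https://` scheme; that part is not shown.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// tlsMounts holds where the TLS and CA secrets are mounted in the etcd pod.
// Empty strings mean the section was not defined (paths are hypothetical).
type tlsMounts struct {
	TLSDir string
	CADir  string
}

func (m tlsMounts) defined() bool { return m.TLSDir != "" && m.CADir != "" }

// etcdTLSFlags builds the etcd flags for peer and client TLS, falling back
// to etcd's self-signed auto-TLS when no secrets are given.
func etcdTLSFlags(peer, client tlsMounts, clientCertAuth bool) []string {
	var flags []string
	if peer.defined() {
		flags = append(flags,
			"--peer-cert-file="+filepath.Join(peer.TLSDir, "tls.crt"),
			"--peer-key-file="+filepath.Join(peer.TLSDir, "tls.key"),
			"--peer-trusted-ca-file="+filepath.Join(peer.CADir, "ca.crt"),
		)
	} else {
		flags = append(flags, "--peer-auto-tls")
	}
	if client.defined() {
		flags = append(flags,
			"--cert-file="+filepath.Join(client.TLSDir, "tls.crt"),
			"--key-file="+filepath.Join(client.TLSDir, "tls.key"),
			"--trusted-ca-file="+filepath.Join(client.CADir, "ca.crt"),
		)
		if clientCertAuth {
			// Require and verify client certificates signed by the CA above.
			flags = append(flags, "--client-cert-auth")
		}
	} else {
		flags = append(flags, "--auto-tls")
	}
	return flags
}

func main() {
	fmt.Println(etcdTLSFlags(tlsMounts{}, tlsMounts{}, false))
	// [--peer-auto-tls --auto-tls]
}
```

A single secret per section is mounted into every member, which matches the design's statement that one peer secret is shared by all etcd nodes.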

## User authentication
When user authentication is enabled, a `root` user is created in etcd without a password. The customer is expected to provide a valid secret with `tls.crt` and `tls.key` keys, which the operator uses to authenticate to etcd in order to operate the cluster. Because secrets for many etcd clusters are created dynamically, they are not mounted into the operator; instead, the operator reads them at runtime and rereads them when certificates are reissued.

If `auth.tlsSecretName` is defined, the whole `clientServer` section must be defined as well; this is validated in a webhook.
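The cross-section rule can be checked in the same webhook. The Go types below are hypothetical mirrors of `spec.security` from the YAML example, not the operator's real API types.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical Go types mirroring spec.security from the YAML example.
type SecretPair struct {
	CaSecretName  string
	TlsSecretName string
}

type AuthSpec struct {
	TlsSecretName string
}

type SecuritySpec struct {
	Peer         *SecretPair
	ClientServer *SecretPair
	Auth         *AuthSpec
}

// validateAuth enforces: if auth.tlsSecretName is set, the whole
// clientServer section (both secret names) must be set as well.
func validateAuth(s SecuritySpec) error {
	if s.Auth == nil || s.Auth.TlsSecretName == "" {
		return nil
	}
	cs := s.ClientServer
	if cs == nil || cs.CaSecretName == "" || cs.TlsSecretName == "" {
		return errors.New("spec.security.auth.tlsSecretName requires the whole spec.security.clientServer section")
	}
	return nil
}

func main() {
	spec := SecuritySpec{Auth: &AuthSpec{TlsSecretName: "client-server-client-tls-secret"}}
	fmt.Println(validateAuth(spec))
}
```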

## Further improvements to be described and discussed

1. * What: Use a separate controller (with its own CR) to create Kubernetes secrets with certificates/passwords and renew them regularly.
   * Why:
     * Etcd clients (applications deployed to Kubernetes) will need access to the created etcd clusters. It would be inconvenient to couple the user lists in the EtcdCluster CR (with complete RBAC lists) with the users in application configurations.
2. * What: Remove the cert-manager dependency for creating and rotating certificates.
   * Why:
     * OpenShift has its own ecosystem and doesn't ship cert-manager out of the box; it has its own operator.
     * Cert-manager (a separate operator) is too heavy a dependency for etcd-operator.