Initial pass at stories for token registration #69

151 changes: 151 additions & 0 deletions enhancements/sig-architecture/68-token-registration/README.md
@@ -0,0 +1,151 @@
## Release Signoff Checklist

Please add https://github.com/open-cluster-management-io/api/blob/main/docs/clusterjoinprocess.md as a see-also. It may be worth it to move the doc into this repo.

That doc outlines the current process and a few key features of the flow are visible there:

  1. the subject has a group associated with the managedcluster and the name has an agent associated with the process on that managedcluster
  2. an agent on the managedcluster is able to get new agent credentials by using shared (non-agent) credentials to create a request and the hub admin can approve or reject.
  3. an agent on the managedcluster is able to renew the credentials it uses to identify itself. This also allows for non-shared credential flows.
  4. different agents on the managed cluster are members of the same group and have different names, with the agent clearly identified
  5. the hub admin gets a choice about whether or not to allow a particular managedcluster to get a valid credential
  6. the hub admin can control expiry/removal of a particular agent. In this case, by rotating the signing credential. (Different distros are better or worse at doing this.)

Any new flow that is developed should allow the same capabilities and should use the standard group and usernames to ease adoption. Capability 2 may be negotiable (not required), but it would be limiting because the hub would need to communicate unique credentials to each managed cluster.

Author

👍 Added to implementation details section


- [ ] Enhancement is `implementable`
- [ ] Design details are appropriately documented from clear requirements
- [ ] Test plan is defined
- [ ] Graduation criteria for dev preview, tech preview, GA
- [ ] User-facing documentation is created in [website](https://github.com/open-cluster-management-io/open-cluster-management-io.github.io/)

## Summary

OCM's use of a CSR-based mechanism for registering spoke clusters with the hub cluster is incompatible with Kubernetes environments that cannot issue client auth certificates, such as Amazon Elastic Kubernetes Service (EKS).
This enhancement provides a secondary, Service Account Token-based registration mechanism that is universally supported and can be used as an alternative to CSR in such environments.

## Motivation

For cluster administrators, it is often preferable to use their cloud provider’s managed Kubernetes service (e.g. AKS, EKS, IKS or GKE) rather than self-managing the Kubernetes control plane and worker nodes, since this reduces cluster management overhead and complexity.
However, in some of these environments (for example EKS), the use of CSRs for client authentication is not permissible.

Adopting OCM should not require its users to change how they deploy and manage their hub Kubernetes cluster.

As such, OCM needs to support an alternative registration mechanism to CSRs that is expected to work in all environments, providing maximum compatibility with managed Kubernetes services or restricted/locked down environments.

### Goals

- The OCM hub cluster can run on EKS and other environments where CSR registration cannot be used.
- `clusteradm` tooling provides options to select token-based registration when joining a spoke cluster to the hub.
- Suitable warnings/error messages/conditions are added to OCM such that it is easily identifiable when CSRs cannot be used.

### Non-Goals

- Support for leveraging native cloud specific IAM providers or external credential providers such as [Vault](https://www.vaultproject.io/) in the registration process (which will be raised as a separate enhancement)

## Proposal

### User Stories

#### Story 1 - Spoke administrator joins a cluster to the hub using Service Account Token

It must be possible for the cluster administrator to specify they wish to use `token` registration in the `clusteradm join` command:

```
% clusteradm join \
--registration=token \
--hub-token XXX \
--hub-apiserver https://hub.k8s.example.com \
--cluster-name spoke-0
```

#### Story 2 - Spoke administrator joins a cluster to the hub using the default (csr) mechanism

When not specified, we should continue to default to the `csr` registration mechanism.

```
% clusteradm join \
--hub-token XXX \
--hub-apiserver https://hub.k8s.example.com \
--cluster-name spoke-0 # default to csr registration
```

This can also be explicitly set to `csr`, using `--registration=csr`:

```
% clusteradm join \
--registration=csr \
--hub-token XXX \
--hub-apiserver https://hub.k8s.example.com \
--cluster-name spoke-0
```
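
The flag behaviour described in Stories 1 and 2 can be modelled in a few lines. This is only a sketch: `clusteradm` is not written in Python, `--registration` is the flag proposed by this enhancement rather than an existing one, and `build_join_parser` is a hypothetical helper used purely for illustration.

```python
import argparse

# Registration types named in the stories above; "csr" stays the default.
REGISTRATION_TYPES = ("csr", "token")


def build_join_parser() -> argparse.ArgumentParser:
    """Hypothetical model of the flags accepted by `clusteradm join`."""
    parser = argparse.ArgumentParser(prog="clusteradm join")
    parser.add_argument(
        "--registration",
        choices=REGISTRATION_TYPES,
        default="csr",  # Story 2: an unspecified mechanism falls back to csr
        help="registration mechanism used to join the hub",
    )
    parser.add_argument("--hub-token", required=True)
    parser.add_argument("--hub-apiserver", required=True)
    parser.add_argument("--cluster-name", required=True)
    return parser
```

Restricting the flag to a fixed set of choices means an unsupported value is rejected at parse time, and further registration types could later be added by extending `REGISTRATION_TYPES`.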

#### Story 3 - Hub administrator accepts a spoke cluster using csr or token registration in the same way

From a hub administrator's point of view, the existing `clusteradm accept` command will continue to work, regardless of whether the spoke cluster is using csr or token registration.

```
% clusteradm accept --clusters spoke-0 # No additional options required if spoke-0 used token registration
```
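
One way to picture a single `accept` path serving both mechanisms is a dispatch on the registration type recorded with the join request. The sketch below is an illustration under stated assumptions: `JoinRequest`, `accept` and the credential labels are hypothetical names, not OCM types, and no real certificate or token is issued.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical mapping from registration type to the kind of credential issued.
CREDENTIAL_KINDS = {
    "csr": "client-certificate",
    "token": "service-account-token",
}


@dataclass
class JoinRequest:
    cluster_name: str
    registration: str = "csr"
    accepted: bool = False
    credential_kind: Optional[str] = None


def accept(pending: Dict[str, JoinRequest], cluster_name: str) -> JoinRequest:
    """Accept a pending spoke; the caller never passes the registration type."""
    request = pending[cluster_name]
    if request.registration not in CREDENTIAL_KINDS:
        raise ValueError(f"unsupported registration type: {request.registration}")
    request.accepted = True
    request.credential_kind = CREDENTIAL_KINDS[request.registration]
    return request
```

Because the type travels with the request, the administrator's command stays the same for every spoke, and one hub can hold spokes of both kinds at once.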

#### Story 4 - Spoke Service Account Tokens are refreshed automatically prior to expiry

When the service account token is nearing expiry, the spoke cluster should retrieve a replacement token from the hub, without administrator intervention.

#### Story 5 - Hub administrator can accept both csr and token registrations on a single hub cluster

In order to future-proof OCM for additional registration types (e.g. cloud provider IAM), it must be possible for a hub to support spoke clusters using both csr and token registration.

#### Story 6 - Hub administrator unjoining a spoke cluster results in the associated cluster service account being deleted

When a spoke cluster is unjoined, it must no longer be possible to authenticate with the hub using the spoke's Service Account token.

#### Story 7 - Procedure for administrators to follow when a token expires or is deleted before the spoke cluster can refresh it

There needs to be a procedure to follow covering scenarios where the spoke cluster is unable to refresh its service account token, and has lost its ability to authenticate with the hub.
It should be possible for administrators to restore functionality to the spoke cluster.

Some example scenarios:

1. OCM agents in the spoke cluster were offline (e.g. due to an outage) and their service account token expired in the meantime.
2. A network outage resulted in the spoke cluster losing connectivity to the hub API server for an extended period of time.
3. A service account was intentionally deleted (e.g. the associated token was compromised) and replaced.
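
The first two scenarios amount to the agent missing its refresh window. A minimal sketch of that decision follows, assuming the agent knows when its token was issued and when it expires. The 80% refresh threshold and the `token_state` helper are assumptions made for illustration; the proposal does not specify them.

```python
from datetime import datetime, timedelta, timezone

# Assumption: start refreshing once 80% of the token lifetime has elapsed.
REFRESH_FRACTION = 0.8


def token_state(
    issued_at: datetime,
    expires_at: datetime,
    now: datetime,
    refresh_fraction: float = REFRESH_FRACTION,
) -> str:
    """Classify a token as "valid", "refresh" or "expired" at time `now`."""
    if now >= expires_at:
        # The agent can no longer authenticate; a recovery procedure is needed.
        return "expired"
    refresh_at = issued_at + (expires_at - issued_at) * refresh_fraction
    if now >= refresh_at:
        # Still usable, but a replacement should be requested from the hub.
        return "refresh"
    return "valid"
```

For a token with a ten hour lifetime, this returns `"refresh"` from the eight hour mark until expiry. An agent that was offline for that whole window comes back in the `"expired"` state, which is the situation the recovery procedure has to cover.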

Member

We should have another story about addons. I think that in the token-based approach, addons also need to use a token to talk to the hub.

Author

Added 👍

### Implementation Details/Notes/Constraints [optional]

TODO

### Risks and Mitigation

#### Protecting the spoke cluster's service account token

TODO

## Design Details

### Open Questions [optional]

This is where to call out areas of the design that require closure before deciding
to implement the design. For instance,
> 1. This requires exposing previously private resources which contain sensitive
information. Can we do this?

### Test Plan

**Note:** *Section not required until targeted at a release.*

TODO

### Graduation Criteria

**Note:** *Section not required until targeted at a release.*

TODO

### Upgrade / Downgrade Strategy

TODO

### Version Skew Strategy

TODO

## Implementation History

## Drawbacks

## Alternatives

- Cloud provider native IAM support - this will be covered in a new enhancement.
- CSR remains the preferred approach to spoke cluster authentication with the hub, where usable.

## Infrastructure Needed [optional]

No specific infrastructure required.
13 changes: 13 additions & 0 deletions enhancements/sig-architecture/68-token-registration/metadata.yaml
@@ -0,0 +1,13 @@
title: token-registration
authors:
- "@dgorst"
reviewers:
- TBD
approvers:
- TBD
creation-date: 2022-07-04
last-updated: 2022-07-06
status: provisional
see-also: []
replaces: []
superseded-by: []