Commit

Backport #10801 to branch/v9 (#11964)
* Edit three guides to support Cloud users

See #10633

Upgrading

- Misc. grammar/style/clarity tweaks
- Add details re: checking the Cloud Proxy/Auth versions for Cloud
  users
- Add a scoped Tabs component for the upgrade sequence
- Remove the "upgrading to Teleport 4.0+" section since we no longer
  support this version

Backup and restore

- Add scoped Tabs components where instructions vary between editions
- Misc clarity/grammar/style improvements

Authentication

Use Tabs to ensure that readers who have selected one scope don't see
content that is relevant only for other scopes.

* Respond to PR feedback

* Respond to PR feedback
ptgott authored Apr 15, 2022
1 parent 5cd6a60 commit e286372
Showing 3 changed files with 364 additions and 129 deletions.
192 changes: 159 additions & 33 deletions docs/pages/setup/operations/backup-restore.mdx
@@ -1,16 +1,24 @@
---
title: Backup and Restore
description: How to backup and restore Teleport cluster state.
description: How to back up and restore your Teleport cluster state.
---
This guide explains the components of your Teleport deployment that must be
backed up and lays out our recommended approach for performing backups.

When planning a backup of Teleport, it's important to know what is where and the
importance of each component. Teleport's Proxies and Nodes are stateless, and thus
only `teleport.yaml` should be backed up.
## What you should back up

The Auth server is Teleport's brains, and depending on the backend should be backed up
regularly.
### Teleport services
<Tabs>
<TabItem scope={["enterprise", "oss"]} label="Self-Hosted">

For example, a customer running Teleport on AWS with DynamoDB have these key items of data:
Teleport's Proxy Service and Nodes are stateless. For these components, only
`teleport.yaml` should be backed up.

The Auth Service is Teleport's brain, and depending on the backend should be
backed up regularly.

For example, a Teleport cluster running on AWS with DynamoDB must back up the
following data:

| What | Where ( Example AWS Customer ) |
| - | - |
@@ -22,35 +30,68 @@ For example, a customer running Teleport on AWS with DynamoDB have these key ite
| teleport.yaml | File System |
| teleport.service | File System |
| license.pem | File System |
| TLS key/certificate | ( File System / Outside Scope ) |
| TLS key/certificate | File System / AWS Certificate Manager |
| Audit log | DynamoDB |
| Session recordings | S3 |

For this customer, we would recommend using [AWS best practices](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/BackupRestore.html) for backing up DynamoDB. If DynamoDB is used for
the audit log, logged events have a TTL of 1 year.
Teleport audit logs, logged events have a TTL of 1 year.

| Backend | Recommended backup strategy |
| - | - |
| dir ( local filesystem ) | Backup `/var/lib/teleport/storage` directory and the output of `tctl get all --with-secrets`. |
| DynamoDB | [Follow AWS Guidelines for Backup & Restore](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/BackupRestore.html) |
| etcd | [Follow etcD Guidleines for Disaster Recovery](https://etcd.io/docs/v2/admin_guide) |
| Firestore | [Follow GCP Guidlines for Automated Backups](https://firebase.google.com/docs/database/backups) |
| Local Filesystem | Back up the `/var/lib/teleport/storage` directory and the output of `tctl get all --with-secrets`. |
| DynamoDB | [Follow AWS's guidelines for backup and restore](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/BackupRestore.html) |
| etcd | [Follow etcd's guidelines for disaster recovery](https://etcd.io/docs/v2/admin_guide) |
| Firestore | [Follow GCP's guidelines for automated backups](https://firebase.google.com/docs/database/backups) |
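
For the local filesystem backend, the two steps in the table could be scripted along these lines. This is a minimal sketch, not an official procedure: the function name `backup_local_backend`, the `TCTL` and `STORAGE_DIR` variables, and the output file names are assumptions made for illustration. The sketch also does not address whether it is safe to copy the storage directory while the Auth Service is writing to it.

```shell
# Hypothetical helper: names and paths below are illustrative assumptions.
TCTL="${TCTL:-tctl}"
STORAGE_DIR="${STORAGE_DIR:-/var/lib/teleport/storage}"

# backup_local_backend DEST
# Writes DEST/state.yaml (output of `tctl get all --with-secrets`) and
# DEST/storage.tar.gz (archive of the storage directory).
backup_local_backend() {
  dest="$1"
  mkdir -p "$dest" || return 1
  (
    # Both files contain secrets, so create them readable by the owner only.
    umask 077
    "$TCTL" get all --with-secrets > "$dest/state.yaml" &&
      tar -czf "$dest/storage.tar.gz" \
        -C "$(dirname "$STORAGE_DIR")" "$(basename "$STORAGE_DIR")"
  )
}
```

You would call it as `backup_local_backend /path/to/backups/today` on the Auth Service host, then move the result to storage that is protected as carefully as the cluster itself.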

</TabItem>
<TabItem scope={["cloud"]} label="Teleport Cloud">

Teleport Cloud manages all Auth Service and Proxy Service backups.

While Teleport Nodes are stateless, you should ensure that you can restore their
configuration files.

</TabItem>
</Tabs>

### Teleport resources

Teleport uses YAML resources for roles, trusted clusters, local users, and auth connectors.
These could be created via `tctl` or the UI.
Teleport uses YAML resources for roles, Trusted Clusters, local users, and authentication connectors.
You can create these with `tctl` or the Web UI.

You should back up your dynamic resource configurations to ensure that you can restore them in case of an outage.

## GitOps
## Our recommended backup practice

If you're running Teleport at scale, your teams need to have an automated way to restore Teleport. At a high level, this is our recommended approach:

- Persist and backup your backend
- Share that backend among auth servers
- Store your configs as discrete files in VCS
- Have your CI run `tctl create -f *.yaml` from that git directory
<Tabs>
<TabItem scope={["enterprise", "oss"]} label="Self-Hosted">

- Persist and back up your backend.
- Share that backend among Auth Service instances.
- Store your dynamic resource configurations as discrete files in a git
repository.
- Have your continuous integration system run `tctl create -f *.yaml` from the
git repository. The `-f` flag instructs `tctl create` to overwrite a resource
that already exists instead of returning an error, so this command can be run
regularly.

</TabItem>
<TabItem scope={["cloud"]} label="Teleport Cloud">

- Store your dynamic resource configurations as discrete files in a git
repository.
- Have your continuous integration system run `tctl create -f *.yaml` from the
git repository. The `-f` flag instructs `tctl create` to overwrite a resource
that already exists instead of returning an error, so this command can be run
regularly.

</TabItem>
</Tabs>
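
The continuous integration step above might look like the following. This is a sketch under assumptions: `apply_resources` and the `TCTL` variable are hypothetical names, and the loop passes one file per `tctl create -f` call rather than relying on how `tctl create` handles several file arguments from a glob.

```shell
# Hypothetical CI helper; TCTL can be overridden to point at a specific binary.
TCTL="${TCTL:-tctl}"

# apply_resources DIR
# Runs `tctl create -f FILE` for every .yaml file in DIR and stops at the
# first failure so the CI job is marked as failed.
apply_resources() {
  dir="$1"
  for f in "$dir"/*.yaml; do
    # If no file matches, the glob stays literal; skip it.
    [ -e "$f" ] || continue
    "$TCTL" create -f "$f" || return 1
  done
}
```

A CI job would check out the git repository and run `apply_resources .` with credentials that allow `tctl` to reach the Auth Service.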

## Migrating backends
<Tabs>
<TabItem scope={["enterprise"]} label="Teleport Enterprise">

As of version v4.1, you can now quickly export a collection of resources from
Teleport. This feature was designed to help customers migrate from local storage
@@ -63,13 +104,74 @@ Using `tctl get all --with-secrets` will retrieve the below items:
- Trusted Clusters
- Connectors:
- Github
- SAML \[Teleport Enterprise]
- OIDC \[Teleport Enterprise]
- Roles \[Teleport Enterprise]
- SAML
- OIDC
- Roles

When migrating backends, you should back up your Auth Service's
`data_dir/storage` directly.

### Example of backing up and restoring a cluster

```code
# Export dynamic configuration state from old cluster
$ tctl get all --with-secrets > state.yaml
# Prepare a new uninitialized backend (make sure to port
# any non-default config values from the old config file)
$ mkdir fresh && cat > fresh.yaml << EOF
teleport:
  data_dir: fresh
EOF
# bootstrap fresh server (kill the old one first!)
$ sudo teleport start --config fresh.yaml --bootstrap state.yaml
# from another terminal, verify state transferred correctly
$ tctl --config fresh.yaml get all
# <your state here>
```

The `--bootstrap` flag has no effect except when the Auth Service initializes
its backend on first startup, so it is safe for use in supervised/High
Availability contexts.

### Limitations

The `--bootstrap` flag doesn't re-trigger Trusted Cluster handshakes, so Trusted
Cluster resources need to be recreated manually.

All the same limitations around modifying the config file of an existing cluster
also apply to a new cluster being bootstrapped from the state of an old cluster:

- Changing the cluster name will break your CAs. This will be caught and Teleport
will refuse to start.
- Some user authentication mechanisms (e.g. WebAuthn and U2F) require that the
public endpoint of the Web UI remains the same. This cannot be caught by
Teleport, so be careful!
- Any Node whose invite token is defined in the Auth Service's configuration
file will be able to join automatically, but Nodes that were added
dynamically will need to be re-invited.

</TabItem>
<TabItem scope={["oss"]} label="Open Source">

As of version v4.1, you can now quickly export a collection of resources from
Teleport. This feature was designed to help customers migrate from local storage
to etcd.

Using `tctl get all --with-secrets` will retrieve the below items:

When migrating backends, you should back up your auth server's `data_dir/storage` directly.
- Users
- Certificate Authorities
- Trusted Clusters
- GitHub Connectors
- Roles

**Example of backing up and restoring a cluster.**
When migrating backends, you should back up your Auth Service's
`data_dir/storage` directly.

### Example of backing up and restoring a cluster

```code
# Export dynamic configuration state from old cluster
@@ -90,13 +192,37 @@ $ tctl --config fresh.yaml get all
# <your state here>
```

The `--bootstrap` flag has no effect, except during backend initialization (performed
by auth server on first start), so it is safe for use in supervised/High Availability contexts.
The `--bootstrap` flag has no effect except when the Auth Service initializes
its backend on first startup, so it is safe for use in supervised/High
Availability contexts.

### Limitations

The `--bootstrap` flag doesn't re-trigger Trusted Cluster handshakes, so Trusted
Cluster resources need to be recreated manually.

All the same limitations around modifying the config file of an existing cluster
also apply to a new cluster being bootstrapped from the state of an old cluster:

- Changing the cluster name will break your CAs. This will be caught and Teleport
will refuse to start.
- Some user authentication mechanisms (e.g. WebAuthn and U2F) require that the
public endpoint of the Web UI remains the same. This cannot be caught by
Teleport, so be careful!
- Any Node whose invite token is defined in the Auth Service's configuration
file will be able to join automatically, but Nodes that were added
dynamically will need to be re-invited.

</TabItem>
<TabItem scope={["cloud"]} label="Teleport Cloud">

In Teleport Cloud, backend data is managed for you automatically. If you would
like to migrate configuration resources to a self-hosted Teleport cluster,
follow our recommended backup practice of storing configuration resources in a
git repository and running `tctl create -f` regularly for each resource. This
will enable you to keep your configuration resources up to date regardless of
storage backend.

**Limitations**
</TabItem>
</Tabs>

- The `--bootstrap` flag doesn't re-trigger trusted cluster handshakes, so trusted cluster resources need to be recreated manually.
- All the same limitations around modifying the config file of an existing cluster also apply to a new cluster being bootstrapped from the state of an old cluster. Of particular note:
- Changing cluster name will break your CAs (this will be caught and teleport will refuse to start).
- Some user authentication mechanisms (e.g. WebAuthn and U2F) require that the public endpoint of the web ui remains the same (this can't be caught by teleport, be careful!).
- Any node whose invite token is defined statically (in the config file of the auth server) will be able to join automatically, but nodes that were added dynamically will need to be re-invited
101 changes: 49 additions & 52 deletions docs/pages/setup/operations/upgrading.mdx
@@ -5,87 +5,84 @@ description: How to upgrade Teleport components

## Production releases

<Admonition type="warning">
<Notice type="warning">
Avoid running pre-releases (release candidates) in production environments.
</Admonition>
</Notice>

The Teleport development team uses [Semantic Versioning](https://semver.org/) which makes it easy to tell if a specific
version is recommended for production use.
The Teleport development team uses [Semantic Versioning](https://semver.org/),
which makes it easy to tell if a specific version is recommended for production
use.

## Component compatibility

When running multiple binaries of Teleport within a cluster (nodes, proxies,
clients, etc), the following rules apply:
<Details
scope={["cloud"]}
scopeOnly
opened
title="Auth Service and Proxy Service versions">

In Teleport Cloud, we manage the Auth and Proxy Services for you. You can
determine the current version of these services by running the following
command, where `mytenant` is the name of your Teleport Cloud tenant:

```code
$ curl -s https://mytenant.teleport.sh/webapi/ping | jq '.server_version'
```

**Before 5.0.0**
Read the following rules to ensure that your Teleport Nodes are compatible with
the Teleport Auth and Proxy Services. You should check the version of the Auth
and Proxy Services regularly to make sure that your Teleport Nodes are
compatible.

- **Only patch** versions are always compatible, for example, any 4.0.1 component will work with any 4.0.3 component.
- Minor versions are always compatible with the **previous** minor release. This means you must not attempt to upgrade from 4.1.x straight to 4.3.x. You must upgrade to 4.2.x first.
- Teleport clients [`tsh`](../reference/cli.mdx#tsh) for users and [`tctl`](../reference/cli.mdx#tctl) for admins
may not be compatible with different versions of the `teleport` service.
</Details>

**After 5.0.0**
When running multiple binaries of Teleport within a cluster, the following rules apply:

- **Patch and minor** versions are always compatible, for example, any 5.0.1 component will work with any 5.0.3 component and 6.1.0 component will work with any 6.7.0 component.
- Major versions are always compatible with the **previous** major release. This means you must not attempt to upgrade from 5.x.x straight to 7.x.x. You must upgrade to 6.x.x first.
- The above applies to both clients and servers. For example, a 6.x.x proxy is
compatible with 5.x.x nodes and 5.x.x `tsh`. But we don't guarantee that a
7.x.x `tsh` will work with a 5.x.x proxy.
- The above applies to both clients and servers. For example, a 6.x.x Proxy Service is
compatible with 5.x.x Nodes and 5.x.x `tsh`. But we don't guarantee that a
7.x.x `tsh` will work with a 5.x.x Proxy Service.
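
The major-version rule above can be checked mechanically before an upgrade. The following is a minimal sketch: `major` and `upgrade_step_ok` are hypothetical helper names, and rejecting downgrades is an assumption of this sketch rather than something the rules above state.

```shell
# major VERSION
# Prints the major component of a version such as "6.1.0" or "v6.1.0".
major() {
  v="${1#v}"
  echo "${v%%.*}"
}

# upgrade_step_ok FROM TO
# Succeeds if TO has the same major version as FROM or the next one.
# Assumption: downgrades (TO older than FROM) are treated as not OK.
upgrade_step_ok() {
  from=$(major "$1")
  to=$(major "$2")
  [ "$to" -ge "$from" ] && [ $((to - from)) -le 1 ]
}
```

For example, `upgrade_step_ok 5.2.1 6.0.0` succeeds, while `upgrade_step_ok 5.2.1 7.0.0` fails because the 6.x.x step was skipped.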

## Backup

Backup before upgrading. We have more instructions in [Backing up Teleport](./backup-restore.mdx).
Back up before upgrading. We have more instructions in [Backing up Teleport](./backup-restore.mdx).

## Upgrade Sequence

<Tabs>
<TabItem scope={["enterprise", "oss"]} label="Self-Hosted">
When upgrading a single Teleport cluster:

1. **Upgrade the auth server first**. The auth server keeps the cluster state and if there are data format changes introduced in the new version this will perform necessary migrations.
2. Then, upgrade the proxy servers. The proxy servers are stateless and can be upgraded in any sequence or at the same time.
3. Finally, upgrade the SSH nodes in any sequence or at the same time.
1. **Upgrade the Auth Service first**. The Auth Service keeps the cluster state and, if there are data format changes introduced in the new version, will perform necessary migrations.
2. Upgrade Proxy Service instances. These are stateless and can be upgraded in any sequence or at the same time.
3. Finally, upgrade your Teleport Nodes in any sequence or at the same time.

<Admonition
type="warning"
title="Warning"
>
If several auth servers are running in High Availability configuration
(for example, in AWS auto-scaling group) you have to shrink the group to
**just one auth server** before performing an upgrade. While Teleport will attempt to perform any necessary migrations, we recommend users create a backup of their backend before upgrading the Auth Server, as a
If several Auth Service instances are running in a High Availability configuration
(for example, in an AWS Auto Scaling group), you must shrink the group to
**just one Auth Service instance** before performing an upgrade.

While Teleport will attempt to perform any necessary migrations, we recommend users create a backup of their backend before upgrading the Auth Service as a
precaution. This allows for a safe rollback in case the migration itself fails.
</Admonition>

When upgrading multiple clusters:

1. First, upgrade the main cluster, i.e. the one which other clusters trust.
2. Upgrade the trusted clusters.
1. First, upgrade the root cluster, i.e. the one that other clusters trust.
2. Upgrade the Trusted Clusters.
</TabItem>
<TabItem scope={["cloud"]} label="Teleport Cloud">

## Upgrading to Teleport 4.0+

Teleport 4.0+ switched to GRPC and HTTP/2 as an API protocol. The HTTP/2 spec bans
two previously recommended ciphers. `tls-rsa-with-aes-128-gcm-sha256` & `tls-rsa-with-aes-256-gcm-sha384`, make sure these are removed from `teleport.yaml`

Whenever Teleport is using those cipher suites, it will experience connectivity issues between components starting 4.0+.

```yaml
# remove those two ciphersuites from the
ciphersuites:
- tls-rsa-with-aes-128-gcm-sha256
- tls-rsa-with-aes-256-gcm-sha384
```
The Teleport Auth Service and Proxy Service are upgraded automatically. When
upgrading Nodes, you may upgrade in any sequence or at the same time.

When upgrading multiple clusters:

Rotate CA to `SHA-256` or `SHA-512` for RSA SSH certificate signatures.

The previous default was `SHA-1`, which is now considered weak against brute-force attacks.
`SHA-1` certificate signatures are also no longer accepted by `OpenSSH` versions `8.2` and above.

All new Teleport clusters will default to `SHA-512` based signatures.

To upgrade an existing cluster, set the following in auth server's `teleport.yaml`:

```yaml
teleport:
ca_signature_algo: "rsa-sha2-512"
```

Finally, rotate the cluster CA [following these docs](./ca-rotation.mdx).
1. First, upgrade the root cluster, i.e. the one that other clusters trust.
2. Upgrade the Trusted Clusters.
</TabItem>
</Tabs>