This repository has been archived by the owner on Oct 23, 2024. It is now read-only.

add architecture/repair/decommission docs (#154)
* add architecture docs

Co-authored-by: Marcin Owsiany <[email protected]>
zmalik and porridge authored Jun 24, 2020
1 parent e05066d commit 2af0c7b
Showing 7 changed files with 150 additions and 13 deletions.
32 changes: 20 additions & 12 deletions README.md
Original file line number Diff line number Diff line change
@@ -10,7 +10,7 @@ The KUDO Cassandra Operator makes it easy to deploy and manage
## Getting started

The KUDO Cassandra operator requires [KUDO](https://kudo.dev/) and Kubernetes
versions as specified in [`operator.yaml`](operator/operator.yaml#L4-L5).
versions as specified in [`operator.yaml`](./operator/operator.yaml#L4-L5).

To install it run

@@ -28,6 +28,11 @@ kubectl kudo install cassandra
- Readiness probe
- Unprivileged container execution
- Node-to-Node and Node-to-Client communication encryption
- Backup/restore
- Rack-awareness
- Node replace
- Inter-pod anti-affinity
- Multi-datacenter support

## Roadmap

@@ -36,17 +41,20 @@ kubectl kudo install cassandra

## Documentation

- [Installing](/docs/installing.md)
- [Production](/docs/production.md)
- [Accessing](/docs/accessing.md)
- [Managing](/docs/managing.md)
- [Resources](/docs/resources.md)
- [Upgrading](/docs/upgrading.md)
- [Monitoring](/docs/monitoring.md)
- [Backup & Restore](/docs/backup.md)
- [Security](/docs/security.md)
- [Multi Datacenter](/docs/multidatacenter.md)
- [Parameters reference](/docs/parameters.md)
- [Installing](./docs/installing.md)
- [Production](./docs/production.md)
- [Accessing](./docs/accessing.md)
- [Architecture](./docs/architecture.md)
- [Managing](./docs/managing.md)
- [Resources](./docs/resources.md)
- [Upgrading](./docs/upgrading.md)
- [Monitoring](./docs/monitoring.md)
- [Backup & Restore](./docs/backup.md)
- [Repair](./docs/repair.md)
- [Decommission](./docs/decommission.md)
- [Security](./docs/security.md)
- [Multi Datacenter](./docs/multidatacenter.md)
- [Parameters reference](./docs/parameters.md)

## Version Chart

24 changes: 24 additions & 0 deletions docs/architecture.md
@@ -0,0 +1,24 @@
# KUDO Cassandra Architecture

Apache Cassandra is a stateful workload. KUDO Cassandra uses a Kubernetes
StatefulSet as the basic building block of its architecture.

Because a StatefulSet maintains a sticky identity for each of its Pods, KUDO
Cassandra can automate all necessary operations on Apache Cassandra nodes.

To support updates and upgrades, KUDO Cassandra ships with custom ConfigMaps
that enable rolling updates. Apache Cassandra maintenance jobs such as `repair`
and `backup/restore` are configured as Kubernetes Jobs and are only deployed
on demand, when their respective parameters are set.
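
The pieces described above can be inspected with standard `kubectl` queries. A
minimal sketch — the `kudo.dev/instance` label and the instance name
`cassandra-instance` are assumptions for illustration, and the command is
dry-run by default:

```shell
# Dry-run by default: set KUBECTL=kubectl to run against a real cluster.
KUBECTL="${KUBECTL:-echo kubectl}"

# List the StatefulSet, ConfigMaps, and on-demand maintenance Jobs that
# belong to one instance (label selector is an assumption).
$KUBECTL get statefulset,configmap,job -l kudo.dev/instance=cassandra-instance
```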

![](images/architecture.png)

## Multi-Datacenter Architecture

KUDO Cassandra can span a ring across multiple Kubernetes clusters to
facilitate deployment across various regions and zones. Read more about
multi-datacenter configuration options in the
[multi-datacenter](./multidatacenter.md) docs.

![](images/multi-dc-arch.png)
42 changes: 42 additions & 0 deletions docs/decommission.md
@@ -0,0 +1,42 @@
# Decommission KUDO Cassandra nodes

KUDO Cassandra does not provide an automated way to scale down a Cassandra
cluster: scaling down is a critical operation that should not be performed
frequently, and keeping it manual discourages anti-patterns in managing an
Apache Cassandra cluster.

## Manually decommissioning KUDO Cassandra nodes

KUDO Cassandra only supports decommissioning the node with the highest pod
ordinal index. For example, given a cluster with the following pods:

```
NAME READY STATUS RESTARTS AGE
analytics-cassandra-node-0 2/2 Running 0 124m
analytics-cassandra-node-1 2/2 Running 0 123m
analytics-cassandra-node-2 2/2 Running 0 120m
analytics-cassandra-node-3 2/2 Running 0 118m
analytics-cassandra-node-4 2/2 Running 0 117m
```

we can only decommission `analytics-cassandra-node-4`, as it has the highest
pod ordinal index, `4`.
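
The name of the only decommissionable pod follows mechanically from the naming
scheme `<instance-name>-node-<ordinal-index>`, where ordinals start at 0. A
small hypothetical helper, assuming nothing beyond that scheme:

```shell
# Hypothetical helper: given the instance name and current node count,
# print the pod with the highest ordinal index -- the only pod that
# KUDO Cassandra supports decommissioning. Ordinals start at 0.
highest_ordinal_pod() {
  local instance="$1" node_count="$2"
  echo "${instance}-node-$(( node_count - 1 ))"
}

highest_ordinal_pod analytics-cassandra 5
# -> analytics-cassandra-node-4
```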

### Decommission the node

```bash
kubectl exec -it pod/analytics-cassandra-node-4 \
-n dev \
-c cassandra \
-- \
nodetool decommission
```

Once the operation is complete, we can update the KUDO Cassandra instance:

```bash
kubectl kudo update -p NODE_COUNT=4 --instance analytics-cassandra -n dev
```

Once the update plan is complete, we can delete the PVC that was attached to
`pod/analytics-cassandra-node-4`. Failing to delete or clean up the PVC will
cause issues the next time the cluster is scaled up.
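
The three manual steps above (`nodetool decommission`, the parameter update,
and the PVC deletion) can be strung together in a script. A sketch, dry-run by
default; the claim name `var-lib-cassandra-<pod>` is an assumption — verify the
actual PVC name with `kubectl get pvc` before deleting anything:

```shell
# Sketch of the manual decommission flow from the steps above.
# Dry-run by default: set KUBECTL=kubectl to execute for real.
decommission_highest_node() {
  local instance="$1" node_count="$2" ns="$3"
  local kubectl="${KUBECTL:-echo kubectl}"
  local pod="${instance}-node-$(( node_count - 1 ))"

  # 1. Drain the Cassandra node.
  $kubectl exec "pod/${pod}" -n "$ns" -c cassandra -- nodetool decommission
  # 2. Tell KUDO about the new node count.
  $kubectl kudo update -p NODE_COUNT=$(( node_count - 1 )) \
    --instance "$instance" -n "$ns"
  # 3. Delete the orphaned PVC (claim name is an assumption; check it
  #    with `kubectl get pvc -n <ns>` first).
  $kubectl delete pvc "var-lib-cassandra-${pod}" -n "$ns"
}

decommission_highest_node analytics-cassandra 5 dev
```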
Binary file added docs/images/architecture.png
Binary file added docs/images/multi-dc-arch.png
63 changes: 63 additions & 0 deletions docs/repair.md
@@ -0,0 +1,63 @@
# Repair KUDO Cassandra

KUDO Cassandra comes with a repair plan that can be triggered via the
`REPAIR_POD` parameter.

Let's walk through an example with a 3-node cluster:

```
kubectl get pods
NAME READY STATUS RESTARTS AGE
cassandra-instance-node-0 1/1 Running 0 4m44s
cassandra-instance-node-1 1/1 Running 0 4m7s
cassandra-instance-node-2 1/1 Running 1 3m25s
```

We can repair `cassandra-instance-node-0` by running:

```
kubectl kudo update --instance=cassandra-instance -p REPAIR_POD=cassandra-instance-node-0
```

This launches a Kubernetes job that repairs the node:

```
kubectl get jobs
NAME COMPLETIONS DURATION AGE
cassandra-instance-node-repair-job 0/1 6s 6s
```

You can also follow the repair through its plan status:

```
kubectl kudo plan status --instance=cassandra-instance
Plan(s) for "cassandra-instance" in namespace "default":
.
└── cassandra-instance (Operator-Version: "cassandra-1.0.0" Active-Plan: "repair")
├── Plan backup (serial strategy) [NOT ACTIVE]
│   └── Phase backup (serial strategy) [NOT ACTIVE]
│   ├── Step cleanup [NOT ACTIVE]
│   └── Step backup [NOT ACTIVE]
├── Plan deploy (serial strategy) [NOT ACTIVE]
│   ├── Phase rbac (parallel strategy) [NOT ACTIVE]
│   │   └── Step rbac-deploy [NOT ACTIVE]
│   └── Phase nodes (serial strategy) [NOT ACTIVE]
│   ├── Step pre-node [NOT ACTIVE]
│   └── Step node [NOT ACTIVE]
└── Plan repair (serial strategy) [COMPLETE], last updated 2020-06-18 13:15:35
└── Phase repair (serial strategy) [COMPLETE]
├── Step cleanup [COMPLETE]
└── Step repair [COMPLETE]
```

To inspect the repair in detail, fetch the logs of the job:

```
kubectl logs --selector job-name=cassandra-instance-node-repair-job
I0618 11:18:06.389132 1 request.go:621] Throttling request took 1.154911388s, request: GET:https://10.0.0.1:443/apis/scheduling.kubefed.io/v1alpha1?timeout=32s
[2020-06-18 11:18:14,626] Replication factor is 1. No repair is needed for keyspace 'system_auth'
[2020-06-18 11:18:14,723] Starting repair command #1 (66fb8e80-b155-11ea-8794-a356fd81d293), repairing keyspace system_traces with repair options (parallelism: parallel, primary range: false, incremental: true, job threads: 1, ColumnFamilies: [], dataCenters: [], hosts: [], # of ranges: 512, pull repair: false)
[ ... lines removed for clarity ...]
[2020-06-18 11:18:18,720] Repair completed successfully
[2020-06-18 11:18:18,723] Repair command #1 finished in 4 seconds
```
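
Since `REPAIR_POD` targets one node at a time, repairing the whole ring means
triggering the plan once per node and waiting for each run to finish. A rough
sketch, dry-run by default; the wait step is left as a comment because its
exact form depends on how you poll plan status:

```shell
# Sketch: trigger the repair plan for each node in turn.
# Dry-run by default: set KUBECTL=kubectl to execute for real.
KUBECTL="${KUBECTL:-echo kubectl}"
INSTANCE=cassandra-instance
NODE_COUNT=3

for i in $(seq 0 $(( NODE_COUNT - 1 ))); do
  $KUBECTL kudo update --instance="$INSTANCE" -p "REPAIR_POD=${INSTANCE}-node-${i}"
  # In a real run, poll `kubectl kudo plan status --instance=$INSTANCE`
  # here and continue only once the repair plan reports COMPLETE.
done
```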
2 changes: 1 addition & 1 deletion docs/resources.md
@@ -123,7 +123,7 @@ default └─Pod/cassandra-instance-node-2
### Statefulset

Statefulsets are designed to manage stateful workloads in Kubernetes. KUDO
Cassandra uses statefulsets. The operator by default uses `OrderedReady`pod
Cassandra uses statefulsets. The operator by default uses `OrderedReady` pod
management policy. This guarantees that pods are created sequentially, which
means that when the Cassandra cluster is coming up, only one node starts at a
time. Pod names are <instance-name>-node-<ordinal-index> starting from
