This repository has been archived by the owner on Oct 23, 2024. It is now read-only.
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
add architecture/reapair/decommission docs (#154)
* add architecture docs

Co-authored-by: Marcin Owsiany <[email protected]>
Showing 7 changed files with 150 additions and 13 deletions.
@@ -0,0 +1,24 @@

# KUDO Cassandra Architecture

Apache Cassandra is a stateful workload. KUDO Cassandra uses a Kubernetes
StatefulSet as the basic building block of its architecture.

Because a StatefulSet maintains a sticky identity for each of its Pods, KUDO
Cassandra can automate the necessary operations on Apache Cassandra nodes.
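
To illustrate what a sticky identity means here: a StatefulSet names its Pods `<statefulset-name>-<ordinal>`, with ordinals starting at `0`, and a Pod keeps that name when it is restarted or rescheduled. The pod listings in these docs suggest that the KUDO Cassandra StatefulSet is named `<instance>-node`. The sketch below assumes that pattern, and `node_pod_names` is a hypothetical helper, not part of KUDO Cassandra.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the stable pod names a StatefulSet would create
# for a KUDO Cassandra instance, ASSUMING the StatefulSet is named
# "<instance>-node" (pattern inferred from the examples in these docs).
node_pod_names() {
  local instance="$1" count="$2" i
  # StatefulSet ordinals are 0-based and contiguous: 0 .. count-1
  for ((i = 0; i < count; i++)); do
    printf '%s-node-%d\n' "$instance" "$i"
  done
}
```

For example, `node_pod_names analytics-cassandra 5` prints `analytics-cassandra-node-0` through `analytics-cassandra-node-4`. These are the same names that appear in the decommissioning example.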

To help with updates and upgrades, KUDO Cassandra comes with custom ConfigMaps
that support rolling updates. Apache Cassandra maintenance jobs such as `repair`
and `backup/restore` are configured as Kubernetes Jobs and are deployed only on
demand, when their respective parameters are set.

![KUDO Cassandra architecture](images/architecture.png)

## Multi-Datacenter Architecture

KUDO Cassandra can span a ring across multiple Kubernetes clusters to
facilitate deployments across various regions and zones. Read more about
multi-datacenter configuration options in the
[multi-datacenter](./multidataceneter.md) docs.

![KUDO Cassandra multi-datacenter architecture](images/multi-dc-arch.png)
@@ -0,0 +1,42 @@

# Decommission KUDO Cassandra nodes

KUDO Cassandra does not provide an automated way to scale down the Cassandra
cluster. Scaling down is a critical operation that should not be performed
frequently, and keeping it manual discourages anti-patterns when managing an
Apache Cassandra cluster.

## Manually decommissioning KUDO Cassandra nodes

KUDO Cassandra only supports decommissioning the node with the highest pod
ordinal index. For example, given a cluster with the following pods:

```
NAME                         READY   STATUS    RESTARTS   AGE
analytics-cassandra-node-0   2/2     Running   0          124m
analytics-cassandra-node-1   2/2     Running   0          123m
analytics-cassandra-node-2   2/2     Running   0          120m
analytics-cassandra-node-3   2/2     Running   0          118m
analytics-cassandra-node-4   2/2     Running   0          117m
```

we can only decommission `analytics-cassandra-node-4`, because it has the highest pod
ordinal index, `4`.
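
To pick the decommission candidate in a script, the highest ordinal must be found by comparing the ordinals as numbers. Sorting the pod names as strings would put `node-10` before `node-9`. The minimal sketch below assumes the default `kubectl get pods` column layout shown above. `highest_ordinal_pod` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `kubectl get pods` output on stdin and print the
# pod named "<prefix>-<ordinal>" that has the highest ordinal.
# Ordinals are compared numerically, so "node-10" beats "node-9".
# Assumes the prefix contains no regex special characters.
highest_ordinal_pod() {
  local prefix="$1"
  awk -v p="$prefix" '
    # Only consider first-column names of the form <prefix>-<digits>;
    # the NAME header and unrelated pods are skipped.
    $1 ~ ("^" p "-[0-9]+$") {
      n = substr($1, length(p) + 2) + 0
      if (!found || n > max) { max = n; name = $1; found = 1 }
    }
    END { if (found) print name }
  '
}
```

For the listing above, `kubectl get pods -n dev | highest_ordinal_pod analytics-cassandra-node` prints `analytics-cassandra-node-4`.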

### Decommission the node

```bash
# Stream this node's data to the remaining nodes and remove it from the ring
kubectl exec -it pod/analytics-cassandra-node-4 \
  -n dev \
  -c cassandra \
  -- \
  nodetool decommission
```

Once the operation has completed, we can update the KUDO Cassandra instance
with the new node count:

```bash
kubectl kudo update -p NODE_COUNT=4 --instance analytics-cassandra -n dev
```
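
The `NODE_COUNT` value follows from zero-based ordinals. After the pod with ordinal `N` is decommissioned, pods `0` through `N-1` remain, so the new node count is `N` (here, `4`). The small sketch below shows that arithmetic. `node_count_after_decommission` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: given the name of the pod that was just decommissioned
# (the highest ordinal), print the NODE_COUNT value for `kubectl kudo update`.
# Ordinals are 0-based, so removing ordinal N leaves N nodes (0 .. N-1).
node_count_after_decommission() {
  local pod="${1#pod/}"        # accept "pod/<name>" as well as "<name>"
  local ordinal="${pod##*-}"   # text after the last "-" is the ordinal
  case "$ordinal" in
    ''|*[!0-9]*) echo "not a StatefulSet pod name: $1" >&2; return 1 ;;
  esac
  echo "$ordinal"
}
```

For example, you can run `kubectl kudo update -p NODE_COUNT="$(node_count_after_decommission analytics-cassandra-node-4)" --instance analytics-cassandra -n dev`.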

Once the update plan is complete, we can delete the PVC that was attached to
`pod/analytics-cassandra-node-4`. If that PVC is not deleted or cleaned up, the
StatefulSet reattaches it to the new `analytics-cassandra-node-4` pod the next
time the cluster is scaled up. That node then starts with the decommissioned
node's stale data, which causes issues.
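
StatefulSet PVCs created from a volume claim template are named `<template-name>-<pod-name>`. These docs do not show the template name that KUDO Cassandra uses. The sketch below therefore finds candidate PVCs by their `-<pod-name>` suffix and only prints them, leaving the deletion itself to a reviewed `kubectl delete pvc` command. This is an assumption-based helper, not part of KUDO Cassandra, and `pvcs_for_pod` is a hypothetical name. A suffix match can also catch PVCs of another instance whose name ends the same way, so check the output before deleting anything.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read PVC names on stdin (e.g. from
# `kubectl get pvc -n dev -o name`) and print those ending in "-<pod>",
# following the StatefulSet naming scheme <template>-<pod>.
# It only prints candidates; review them before deleting anything.
pvcs_for_pod() {
  local suffix="-$1"
  # Drop the "persistentvolumeclaim/" prefix that `-o name` adds.
  sed 's#^persistentvolumeclaim/##' |
    awk -v s="$suffix" '
      length($1) > length(s) && substr($1, length($1) - length(s) + 1) == s { print $1 }
    '
}
```

For example, `kubectl get pvc -n dev -o name | pvcs_for_pod analytics-cassandra-node-4` lists the candidates. After checking them, you can remove each one with `kubectl delete pvc <name> -n dev`.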
@@ -0,0 +1,63 @@

# Repair KUDO Cassandra

KUDO Cassandra comes with a repair plan, which is triggered by setting the
`REPAIR_POD` parameter.

Consider, as an example, a 3-node cluster:

```
kubectl get pods
NAME                        READY   STATUS    RESTARTS   AGE
cassandra-instance-node-0   1/1     Running   0          4m44s
cassandra-instance-node-1   1/1     Running   0          4m7s
cassandra-instance-node-2   1/1     Running   1          3m25s
```

We can repair `cassandra-instance-node-0` by running:

```
kubectl kudo update --instance=cassandra-instance -p REPAIR_POD=cassandra-instance-node-0
```

This launches a Job that repairs `cassandra-instance-node-0`:

```
kubectl get jobs
NAME                                 COMPLETIONS   DURATION   AGE
cassandra-instance-node-repair-job   0/1           6s         6s
```

You can also follow the progress of the repair plan through the plan status:

```
kubectl kudo plan status --instance=cassandra-instance
Plan(s) for "cassandra-instance" in namespace "default":
.
└── cassandra-instance (Operator-Version: "cassandra-1.0.0" Active-Plan: "repair")
    ├── Plan backup (serial strategy) [NOT ACTIVE]
    │   └── Phase backup (serial strategy) [NOT ACTIVE]
    │       ├── Step cleanup [NOT ACTIVE]
    │       └── Step backup [NOT ACTIVE]
    ├── Plan deploy (serial strategy) [NOT ACTIVE]
    │   ├── Phase rbac (parallel strategy) [NOT ACTIVE]
    │   │   └── Step rbac-deploy [NOT ACTIVE]
    │   └── Phase nodes (serial strategy) [NOT ACTIVE]
    │       ├── Step pre-node [NOT ACTIVE]
    │       └── Step node [NOT ACTIVE]
    └── Plan repair (serial strategy) [COMPLETE], last updated 2020-06-18 13:15:35
        └── Phase repair (serial strategy) [COMPLETE]
            ├── Step cleanup [COMPLETE]
            └── Step repair [COMPLETE]
```
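
For scripting, you can pull the state of a single plan out of this tree. The minimal sketch below assumes the `Plan <name> (<strategy>) [<STATE>]` line format shown above. It parses plain text, so a change in the output format would break it. `plan_state` is a hypothetical helper. A plan that completed in an earlier run also reports `COMPLETE`, so also check `Active-Plan` or the `last updated` timestamp.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `kubectl kudo plan status` output on stdin and
# print the state of the named plan (e.g. COMPLETE or NOT ACTIVE).
# It parses the "Plan <name> (<strategy>) [<STATE>]" lines as plain text,
# so a change in the output format would break it.
plan_state() {
  local plan="$1"
  sed -n "s/.*Plan ${plan} ([^)]*) \[\([[:upper:]_ ]*\)].*/\1/p" | head -n 1
}
```

For the output above, `kubectl kudo plan status --instance=cassandra-instance | plan_state repair` prints `COMPLETE`.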

To inspect the repair itself, we can fetch the logs of the repair Job:

```
kubectl logs --selector job-name=cassandra-instance-node-repair-job
I0618 11:18:06.389132       1 request.go:621] Throttling request took 1.154911388s, request: GET:https://10.0.0.1:443/apis/scheduling.kubefed.io/v1alpha1?timeout=32s
[2020-06-18 11:18:14,626] Replication factor is 1. No repair is needed for keyspace 'system_auth'
[2020-06-18 11:18:14,723] Starting repair command #1 (66fb8e80-b155-11ea-8794-a356fd81d293), repairing keyspace system_traces with repair options (parallelism: parallel, primary range: false, incremental: true, job threads: 1, ColumnFamilies: [], dataCenters: [], hosts: [], # of ranges: 512, pull repair: false)
[ ... lines removed for clarity ...]
[2020-06-18 11:18:18,720] Repair completed successfully
[2020-06-18 11:18:18,723] Repair command #1 finished in 4 seconds
```
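
You can also summarize the repair logs in a script. The minimal sketch below takes two messages from the sample log above. `Repair completed successfully` marks success, and `No repair is needed for keyspace '<name>'` marks a skipped keyspace. The exact wording comes from `nodetool` and may differ between Cassandra versions. `repair_summary` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read repair Job logs on stdin, print the keyspaces
# nodetool skipped and an overall status line. Returns non-zero when the
# success message (taken from the sample log; may vary by version) is absent.
repair_summary() {
  local logs
  logs="$(cat)"
  # Keyspaces that were skipped, e.g. because the replication factor is 1
  printf '%s\n' "$logs" |
    sed -n "s/.*No repair is needed for keyspace '\([^']*\)'.*/skipped: \1/p"
  # Overall result, based on the final success message
  if printf '%s\n' "$logs" | grep -q 'Repair completed successfully'; then
    echo "status: success"
  else
    echo "status: not confirmed"
    return 1
  fi
}
```

For example, you can run `kubectl logs --selector job-name=cassandra-instance-node-repair-job | repair_summary`.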