diff --git a/README.md b/README.md
index dd49e530..59e06324 100644
--- a/README.md
+++ b/README.md
@@ -10,7 +10,7 @@ The KUDO Cassandra Operator makes it easy to deploy and manage
 ## Getting started
 
 The KUDO Cassandra operator requires [KUDO](https://kudo.dev/) and Kubernetes
-versions as specified in [`operator.yaml`](operator/operator.yaml#L4-L5).
+versions as specified in [`operator.yaml`](./operator/operator.yaml#L4-L5).
 
 To install it run
 
@@ -28,6 +28,11 @@ kubectl kudo install cassandra
 - Readiness probe
 - Unpriviledged container execution
 - Node-to-Node and Node-to-Client communication encryption
+- Backup/restore
+- Rack-awareness
+- Node replace
+- Inter-pod anti-affinity
+- Multi-datacenter support
 
 ## Roadmap
 
@@ -36,17 +41,20 @@ kubectl kudo install cassandra
 ## Documentation
 
-- [Installing](/docs/installing.md)
-- [Production](/docs/production.md)
-- [Accessing](/docs/accessing.md)
-- [Managing](/docs/managing.md)
-- [Resources](/docs/resources.md)
-- [Upgrading](/docs/upgrading.md)
-- [Monitoring](/docs/monitoring.md)
-- [Backup & Restore](/docs/backup.md)
-- [Security](/docs/security.md)
-- [Multi Datacenter](/docs/multidatacenter.md)
-- [Parameters reference](/docs/parameters.md)
+- [Installing](./docs/installing.md)
+- [Production](./docs/production.md)
+- [Accessing](./docs/accessing.md)
+- [Architecture](./docs/architecture.md)
+- [Managing](./docs/managing.md)
+- [Resources](./docs/resources.md)
+- [Upgrading](./docs/upgrading.md)
+- [Monitoring](./docs/monitoring.md)
+- [Backup & Restore](./docs/backup.md)
+- [Repair](./docs/repair.md)
+- [Decommission](./docs/decommission.md)
+- [Security](./docs/security.md)
+- [Multi Datacenter](./docs/multidatacenter.md)
+- [Parameters reference](./docs/parameters.md)
 
 ## Version Chart
 
diff --git a/docs/architecture.md b/docs/architecture.md
new file mode 100644
index 00000000..e0c840af
--- /dev/null
+++ b/docs/architecture.md
@@ -0,0 +1,24 @@
+# KUDO Cassandra Architecture
+
+Apache Cassandra is a stateful workload. KUDO Cassandra uses the Kubernetes
+StatefulSet as the basic building block of its architecture.
+
+Because a StatefulSet maintains a sticky identity for each of its Pods, KUDO
+Cassandra can automate all necessary operations on Apache Cassandra nodes.
+
+To help with updates and upgrades, KUDO Cassandra comes with a custom config
+map that helps with rolling updates of the cluster. Apache Cassandra
+maintenance jobs like `repair` and `backup/restore` are configured as
+Kubernetes Jobs and are only deployed on demand, when their respective
+parameters are set.
+
+![](images/architecture.png)
+
+## Multi-Datacenter Architecture
+
+KUDO Cassandra can span a ring across multiple Kubernetes clusters to
+facilitate deployments across various regions and zones. Read more about the
+multi-datacenter configuration options in the
+[multi-datacenter](./multidatacenter.md) docs.
+
+![](images/multi-dc-arch.png)
diff --git a/docs/decommission.md b/docs/decommission.md
new file mode 100644
index 00000000..a4889902
--- /dev/null
+++ b/docs/decommission.md
@@ -0,0 +1,42 @@
+# Decommission KUDO Cassandra nodes
+
+KUDO Cassandra does not provide an automated way to scale down the Cassandra
+cluster: scaling down is a critical operation that should not be performed
+frequently, and keeping it manual discourages anti-patterns when managing an
+Apache Cassandra cluster.
+
+## Manually decommissioning KUDO Cassandra nodes
+
+KUDO Cassandra only supports decommissioning the node with the highest pod
+ordinal index. For example, given a cluster with the following pods:
+
+```
+NAME                         READY   STATUS    RESTARTS   AGE
+analytics-cassandra-node-0   2/2     Running   0          124m
+analytics-cassandra-node-1   2/2     Running   0          123m
+analytics-cassandra-node-2   2/2     Running   0          120m
+analytics-cassandra-node-3   2/2     Running   0          118m
+analytics-cassandra-node-4   2/2     Running   0          117m
+```
+
+we can only decommission `analytics-cassandra-node-4`, as it has the highest
+pod ordinal index (`4`).
+
+### Decommission the node
+
+```bash
+kubectl exec -it pod/analytics-cassandra-node-4 \
+  -n dev \
+  -c cassandra \
+  -- \
+  nodetool decommission
+```
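+
+`nodetool decommission` streams the node's data to the remaining nodes and
+returns once the node has left the ring. As a quick sanity check before
+resizing the instance (a sketch reusing the pod, namespace, and container
+names from the example above), `nodetool netstats` should report
+`Mode: DECOMMISSIONED` when the node is done:
+
+```bash
+# The first line of the output shows the node's mode;
+# expect "Mode: DECOMMISSIONED" once decommissioning has finished
+kubectl exec -it pod/analytics-cassandra-node-4 \
+  -n dev \
+  -c cassandra \
+  -- \
+  nodetool netstats
+```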
+
+Once the operation is complete, we can update the KUDO Cassandra Instance:
+
+```
+kubectl kudo update -p NODE_COUNT=4 --instance analytics-cassandra -n dev
+```
+
+Once the update plan is complete, we can delete the PVC that was attached to
+`pod/analytics-cassandra-node-4`. Not deleting or cleaning up the PVC will
+result in issues the next time the cluster is scaled up.
diff --git a/docs/images/architecture.png b/docs/images/architecture.png
new file mode 100644
index 00000000..6ba5f574
Binary files /dev/null and b/docs/images/architecture.png differ
diff --git a/docs/images/multi-dc-arch.png b/docs/images/multi-dc-arch.png
new file mode 100644
index 00000000..32853499
Binary files /dev/null and b/docs/images/multi-dc-arch.png differ
diff --git a/docs/repair.md b/docs/repair.md
new file mode 100644
index 00000000..c0f3be9a
--- /dev/null
+++ b/docs/repair.md
@@ -0,0 +1,63 @@
+# Repair KUDO Cassandra
+
+KUDO Cassandra comes with a repair plan. It can be triggered using the
+`REPAIR_POD` parameter.
+
+Let's walk through an example with a 3-node cluster:
+
+```
+kubectl get pods
+NAME                        READY   STATUS    RESTARTS   AGE
+cassandra-instance-node-0   1/1     Running   0          4m44s
+cassandra-instance-node-1   1/1     Running   0          4m7s
+cassandra-instance-node-2   1/1     Running   1          3m25s
+```
+
+We can repair node-0 by running:
+
+```
+kubectl kudo update --instance=cassandra-instance -p REPAIR_POD=cassandra-instance-node-0
+```
+
+This launches a job that repairs node-0:
+
+```
+kubectl get jobs
+NAME                                 COMPLETIONS   DURATION   AGE
+cassandra-instance-node-repair-job   0/1           6s         6s
+```
+
+You can also follow the repair plan through its plan status:
+
+```
+kubectl kudo plan status --instance=cassandra-instance
+Plan(s) for "cassandra-instance" in namespace "default":
+.
+└── cassandra-instance (Operator-Version: "cassandra-1.0.0" Active-Plan: "repair")
+    ├── Plan backup (serial strategy) [NOT ACTIVE]
+    │   └── Phase backup (serial strategy) [NOT ACTIVE]
+    │       ├── Step cleanup [NOT ACTIVE]
+    │       └── Step backup [NOT ACTIVE]
+    ├── Plan deploy (serial strategy) [NOT ACTIVE]
+    │   ├── Phase rbac (parallel strategy) [NOT ACTIVE]
+    │   │   └── Step rbac-deploy [NOT ACTIVE]
+    │   └── Phase nodes (serial strategy) [NOT ACTIVE]
+    │       ├── Step pre-node [NOT ACTIVE]
+    │       └── Step node [NOT ACTIVE]
+    └── Plan repair (serial strategy) [COMPLETE], last updated 2020-06-18 13:15:35
+        └── Phase repair (serial strategy) [COMPLETE]
+            ├── Step cleanup [COMPLETE]
+            └── Step repair [COMPLETE]
+```
+
+To fetch the logs of the repair job:
+
+```
+kubectl logs --selector job-name=cassandra-instance-node-repair-job
+I0618 11:18:06.389132       1 request.go:621] Throttling request took 1.154911388s, request: GET:https://10.0.0.1:443/apis/scheduling.kubefed.io/v1alpha1?timeout=32s
+[2020-06-18 11:18:14,626] Replication factor is 1. No repair is needed for keyspace 'system_auth'
+[2020-06-18 11:18:14,723] Starting repair command #1 (66fb8e80-b155-11ea-8794-a356fd81d293), repairing keyspace system_traces with repair options (parallelism: parallel, primary range: false, incremental: true, job threads: 1, ColumnFamilies: [], dataCenters: [], hosts: [], # of ranges: 512, pull repair: false)
+[ ... lines removed for clarity ...]
+[2020-06-18 11:18:18,720] Repair completed successfully
+[2020-06-18 11:18:18,723] Repair command #1 finished in 4 seconds
+```
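+
+If you are scripting around the repair, you can block until the job finishes
+instead of polling. A small sketch, assuming the job name shown above and the
+`default` namespace from the plan status output:
+
+```
+# Wait for the repair job to report the "Complete" condition
+kubectl wait --for=condition=complete --timeout=30m \
+  job/cassandra-instance-node-repair-job
+```
+
+To repair a different node, update the `REPAIR_POD` parameter again with the
+name of that pod.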
diff --git a/docs/resources.md b/docs/resources.md
index a57f48c5..b19704ae 100644
--- a/docs/resources.md
+++ b/docs/resources.md
@@ -123,7 +123,7 @@ default
 └─Pod/cassandra-instance-node-2
 ### Statefulset
 Statefulsets are designed to manage stateful workload in Kubernetes. KUDO
-Cassandra uses statefulsets. The operator by default uses `OrderedReady`pod
+Cassandra uses statefulsets. The operator by default uses `OrderedReady` pod
 management policy. This guarantees that pods are created sequentially, which
 means that when the Cassandra cluster is coming up, only one node starts at a
 time. Pod names are -node- starting from