Make captions more consistent
timebertt committed Feb 10, 2024
1 parent 7ae92ca commit 8f54d5c
Showing 4 changed files with 12 additions and 11 deletions.
content/30-related-work.md: 6 changes (3 additions, 3 deletions)
@@ -39,7 +39,7 @@ Hence, these approaches are not suited for generally making Kubernetes controlle

## Study Project {#sec:related-study-project}

-![Study project sharding architecture [@studyproject]](../assets/study-project-design.pdf)
+![Study project controller sharding architecture [@studyproject]](../assets/study-project-design.pdf)

A previous study project [@studyproject] presents a design and implementation for sharding Kubernetes controllers by leveraging established sharding approaches from distributed databases.
The design introduces a sharder that runs in one of the controller instances as determined by a lease-based leader election mechanism.
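
For illustration, such a lease-based leader election is typically built on client-go's `leaderelection` package. The following is a minimal sketch under assumptions (lease name, namespace, and the `runSharder` function are hypothetical), not the study project's actual code:

```go
package main

import (
	"context"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(cfg)

	// Each controller instance uses its pod name as its identity.
	id, _ := os.Hostname()

	// The sharder role is tied to a coordination.k8s.io/v1 Lease object.
	lock := &resourcelock.LeaseLock{
		LeaseMeta:  metav1.ObjectMeta{Namespace: "operator-system", Name: "sharder"}, // hypothetical names
		Client:     clientset.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{Identity: id},
	}

	leaderelection.RunOrDie(context.Background(), leaderelection.LeaderElectionConfig{
		Lock:          lock,
		LeaseDuration: 15 * time.Second,
		RenewDeadline: 10 * time.Second,
		RetryPeriod:   2 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			// Only the instance holding the lease runs the sharder logic.
			OnStartedLeading: func(ctx context.Context) { runSharder(ctx) },
			// When the lease is lost, the instance stops acting as the sharder.
			OnStoppedLeading: func() { os.Exit(0) },
		},
	})
}

// runSharder would assign objects to the available shards (hypothetical).
func runSharder(ctx context.Context) { <-ctx.Done() }
```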
@@ -92,7 +92,7 @@ Before reconciling an object, the reconciler checks if its instance is responsib
Only if it is responsible can it continue with the usual reconciliation.
[@mooresharding]
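
As a minimal sketch of such a responsibility check (the bucket hashing and the `ownedBuckets` set are assumptions for illustration, not knative's actual code):

```go
package sharding

import "hash/fnv"

// bucketFor maps an object key (e.g. "namespace/name") to one of n buckets.
func bucketFor(key string, n uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(key))
	return h.Sum32() % n
}

// Reconciler skips objects whose bucket is currently led by another instance.
type Reconciler struct {
	numBuckets   uint32
	ownedBuckets map[uint32]bool // buckets whose per-bucket lease this instance holds
}

func (r *Reconciler) Reconcile(key string) error {
	if !r.ownedBuckets[bucketFor(key, r.numBuckets)] {
		// Not responsible for this object; another instance reconciles it.
		return nil
	}
	// ... usual reconciliation logic ...
	return nil
}
```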

-![Failover with leader election per controller and bucket [@mooresharding]](../assets/reconciler-buckets.pdf)
+![Failover with leader election per controller and bucket in knative [@mooresharding]](../assets/reconciler-buckets.pdf)

To realize these mechanisms, all controller instances run all informers.
I.e., they watch all objects regardless of whether they need to reconcile them.
@@ -193,7 +193,7 @@ All shard instances use a watch label selector with the `scheduled-shard-id` lab
With this, the reconciliation work and watch cache's resource consumption are distributed across the shard instances.
[@kubevela]
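
For illustration, a shard could restrict its informers to the objects assigned to it roughly as follows; the plain `scheduled-shard-id` label key is taken from the text above, while the client setup and resync interval are assumptions:

```go
package main

import (
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// newShardInformerFactory builds informers that only list and watch objects
// scheduled to this shard, distributing the watch cache across instances.
func newShardInformerFactory(shardID string) informers.SharedInformerFactory {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(cfg)

	return informers.NewSharedInformerFactoryWithOptions(clientset, 10*time.Hour,
		informers.WithTweakListOptions(func(opts *metav1.ListOptions) {
			// Server-side filtering: only objects with the matching label are sent.
			opts.LabelSelector = "scheduled-shard-id=" + shardID
		}),
	)
}
```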

-![Sharding architecture in KubeVela [@kubevela]](../assets/kubevela-sharding.jpg)
+![KubeVela application controller sharding architecture [@kubevela]](../assets/kubevela-sharding.jpg)

While the application webhook dynamically discovers the set of available shard instances, there are no automatic reassignments when a new instance is added, or an existing one is removed.
Most importantly, when a shard instance fails, the assigned applications are not reassigned and no longer reconciled.
content/40-design.md: 4 changes (2 additions, 2 deletions)
@@ -42,7 +42,7 @@ If there is no available shard, the assignment is deferred until a new shard bec

## Architecture

-![Sharding architecture](../draw/architecture.pdf)
+![Evolved controller sharding architecture](../draw/architecture.pdf)

The evolved design keeps the sharding mechanisms inspired by distributed databases for membership, failure detection, and partitioning, as presented in the study project.
I.e., individual controller instances announce themselves to the sharder by maintaining a shard lease that also serves the purpose of detecting shard failures.
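
A rough sketch of such a shard lease heartbeat (lease handling, names, and timings are assumptions, not the actual implementation):

```go
package shard

import (
	"context"
	"time"

	coordinationv1 "k8s.io/api/coordination/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	coordinationclient "k8s.io/client-go/kubernetes/typed/coordination/v1"
	"k8s.io/utils/ptr"
)

// maintainShardLease announces the shard to the sharder by creating a Lease
// and renewing it periodically; if renewals stop, the sharder detects the
// shard as failed once the lease expires.
func maintainShardLease(ctx context.Context, leases coordinationclient.LeaseInterface, shardID string) error {
	lease := &coordinationv1.Lease{
		ObjectMeta: metav1.ObjectMeta{Name: shardID},
		Spec: coordinationv1.LeaseSpec{
			HolderIdentity:       ptr.To(shardID),
			LeaseDurationSeconds: ptr.To[int32](15),
			AcquireTime:          &metav1.MicroTime{Time: time.Now()},
			RenewTime:            &metav1.MicroTime{Time: time.Now()},
		},
	}
	if _, err := leases.Create(ctx, lease, metav1.CreateOptions{}); err != nil {
		return err
	}

	ticker := time.NewTicker(10 * time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
			current, err := leases.Get(ctx, shardID, metav1.GetOptions{})
			if err != nil {
				return err
			}
			current.Spec.RenewTime = &metav1.MicroTime{Time: time.Now()}
			if _, err := leases.Update(ctx, current, metav1.UpdateOptions{}); err != nil {
				return err
			}
		}
	}
}
```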
@@ -91,7 +91,7 @@ spec:
resource: configmaps
```
-: Example ClusterRing resource {#lst:clusterring}
+: Example ClusterRing {#lst:clusterring}
The sharded controller deployment only runs the actual controllers themselves, i.e., the actual shards.
Nevertheless, the controller deployment is configured with the corresponding `ClusterRing` to use matching names.
content/50-implementation.md: 6 changes (3 additions, 3 deletions)
@@ -66,7 +66,7 @@ status:
type: Ready
```
-: Example ClusterRing resource with status {#lst:clusterring-status}
+: Example ClusterRing with status {#lst:clusterring-status}
As the resource name suggests, the `ClusterRing` resource is cluster-scoped.
I.e., the object itself does not reside in a namespace and configures behavior on a cluster-global level.
@@ -109,7 +109,7 @@ For increased observability, the shard lease controller writes the determined st
|dead|not held by shard (released or acquired by sharder)|
|orphaned|not held by shard, expired at least 1 minute ago|

-: Shard states [@studyproject] {#tbl:shard-states}
+: Shard lease states [@studyproject] {#tbl:shard-states}

The controller watches the `Lease` objects for relevant changes to ensure responsiveness.
However, it also revisits `Leases` after a specific duration when their state would change if no update event occurs.
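
In controller-runtime, such time-based revisiting is typically expressed via `RequeueAfter`. A hedged sketch (the state model is simplified and the `determineState` helper is an assumption, not the actual shard lease controller):

```go
package shardlease

import (
	"context"
	"time"

	coordinationv1 "k8s.io/api/coordination/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

type Reconciler struct {
	client.Client
}

func (r *Reconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	lease := &coordinationv1.Lease{}
	if err := r.Get(ctx, req.NamespacedName, lease); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// Determine the shard's current state and when it would transition to the
	// next state if no further renewal (i.e., no watch event) is observed.
	state, nextTransition := determineState(lease, time.Now())
	_ = state // e.g., written to a label for observability

	// Revisit the Lease at that point in time, in addition to watch events.
	return ctrl.Result{RequeueAfter: time.Until(nextTransition)}, nil
}

// determineState is a simplified stand-in for the shard lease state machine.
func determineState(lease *coordinationv1.Lease, now time.Time) (string, time.Time) {
	if lease.Spec.RenewTime == nil || lease.Spec.LeaseDurationSeconds == nil {
		return "unknown", now.Add(time.Minute)
	}
	expiration := lease.Spec.RenewTime.Add(time.Duration(*lease.Spec.LeaseDurationSeconds) * time.Second)
	if now.Before(expiration) {
		return "ready", expiration
	}
	// An expired lease becomes orphaned after it has been expired for a minute.
	return "expired", expiration.Add(time.Minute)
}
```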
@@ -357,7 +357,7 @@ func run() error {
drain.alpha.sharding.timebertt.dev/clusterring-<hash>-<clusterring-name>
```

-: Ring-specific shard label pattern {#lst:drain-label}
+: Ring-specific drain label pattern {#lst:drain-label}

Finally, the sharded controllers must comply with the handover protocol initiated by the sharder.
When the sharder needs to move an object from an available shard to another for rebalancing, it first adds the `drain` label to instruct the currently responsible shard to stop reconciling the object.
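
A hedged sketch of one plausible shard-side reaction: upon seeing the drain label, the shard stops reconciling and removes the drain and shard labels so the sharder can reassign the object. The example resource and the concrete label keys (with `<hash>` and ring name as placeholders) are assumptions:

```go
package shard

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

const (
	// Placeholder label keys following the ring-specific pattern above; the
	// real keys contain a hash and the ClusterRing name.
	shardLabel = "shard.alpha.sharding.timebertt.dev/clusterring-<hash>-<clusterring-name>"
	drainLabel = "drain.alpha.sharding.timebertt.dev/clusterring-<hash>-<clusterring-name>"
)

type Reconciler struct {
	client.Client
}

func (r *Reconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	obj := &corev1.ConfigMap{} // example sharded resource
	if err := r.Get(ctx, req.NamespacedName, obj); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	if _, drain := obj.Labels[drainLabel]; drain {
		// Acknowledge the handover: stop reconciling and remove both labels so
		// the sharder can assign the object to another shard.
		patch := client.MergeFrom(obj.DeepCopy())
		delete(obj.Labels, drainLabel)
		delete(obj.Labels, shardLabel)
		return ctrl.Result{}, r.Patch(ctx, obj, patch)
	}

	// ... usual reconciliation ...
	return ctrl.Result{}, nil
}
```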
content/60-evaluation.md: 7 changes (4 additions, 3 deletions)
@@ -286,6 +286,7 @@ This gives a better estimation of the actual memory requirements of the controll
The Go runtime does not immediately release heap memory freed by garbage collection back to the operating system.
Hence, the process can hold more of the system's memory than the program needs, also due to the runtime's batch-based memory allocation.
The query in this evaluation subtracts all released, unused, and free memory from the total amount of memory allocated by the process.
+[@tsoukalos2021mastering]
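
The relevant runtime statistics can be inspected with Go's `runtime` package. This sketch only illustrates the distinction the paragraph draws between memory obtained from the OS and memory that is idle or already released; it is not the evaluation's actual Prometheus query:

```go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)

	// Sys: total memory obtained from the operating system by the Go runtime.
	// HeapIdle: heap memory in idle (unused) spans.
	// HeapReleased: idle memory already returned to the operating system.
	fmt.Printf("obtained from OS: %d bytes\n", m.Sys)
	fmt.Printf("idle heap:        %d bytes\n", m.HeapIdle)
	fmt.Printf("released to OS:   %d bytes\n", m.HeapReleased)

	// Rough estimate of the memory the program actually uses right now:
	// subtract idle (unused) heap; released memory is a subset of idle memory.
	approxUsed := m.Sys - m.HeapIdle
	fmt.Printf("approx. in use:   %d bytes\n", approxUsed)
}
```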

A unique ID is attached to every run and added as the `run_id` label to the corresponding metrics to distinguish between individual experiment runs.
The experiment tool is deployed as a Kubernetes `Job`, and the UID of the executing `Pod` is used as the run's ID.
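
One common way to obtain the Pod UID inside the container is the downward API. The sketch below assumes an environment variable `POD_UID` injected via `fieldRef: metadata.uid` and a hypothetical metric name; it is not the experiment tool's actual code:

```go
package main

import (
	"os"

	"github.com/prometheus/client_golang/prometheus"
)

// newRunMetrics creates a metric that carries the run's ID as a constant
// label, so measurements of individual experiment runs can be distinguished.
func newRunMetrics() *prometheus.CounterVec {
	// The Pod's UID is expected in POD_UID, injected via the downward API
	// (fieldRef: metadata.uid); the variable name is an assumption.
	runID := os.Getenv("POD_UID")

	return prometheus.NewCounterVec(prometheus.CounterOpts{
		Name: "experiment_generated_objects_total", // hypothetical metric name
		Help: "Number of objects generated by the experiment tool.",
		// The run_id label distinguishes metrics of individual experiment runs.
		ConstLabels: prometheus.Labels{"run_id": runID},
	}, []string{"kind"})
}
```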
@@ -298,7 +299,7 @@ It shows the generated load in both dimensions and the resulting SLIs ([@fig:das
The dashboard calculates the visualized SLIs as rolling percentiles, e.g. over 1 minute.
Additionally, it displays the CPU, memory, and network usage of the sharder and webhosting-operator pods.

-![Experiments Grafana dashboard](../assets/dashboard-experiments.png){#fig:dashboard-experiments}
+![Grafana experiments dashboard](../assets/dashboard-experiments.png){#fig:dashboard-experiments}

## Experiments

@@ -397,15 +398,15 @@ However, the buckets' upper bounds are aligned with the SLOs for this evaluation
This means that for every SLI, there is a bucket with the upper bound set to the corresponding SLO.
As interpolation is only applied between the bucket boundaries, the estimated SLI will grow above the SLO when the actual SLI grows above the SLO and vice-versa.
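
For reference, Prometheus-style histograms estimate a quantile by linear interpolation within the bucket that contains it, roughly:

$$ q \approx l + (u - l) \cdot \frac{\varphi N - c_l}{c_u - c_l} $$

where $[l, u]$ is the bucket containing the requested quantile $\varphi$, $N$ is the total number of observations, and $c_l$, $c_u$ are the cumulative counts at the bucket's lower and upper bounds. Since the estimate never leaves $[l, u]$, placing a bucket boundary exactly at the SLO makes the estimated SLI cross the SLO precisely when the actual SLI does.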

-![Cumulative controller SLOs per instance count](../results/scale-out/slis.pdf){#fig:scale-out-slis}
+![Cumulative controller SLIs in scale-out scenario](../results/scale-out/slis.pdf){#fig:scale-out-slis}

After the experiment, the control plane SLOs are verified, and the measurements are retrieved from Prometheus.
For each instance count, the last timestamp where the measured SLIs still satisfied the defined SLOs is determined ([@fig:scale-out-slis]).
This timestamp is then used to look up values for both load dimensions.
The resulting value represents the maximum load capacity of each controller setup ([@fig:scale-out-capacity]).
Note that the load capacity values cannot be interpreted as absolute values but only relative to other values of the same load test.

-![Load capacity increase with added instances](../results/scale-out/capacity.pdf){#fig:scale-out-capacity}
+![Load capacity increase with added instances in scale-out scenario](../results/scale-out/capacity.pdf){#fig:scale-out-capacity}

The results show that adding more controller instances brings more performance and increases the maximum load capacity of the system.
The load capacity grows almost linearly with the number of added instances, so the setup fulfills req. \ref{req:scale-out}.
