[0025] Add tradeoffs
Signed-off-by: Micah Hausler <[email protected]>
micahhausler committed Sep 3, 2021
1 parent bf1b5b4 commit d99920e
Showing 1 changed file with 22 additions and 2 deletions.
24 changes: 22 additions & 2 deletions proposals/0025/README.md
@@ -19,9 +19,9 @@ Goals:
* **More easily support non-request serving controllers in Tinkerbell.**
In this architecture, controllers like PBnJ could leverage Kubernetes primitives like [Custom Resource Definitions][crds] (CRDs), [WATCH APIs][watch], and [Field Management][fm] to complete workflow steps.
* **Reduce the security surface of the Tinkerbell API.**
Implementing multiple authorization modes is a non-trivial task.
The fewer APIs, authorization options, and lines of code that exist, the fewer opportunities there are for security issues to arise.
Tinkerbell is a high-value component of data center infrastructure, so protection of its DHCP infrastructure and BMC/IPMI management needs to be treated accordingly.
* **Support a highly-available architecture.**
Postgres is a fantastic database, but managing high-availability with graceful failover is not trivial.
Using an alternative data store that handles failure better would help operators achieve higher availability without requiring downtime for upgrades or failover.
@@ -145,6 +145,23 @@ This change introduces a dependency on Kubernetes being available.
Concerns around data backup and failover were previously delegated to Postgres administration and now become a Kubernetes administration issue.
As mentioned previously, backup and restoration commands will be added to recover from Kubernetes API unavailability; they could also help support switching to a different underlying Kubernetes cluster should the primary cluster become unavailable.

### Tradeoffs

The biggest architectural tradeoff in implementing this proposal is gaining ease of feature development at the cost of simplicity of administration.
Kubernetes has a lot of industry hype, baggage (but not literal [baggage][k8s-baggage]), and [public failure stories][k8s-af]. On-premise operators may not have experience using or operating Kubernetes and may be wary of adding complexity.
Operators have numerous options to use and deploy Kubernetes, so in order to mitigate concerns around complexity, we will provide extensive documentation on different alternatives to meet an operator's objectives.

Examples include:
* A cloud-provider-managed Kubernetes solution (GKE, AKS, EKS)
* A single-host [K3s][k3s] cluster
* A multi-host on-premise cluster

<!-- Link found when Google searching "Kubernetes baggage" -->
[k8s-baggage]: https://www.redbubble.com/shop/kubernetes+bags
[k8s-af]: https://k8s.af/
[kcp]: https://github.com/kcp-dev/kcp/
[k3s]: https://k3s.io

### User Experience

In order to use Tinkerbell, clients would interact with the Kubernetes API.
@@ -248,8 +265,11 @@ Kubernetes cluster administrators can define custom levels of access with RBAC p

## Alternatives

The main alternative to achieve the stated design goal around availability would leave the Tinkerbell API alone, and leverage/modify the Tinkerbell server's [internal Database interface][tink-db-iface] to function on top of Kubernetes or a key-value datastore like [etcd][etcd].
This alternative would help with high-availability deployments, but all the other motivations would remain unaddressed and would still need to be implemented in Tinkerbell's API.

Another alternative would be to support the current architecture and codebase for existing functionality, while requiring any new functionality to use the Kubernetes resource model.
This proposal does include maintaining the existing functionality and architecture for some period of time, but supporting it indefinitely would add significantly diverging code paths and increase maintenance effort.

[tink-db-iface]: https://github.com/tinkerbell/tink/blob/0f46dc0/db/db.go#L21-L60
[etcd]: https://etcd.io/
