Add support for the Kubernetes Resource Model #46
Conversation
Rather than needing a fully fledged Kubernetes cluster, it would be cool to see this work with kcp
That would be super cool. As long as kcp supports CRDs, admission webhooks, finalizers, watches, leases, and RBAC (which I think it does), it should be portable.
I like that idea of creating in-tree k8s support, specifically making it easier to do a hitless reload on
ConfigMaps in Kubernetes are unstructured and don't allow for strong typing, which causes lots of practical validation problems. CRDs are roughly equivalent in terms of efficiency and load, and they allow for much easier client development. etcd can be operated in many different modes, from a single node (lower resiliency) to, more typically, three nodes for increased availability. Neither etcd nor Kubernetes needs to be operated on the same compute that Tinkerbell is operated on. This proposal is agnostic to where Kubernetes runs, and operators can connect Tinkerbell to Kubernetes in a managed cloud environment where compute resources are less of a concern.
Neither Kubernetes nor etcd would be required to run on the same compute as Tinkerbell. This would be equivalent to connecting to an external DB service.
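To make the strong-typing argument concrete, here is a minimal sketch of a kubebuilder-style typed resource. All type and field names here are hypothetical (the actual schema would be iterated on in tinkerbell/tink#542); the point is that the API server validates a CRD object at admission time against a generated OpenAPI schema, whereas a ConfigMap carries the same data as an untyped string map that every consumer must re-parse and re-validate.

```go
// A hypothetical typed Hardware resource. With a CRD, malformed input is
// rejected at create time; with a ConfigMap it surfaces later as a
// provisioning failure.
package v1alpha1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

type Hardware struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec HardwareSpec `json:"spec,omitempty"`
}

type HardwareSpec struct {
	// The kubebuilder marker below becomes schema validation, so a
	// malformed MAC address is rejected by the API server.
	// +kubebuilder:validation:Pattern=`^([0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2}$`
	MACAddress string `json:"macAddress"`

	// Free-form metadata served to the machine at boot.
	UserData string `json:"userData,omitempty"`
}
```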
Check out the "Migrating between Kubernetes clusters, Backups, and Restoration" section under Administrative Tasks. Does that address your concerns?
I would love to see tinkerbell support this as one of many possible data backends, forcing us to create a well-defined interface between tink server and any number of possible underlying data stores.
Like @splaspood, I wonder what the tinkerbell services would look like if the backend could be toggled between a local filesystem, a database, or Kubernetes.

In order to support these three different storage models, a pluggable data interface would benefit from a Tinkerbell data event system. Rather than having tinkerbell poll a database, or watch a directory, or watch a set of kubernetes CRs and configmaps, tinkerbell could await data events while the data interface modules take on the polling and watching.

Does this mean that tinkerbell takes on another grpc service? Perhaps. What events would tinkerbell need to listen for from pluggable data modules?
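For illustration, a rough Go sketch of what such a pluggable, event-emitting data interface might look like. Every name here is hypothetical, not existing Tinkerbell code:

```go
package storage

import "context"

// EventType describes what happened to a stored resource.
type EventType string

const (
	EventCreated EventType = "CREATED"
	EventUpdated EventType = "UPDATED"
	EventDeleted EventType = "DELETED"
)

// Event is delivered to tink server by a backend module.
type Event struct {
	Type     EventType
	Kind     string // e.g. "hardware", "template", "workflow"
	ID       string
	Resource []byte // serialized object
}

// Backend is implemented once per store (filesystem, SQL, Kubernetes CRs).
// Tink server awaits Events; the module owns the polling or watching
// mechanics (inotify, LISTEN/NOTIFY, Kubernetes watches, ...).
type Backend interface {
	Get(ctx context.Context, kind, id string) ([]byte, error)
	Put(ctx context.Context, kind, id string, resource []byte) error
	Delete(ctx context.Context, kind, id string) error
	Watch(ctx context.Context, kind string) (<-chan Event, error)
}
```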
Reconsidering my last comment: rather than inventing a new format of event drivers and data passing (which are some of the problems we are trying to solve with this proposal), by taking the K8s API as the data and eventing backend (not necessarily with containers involved) we inherit a standard and well-evolved set of tools and opinions that bring stability to the API and data backend. We also solve the pluggable data backend problem, because we can use services like https://github.com/k3s-io/kine to provide PostgreSQL, MySQL, and yet-to-be-added backends.
YES! This is exactly it!
This is a great proposal and the WIP demo at the community meeting was really great. I do like the idea of putting this behind a flag.
Here's the current code I demoed in the community meeting today:
Signed-off-by: Micah Hausler <[email protected]>

## Description

This includes 4 changes that are preparatory for future work to migrate to the Kubernetes data model.

* Propagate stream context in the streaming API. Previously `context.Background()` was used, but the stream provides a context object (see the sketch below).
* Refactored the shadowed "context" package name. By naming a method variable "context", no `context` package calls could be made in the method.
* Added `context.Context` to the `GetWorkflowsForWorker()` database API. This plumbs the context down from the API call into the `d.instance.QueryContext()` call.
* Refactored the database interface. I added a new interface `WorkerWorkflow` with the methods that get used by the APIs the Tink Worker invokes. This is essentially a no-op for now.

## Why is this needed

See tinkerbell/proposals#46

## How Has This Been Tested?

Locally ran tests.

## How are existing users impacted? What migration steps/scripts do we need?

No impact to existing users.

## Checklist:

I have:

- [ ] updated the documentation and/or roadmap (if required)
- [ ] added unit or e2e tests
- [ ] provided instructions on how to upgrade
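As a sketch of the stream-context change, a gRPC streaming handler should use the per-RPC context the stream already carries. The protobuf types and method names below are illustrative, not the exact Tinkerbell API; only `GetWorkflowsForWorker()` is named in the description above.

```go
// Hypothetical streaming RPC handler. A gRPC server stream exposes
// Context(), which is cancelled when the client disconnects or the RPC
// deadline fires.
func (s *server) GetWorkflows(req *pb.WorkflowsRequest, stream pb.WorkflowService_GetWorkflowsServer) error {
	// Before: ctx := context.Background() (never cancelled)
	ctx := stream.Context()

	// The context now plumbs down into QueryContext() in the database layer.
	workflows, err := s.db.GetWorkflowsForWorker(ctx, req.WorkerId)
	if err != nil {
		return err
	}
	for _, wf := range workflows {
		if err := stream.Send(&pb.Workflow{Id: wf}); err != nil {
			return err
		}
	}
	return nil
}
```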
Thanks for this excellent and well-thought-out proposal, @micahhausler.
I've left some comments as I read through the document; many of the questions that I raise are answered elsewhere. This suggests to me that some information could be reordered for clarity. There were a few details that I felt were missing, and I've said as much (this may be hard to gather as I left follow-up comments).
I'd recommend going through this feedback and the previous feedback and making sure that any details given in feedback comments and responses have made their way into the proposal.
This proposal, to me, is one of the most significant to be raised. Well done!
proposals/0026/README.md
At the time of writing, any Tink worker is authorized to list all hardware data, including possibly sensitive metadata for other hardware.
Today, the Tinkerbell Cluster API (CAPI) provider stores machine CA private key data in Tinkerbell hardware metadata.
Ideally, some authorizer that could make decisions based on authentication identity would be used (the Kubernetes analog would be the [Node Authorizer][node-authorizer]).
The authenticatoin method in this model could be a network identity, a TPM, or some other bootstrapped authentication credential.
Suggested change:
- The authenticatoin method in this model could be a network identity, a TPM, or some other bootstrapped authentication credential.
+ The authentication method in this model could be a network identity, a TPM, or some other bootstrapped authentication credential.
proposals/0026/README.md
[![current-architecture](./tink-arch-1.png)](./tink-arch-2.png)
In this proposal, all non-worker clients of Tinkerbell would become Kubernetes clients. |
Suggested change:
- In this proposal, all non-worker clients of Tinkerbell would become Kubernetes clients.
+ In this proposal, all non-worker clients of Tinkerbell may become Kubernetes clients.
This really comes down to whether or not PostgreSQL support is preserved at all, and then whether the user chooses it.
proposals/0026/README.md
1. User creates a Workflow CRD object in Kubernetes.
   All the user would need to specify in the object `.spec` would be a reference to the Template object, and mappings for devices.
   This is analogous to the current `tink workflow create -t $TEMPLATE_ID -r "{\"device_1\":\"$MAC_ADDR\"}"` command.
1. The Tinkerbell API would include a Kubernetes workflow controller that would watch for created workflows, and fill out the `.status` with most of the logic that currently exists in the [`CreateWorkflow()`][createwfrpc] RPC.
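A minimal controller-runtime sketch of the workflow controller described in step 2. The `v1alpha1` package, the `Workflow`/`Template` types, and `renderTemplate` are all assumed names, not the real Tinkerbell API; the real logic would port what `CreateWorkflow()` does today:

```go
package controller

import (
	"context"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"

	// Hypothetical package holding the proposed CRD types.
	v1alpha1 "github.com/tinkerbell/tink/api/v1alpha1"
)

type WorkflowReconciler struct {
	client.Client
}

func (r *WorkflowReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	var wf v1alpha1.Workflow
	if err := r.Get(ctx, req.NamespacedName, &wf); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}
	if wf.Status.State != "" {
		return ctrl.Result{}, nil // .status already populated
	}

	// Resolve the referenced Template, as `tink workflow create -t ...` does.
	var tpl v1alpha1.Template
	key := client.ObjectKey{Namespace: wf.Namespace, Name: wf.Spec.TemplateRef}
	if err := r.Get(ctx, key, &tpl); err != nil {
		return ctrl.Result{}, err
	}

	// Render the template against the device mappings and record the result
	// on .status, roughly the logic in the CreateWorkflow() RPC today.
	actions, err := renderTemplate(tpl.Spec.Data, wf.Spec.HardwareMap)
	if err != nil {
		return ctrl.Result{}, err
	}
	wf.Status.Actions = actions
	wf.Status.State = "PENDING"
	return ctrl.Result{}, r.Status().Update(ctx, &wf)
}
```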
The format of status is not discussed here, but something to consider, and to call out if utilized, is status conditions: https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#typical-status-properties
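For reference, a status following that convention would carry a conditions list built on the standard `metav1.Condition` type; the surrounding field names here are assumptions:

```go
import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

type WorkflowStatus struct {
	State      string             `json:"state,omitempty"`
	Conditions []metav1.Condition `json:"conditions,omitempty"`
}

// meta.SetStatusCondition updates the list in place and manages
// LastTransitionTime, so setting a condition is idempotent.
func markRendered(st *WorkflowStatus) {
	meta.SetStatusCondition(&st.Conditions, metav1.Condition{
		Type:    "TemplateRendered",
		Status:  metav1.ConditionTrue,
		Reason:  "RenderSucceeded",
		Message: "template rendered into workflow actions",
	})
}
```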
crossplane/crossplane-runtime#198 includes some relatable examples, benefits, and counter-point.
Do you want to see an API schema in this proposal? I was thinking it would be better to discuss and iterate on specific API structures in tinkerbell/tink#542
Speaking of Crossplane, it looks like @displague started on https://github.com/crossplane-contrib/provider-tinkerbell a while back, which would at least provide a KRM compliant shim to Tinkerbell. Perhaps an alternative that wouldn't require rearchitecting the project?
(Going a bit off-topic to consider what this KRM implementation means for a Crossplane Runtime Model (XRM) implementation, and perhaps how they could be the same)
Speaking of Crossplane, it looks like @displague started on a Tinkerbell Crossplane provider a while back, which would provide a KRM compliant shim to Tinkerbell
Typically the XRM spec mirrors an external cloud provider API spec, as the aforementioned Tinkerbell Crossplane Provider does. If the Tinkerbell spec is implemented as KRM but does not implement XRM, then a Crossplane Provider would need to interact with Tinkerbell through a KRM-mirroring spec. The cluster could then hold two nearly identical resources to represent a composable resource.
We could overlook this, as Pods and CronJobs or ReplicaSets have a thin wrapping relationship.
However, this suggests to me that it should be possible for a controller to implement an XRM compatible interface while adopting the KRM. The benefits of doing so would be that Crossplane compositions (XR) can be applied, managed by Crossplane. The underlying service, Tinkerbell bare metal orchestration, would transparently serve as a Crossplane Provider. The Tinkerbell service would not be dependent on Crossplane's installation to add this layer of compatibility.
One complication that comes to mind is the expectation that Crossplane Resources have a Provider. In this approach, the resources would be the provider. Perhaps, like the Crossplane Helm provider, the provider configuration would refer to the host cluster? Would the default be to report against the local cluster? Would alternative provider configurations offer connection details for a non-KRM Tinkerbell backplane? Or another KRM Tinkerbell backplane?
Adopting Crossplane's packaging mechanisms could provide additional benefits especially for service updates. Although, in this approach, it's not clear to me what benefits this holds over Helm or Kustomize manifests.
What does this mean in practice? The Tinkerbell KRM implementation (starting with tinkerbell/tink#542) would adopt XRM types like Referencers, ResourceStatus, ResourceSpec, and ProviderConfig.
https://crossplane.io/docs/v1.4/contributing/provider_development_guide.html#provider-development-guide
https://blog.crossplane.io/crossplane-vs-cloud-infrastructure-addons/ (by @negz) provides the relevant Crossplane context.
https://danielmangum.com/posts/crossplane-infrastructure-llvm/ (by @hasheddan) discusses the versatility of the Crossplane resource model.
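As a purely speculative sketch of what adopting those XRM types might look like, a Tinkerbell CRD could embed crossplane-runtime's common types so that Crossplane compositions can manage it directly. The type, field, and group names below are assumptions:

```go
import (
	xpv1 "github.com/crossplane/crossplane-runtime/apis/common/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// Hardware doubles as a Crossplane-manageable resource: ResourceSpec brings
// the ProviderConfig reference and deletion policy; ResourceStatus brings
// conditioned status and connection details.
type Hardware struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   HardwareSpec   `json:"spec"`
	Status HardwareStatus `json:"status,omitempty"`
}

type HardwareSpec struct {
	xpv1.ResourceSpec `json:",inline"`
	// ... Tinkerbell-specific fields ...
}

type HardwareStatus struct {
	xpv1.ResourceStatus `json:",inline"`
}
```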
This is likely too speculative to be actionable. This would be the first controller to act in this XRM-compatible way without being first-and-foremost a Crossplane provider.
proposals/0026/README.md
## APIs
The following [Tinkerbell Workflow APIs][wf-apis] are used by the Tink worker, and would remain. |
Why can these services not be initially replaced by the Kubernetes model? Do you see them eventually being removed?
Do status conditions, events, logs, and additional CRDs offer the means to move these entities to the Kubernetes API?
.. I think I see.
These are the APIs needed for the provisioning machine to get actions and update state. That could be more clearly defined.
Why would Tink workers not become Kubernetes clients, for example? Each machine could use a unique service account which would open up more possibilities (potentially reducing the machine threat model).
> Why can these services not be initially replaced by the Kubernetes model? Do you see them eventually being removed?

The main reason to only include support for worker APIs is to shorten the migration period and the amount of effort required. The ultimate goal is to make all non-worker Tink clients talk to Kubernetes, so instead of a two-step migration where boots/hegel still talk to the Tink API, we go directly to the new model. So yes, I do see them eventually being removed.

> These are the APIs needed for the provisioning machine to get actions and update state. That could be more clearly defined.

Will do.

> Why would Tink workers not become Kubernetes clients, for example? Each machine could use a unique service account which would open up more possibilities (potentially reducing the machine threat model).

I am not inclined to make Tink workers K8s clients at this time. The main reason is that I want to limit what an individual (unauthenticated/untrusted) worker can do or see in the API. K8s authz lacks the ability to easily express something like "this worker should only be able to view a specific CRD that matches the worker's name." We basically want attribute-based authz, and K8s is all role-based. Yes, you could create a specific RBAC role for each worker (see the sketch below), but that gets messy. It's much easier to have a trusted deputy like the Tink API that can correctly and securely propagate the state changes in the storage layer.
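To illustrate the messiness, a per-worker Role scoped with `resourceNames` might look like the following; the API group and resource names are assumptions. Note that RBAC `resourceNames` restricts verbs like get/update/patch but cannot restrict list or watch in the general case, which is part of why attribute-based scoping is awkward here:

```go
import (
	rbacv1 "k8s.io/api/rbac/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// One Role (plus a ServiceAccount and RoleBinding) per machine: this is the
// per-worker bookkeeping that gets messy at fleet scale.
func workerRole(workerName string) rbacv1.Role {
	return rbacv1.Role{
		ObjectMeta: metav1.ObjectMeta{Name: "tink-worker-" + workerName},
		Rules: []rbacv1.PolicyRule{{
			APIGroups:     []string{"tinkerbell.org"}, // assumed group
			Resources:     []string{"workflows"},
			ResourceNames: []string{workerName}, // only this worker's object
			Verbs:         []string{"get", "update", "patch"},
		}},
	}
}
```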
I feel that this has been vetted quite a bit and has been open for comment for a while now. I believe @micahhausler has addressed all comments.
Awesome, I'd love to see this moved forward.
I've updated the proposal and addressed comments.
@displague Fixed cutoffs and did a once-over of the whole thing.
Thanks for the details and revisiting the document for follow-ups, @micahhausler!
Signed-off-by: Micah Hausler <[email protected]>
Updated the status to
Hey @tstromberg, when/if you have some cycles, would you mind checking this out again, please?
Let it be merged! Thank you for addressing the bootstrapping concerns.
Signed-off-by: Micah Hausler <[email protected]>

## Description

The new backend uses the Kubernetes CRD types in Tinkerbell core. Boots could use a bit of a refactor/reorganization in how the backends work, but for now I've tried to fit into the existing structure.

## Why is this needed

This is the boots side of the work for tinkerbell/proposals#46

Fixes: #

## How Has This Been Tested?

Tested with real bare metal workers using an image built from this branch. I will add E2E tests in a follow-up PR, or can add to this branch.

## How are existing users impacted? What migration steps/scripts do we need?

No impact to existing users.

## Checklist:

I have:

- [ ] updated the documentation and/or roadmap (if required)
- [ ] added unit or e2e tests
- [ ] provided instructions on how to upgrade
Description
Added proposal 0026 Support for the Kubernetes Resource Model.