
Create a Controller<K> runtime #148

Closed
clux opened this issue Feb 26, 2020 · 18 comments · Fixed by #258
Labels: runtime (controller runtime related)

Comments

@clux
Member

clux commented Feb 26, 2020

To provide something like controller-runtime's Controllers. If you are not aware of how it works, watch the (fairly opinionated) kubebuilder interface talk from KubeCon San Diego. There are many positives in there that we can probably steal ideas from, as well as things to avoid (like how not to deal with Option types).

Wiring

Their interface kind of looks like:

Controller.For(MyKind)           // <- root kind
    .Owns(appsv1.Deployment)     // <- root kind owns some Deployments
    .Owns(corev1.Service)        // <- root kind owns some Services
    .Watches(
        source.Kind{Type: webappv1.Redis{}},
        handler.EnqueueRequestsFromMapFunc(r.booksUsingRedis),
    ) // <- specifies relationship mapping manually
    .Complete(r)

We probably don't need the full idea of a Manager to attach a controller to, since I imagine it can self-manage with futures and infinite polling tasks, but we should make our own version of a Controller with similar design goals:

We MUST be able to:

  • create a Controller<K> where K is the owned kube object (usually the crd)
  • set what other kube resources it owns (maybe it creates Deployments/Services, say)
  • manage informers for kube resources

We SHOULD also

  • manage custom watchers for third party resources
  • set what other resources it watches (maybe it watches external resources), as sketched below
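
As a rough sketch of what that wiring could look like on the Rust side (all the names here are hypothetical, nothing like this exists in kube yet, and kinds are plain strings for brevity where the real thing would be a typed Controller<K>):

// Hypothetical builder sketch: `Controller`, `owns`, `watches` are illustrative names only.
struct Controller {
    root: &'static str,         // the owned kube object, usually the CRD
    owned: Vec<&'static str>,   // resources the root kind owns (Deployments, Services, ...)
    watched: Vec<&'static str>, // other resources we watch, mapped back to the root kind
}

impl Controller {
    fn new(root: &'static str) -> Self {
        Self { root, owned: Vec::new(), watched: Vec::new() }
    }
    fn owns(mut self, kind: &'static str) -> Self {
        self.owned.push(kind);
        self
    }
    fn watches(mut self, kind: &'static str) -> Self {
        self.watched.push(kind);
        self
    }
}

fn main() {
    // Mirrors the controller-runtime wiring above, but self-managed:
    // informers would run as spawned polling tasks rather than under a Manager.
    let _controller = Controller::new("Foo")
        .owns("Deployment")
        .owns("Service")
        .watches("Redis");
}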

Reconcile

Additionally, we must be able to define/provide a reconciler for our owned object, and we need to call this reconcile fn whenever it, something it owns, or something it watches, changes.

Deciding what calls this will probably be the meatiest part of the runtime. We need to start many tokio::spawn tasks that run poll on possibly many informers. Then, when they receive events, we need to figure out the owner of these. Some of this is wired up in controller-runtime via SetControllerReference, which looks kind of awkward. It should just be a matter of forwarding events to a reconcile fn. We probably need an event queue that all informers can push to.

This can abstract away the type of Informer events (as described in #134 (comment)) and potentially also debounce them (so that we don't cause reconciles back to back too quickly on the same object).

The reconciliation loop can then be written in an idempotent way. This forces the return signature of the function to be a result type that tells the Controller internals whether or not we need to re-run the reconcile for this object (requeue), maybe do it in n seconds once some objects are up, or maybe do periodic updates.

The function needs to take only two strings: name of object + namespace. The reconcile should almost always start by fetching the CRD (note again that a reconcile request is not always initiated by the CRD changing).
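
A minimal sketch of that signature (the names here are assumptions, not existing kube types):

use std::time::Duration;

// Hypothetical return type: tells the Controller internals whether/when to requeue this object.
enum ReconcilerAction {
    RequeueAfter(Duration), // re-run reconcile for this object after a delay
    AwaitChange,            // done; wait for the next watch event
}

#[derive(Debug)]
struct ReconcileError(String);

// Only name + namespace come in; the fn re-fetches everything else itself,
// which keeps it idempotent regardless of which event triggered it.
async fn reconcile(name: &str, namespace: &str) -> Result<ReconcilerAction, ReconcileError> {
    // 1. fetch the root CRD instance by name/namespace
    // 2. ensure owned Deployments/Services match its spec
    // 3. write .status
    let _ = (name, namespace);
    Ok(ReconcilerAction::RequeueAfter(Duration::from_secs(300))) // periodic resync
}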

Examples

Then we need some example controllers that do the basic flow. The kubebuilder example is probably nice. We should also showcase how to set the root object's status.

Open Questions

  • how do we avoid the root object .status write from causing a reconcile on self?
  • how do we catch missed informer events of deleted, owned resources? Informer::init considered harmful? #134 (do we even care about this? if the controller manages these, then this should not happen if the reconcile fn is idempotent)
  • how to do a mapping function between third-party / external resources back to a CRD? (watches relationship)
  • how do we hook a graceful SIGTERM handler into it? (making separate issue for this)
@clux clux added the runtime controller runtime related label Feb 26, 2020
@clux clux added this to the kubecon-eu-2020 milestone Feb 26, 2020
@clux clux self-assigned this Feb 26, 2020
@nightkr
Member

nightkr commented Mar 10, 2020

A few more questions..

  • How do we handle locking? Mutex per controller, mutex per root resource, or no mutex at all?
  • It would be nice to be able to queue up a reconcile manually, especially if the runtime does end up handling the locking.
  • how do we avoid the root object .status write from causing a reconcile on self?

Should we? Seems like another place to encourage reconcile to be idempotent.

@clux
Member Author

clux commented Mar 10, 2020

  • how do we avoid the root object .status write from causing a reconcile on self?

Should we? Seems like another place to encourage reconcile to be idempotent.

Yeah, I can see a case for both ways. Essentially I think we just want #52 to be an option.

@clux clux mentioned this issue Mar 10, 2020
@clux
Member Author

clux commented Mar 13, 2020

  • How do we handle locking? Mutex per controller, mutex per root resource, or no mutex at all?

Thinking of doing this within the controller itself. Mutexing the event queue so we can debounce them.

  • It would be nice to be able to queue up a reconcile manually, especially if the runtime does end up handling the locking.

Yeah, if we're following the go pattern, then the reconcile function will have to return an enum of "what the controller should do next". E.g. EnqueueAfter(Duration) would be one option.

@alexeldeib
Contributor

alexeldeib commented Apr 19, 2020

Hey 👋 I work on controller-runtime and I'm a (learning) Rust noob. Love what I see here 🙂

This seems pretty closely related to #102 and I noticed you've been sticking pretty close to the controller-runtime patterns. I also saw the draft in #184 with a controller owning its own informers.

How do we handle locking?

In CR there's one informer per GVK, shared in a cache owned by the manager. Each controller has a workqueue, which is basically a priority queue keyed on the next time to reconcile a particular object. Informers drive events into the workqueue. Having at most 1 informer/ListWatcher per GVK helps reduce server load. I don't really know enough about how that maps into Rust -- an Arc<Vec<Informer<T>>> or something?

The workqueue itself guarantees an individual controller only processes a single item on one goroutine at a time, since each controller can have multiple goroutines. The workqueue also enforces exponential backoff -- enqueueAfter basically enforces a "notAfter" semantic of adding back to the workqueue with the calculated delay ("Mutexing the event queue so we can debounce them." -> this sounds about right).

Anyway, that's not to say anything about how this should work in Rust. I noticed you were feeling out some of these areas in draft PRs and figured I'd share 🙂
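
If it helps, here's a toy Rust sketch of the workqueue semantics I mean (made-up types, not a suggestion for the actual kube-rs implementation, assuming tokio's timer): at most one pending entry per object key, delivered no earlier than its scheduled time, so repeat events for the same object get debounced.

use std::collections::HashMap;
use tokio::time::{sleep_until, Instant};

#[derive(Clone, Hash, PartialEq, Eq)]
struct ObjectKey {
    name: String,
    namespace: String,
}

#[derive(Default)]
struct WorkQueue {
    pending: HashMap<ObjectKey, Instant>, // at most one entry per object
}

impl WorkQueue {
    // Schedule (or reschedule) a reconcile for `key` no earlier than `not_before`.
    fn enqueue_after(&mut self, key: ObjectKey, not_before: Instant) {
        let entry = self.pending.entry(key).or_insert(not_before);
        if not_before < *entry {
            *entry = not_before; // keep the earlier deadline if the key is already queued
        }
    }

    // Wait until the next item is due, then hand its key to the reconciler.
    async fn next(&mut self) -> Option<ObjectKey> {
        let (key, due) = self
            .pending
            .iter()
            .min_by_key(|(_, due)| **due)
            .map(|(k, d)| (k.clone(), *d))?;
        sleep_until(due).await;
        self.pending.remove(&key);
        Some(key)
    }
}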

https://github.com/kubernetes/client-go/blob/master/util/workqueue/delaying_queue.go
https://github.com/kubernetes-sigs/controller-runtime/blob/8d8953ddeb4b8b9817eb28fcad6efb15c1e335df/pkg/internal/controller/controller.go#L235-L287
https://github.com/kubernetes-sigs/controller-runtime/blob/8d8953ddeb4b8b9817eb28fcad6efb15c1e335df/pkg/internal/controller/controller.go#L151-L168

@clux
Member Author

clux commented Apr 19, 2020

Thanks! That is super helpful! Have been trying to dig into controller-runtime myself to make sure I don't engineer myself into another hole - so this is definitely saving me some time.

Sounds like we are already trying the same informer to resource 1-1 mapping. There's actually just a Vec<Informer<K>> in the draft atm.

The queuing behaviour is one of the harder problems. Have not found a good internal queue solution yet, and am very curious as to how the exponential backoff interacts with the queue in CR. It looks like it does this:

  • Requeue (no time given) => exponential backoff
  • Requeue (explicit time given) => always use users preferred wait time

Which seems sensible, if I've read it right.

How we translate all this sensibly to Rust is obviously the bigger question. We are going to rely on async and futures, so right now it's a question of whether we 1. use callbacks with a strictly controlled signature that we call internally with a set number of arguments, or 2. expose some kind of pull based mechanism into the queue (but somehow still retaining the backoff mechanics).

Am leaning a bit towards 2. atm (even if it needs more than one queue to separate public/private + backoff mechanics), but that's all up for debate still. Note that solution 2. looks more like the informer examples herein (no callbacks, just pull events out).
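
For the backoff part specifically, something like this is probably enough (sketch only, names made up):

use std::time::Duration;

// What the reconciler asked for when it requeued.
enum Requeue {
    Immediate,       // no time given => exponential backoff
    After(Duration), // explicit time => always honour the user's delay
}

fn next_delay(requeue: Requeue, consecutive_failures: u32) -> Duration {
    match requeue {
        Requeue::After(d) => d,
        Requeue::Immediate => {
            // 5ms, 10ms, 20ms, ... capped at 5 minutes
            let base = Duration::from_millis(5);
            std::cmp::min(base * 2u32.saturating_pow(consecutive_failures), Duration::from_secs(300))
        }
    }
}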

@alexeldeib
Contributor

alexeldeib commented Apr 19, 2020

You've got the behavior right on the CR side. Your recommended solution sounds great to me! My intuition tells me theres some nice async-y way to drive this in Rust, and it looks like you already have some of it: https://github.com/clux/kube-rs/pull/184/files#diff-0db9a0cd2ebbcd77c0d69cf94c787b5eR150-R158

re: informer 1:1 with resource -- I meant N controllers can share 1 informer in CR. In your draft it looks like a new informer per controller? Was thinking some sort of cache of informers passed to new (can default to nil for new informers, or use some sort of builder pattern?)

I'd love to try my hand at writing a finalizer example you mentioned in another issue -- maybe that'll give me some better ideas.

@clux
Member Author

clux commented Apr 19, 2020

Oh, interesting. I didn't even think about the case where we have more than one controller per binary, that sounds unusual. It will also complicate things a bit. Who owns all the Informers in CR then? The Manager? Needs to be something higher level than the Controller, and it feels a bit awkward for the apps to manage this 🤔

I'd love to try my hand at writing a finalizer example you mentioned in another issue -- maybe that'll give me some better ideas.

Yeah, please. If you get anywhere useful it'd be very much appreciated 👍

@alexeldeib
Contributor

Who owns all the Informers in CR then? The Manager?

Yep -- the manager owns an object cache shared between both the kubeclient and the informers. So the manager has a global-ish cache of all objects any controller under it cares about. There's also a non-cached client, of course, and writes are never cached.

more than one controller per binary, that sounds unusual.

I think it's the common case, rather than the exception -- I rarely see production-level operators with < 2 controllers. Many controllers in one binary describes the core k8s components pretty well 😋 Advanced users of CR might even dynamically load/unload controllers inside their operator code. https://github.com/crossplane/crossplane-runtime does something like this.

@nightkr
Member

nightkr commented Apr 19, 2020

I didn't even think about the case where we have more than one controller per binary, that sounds unusual.

I think it seems like a fairly common pattern to have a ThingClaim that represents an abstract desire for something (for example, a PersistentVolumeClaim is a desire for a certain type of disk with a certain amount of space) and a Thing that represents something that fulfills that desire (like a PersistentVolume that represents an actual partition on a disk, or a Ceph RBD, or whatever else). Those probably want separate controllers with very different behaviours.

It's also sometimes not as 1:1 as that. For example, you could consider ReplicaSets and StatefulSets to be different kinds of PodClaims with very different reconciliation semantics. And then Deployment is, in turn, pretty much a ReplicaSetClaim. So, I guess, a.. PodClaimClaim? :P

@nightkr
Member

nightkr commented Apr 20, 2020

re: informer 1:1 with resource -- I meant N controllers can share 1 informers in CR. In your draft it looks like a new informer per controller? Was thinking some sort of cache of informers passed to new (can default to nil for new informers, or use some sort of builder pattern?)

Okay, I always understood it as:

  1. Informer queries the server, handles disconnects gracefully, and emits watch events
  2. Reflector populates a local cache from watch events (presumably from an Informer)
  3. Controllers reconcile based on Informer events and timers, and may keep Reflectors for child resources

This is roughly how kube-rs works at the moment (aside from Controller not really existing as a separate thing, and Reflector basically containing a reimplementation of Informer).

Based on this understanding, sharing an Informer seems like an inherently bad idea, with a bunch of ugly corner cases where bugs can hide:

  1. Do we clone events or distribute them (where each event goes to one listener)?
    • Cloning seems like the obviously correct behaviour, but the least-effort way to implement this (putting the impl Stream behind a Mutex) would end up distributing them.
  2. When a new listener is added, do we replay all events, replay synthetic events for each resource that exists, or give them a blank slate?
    • Replaying all events would require storing all events since the Informer started (a pretty huge memory leak).
    • Building the synthetic events would require the Informer to keep its own cache, basically reimplementing Reflector.
    • Starting listeners on a blank slate would be prone to race glitches, and has been rejected before (Reflector::reset has an (admittedly unlikely) race condition #136).

But (from what I can deduce from https://pkg.go.dev/k8s.io/client-go/tools/cache?tab=doc, please correct me if this is still wrong!), this isn't actually how the client-go interfaces work! Instead:

  1. Reflector queries the server and populates a local cache (Store)
  2. Informer generates events based on the local cache (Store) that the Reflector keeps updated
  3. Controller works as above, but s/Reflector/Store/g

That pretty much solves problem 2 from earlier, but at the cost of needing to diff out the changes from the cache.. which was just updated from the watch events from the server! That doesn't feel quite right either.
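
For completeness, the "clone events to every listener" approach from point 1 could be as simple as a broadcast channel (toy sketch only, with a stringly-typed stand-in for real watch events), though it still doesn't answer point 2 about late subscribers:

use tokio::sync::broadcast;

#[derive(Clone, Debug)]
enum WatchEvent {
    Applied(String), // object name, heavily simplified
    Deleted(String),
}

#[tokio::main]
async fn main() {
    let (tx, _) = broadcast::channel::<WatchEvent>(64);

    // Each interested controller gets its own receiver; every event is cloned
    // to all of them instead of being distributed to just one.
    let mut controller_a = tx.subscribe();
    let mut controller_b = tx.subscribe();

    tx.send(WatchEvent::Applied("my-configmap".into())).unwrap();

    assert!(matches!(controller_a.recv().await, Ok(WatchEvent::Applied(_))));
    assert!(matches!(controller_b.recv().await, Ok(WatchEvent::Applied(_))));
}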

@alexeldeib
Contributor

alexeldeib commented Apr 20, 2020

I need to play with the kube-rs code a bit to understand more deeply, but your understanding at the end is basically correct. Re: diffing out changes for the cache, you get this for free via resource version if I understand the question correctly (which is backed by the underlying k/v storage in etcd). I think we're talking about this, more or less: https://github.com/kubernetes/client-go/blob/ede92e0fe62deed512d9ceb8bf4186db9f3776ff/tools/cache/shared_informer.go#L534-L566

(edit: might need something like the fifo delta queue?)

@alexeldeib
Contributor

jfyi...I think the pkg.go.dev link is missing quite a bit of content from the godoc.org link on the SharedInformer, which I found illuminating. I wasn't familiar with the reasoning behind the delta FIFO queue, but it appears to provide some of the guarantees enumerated in the long comment on SharedInformer: https://godoc.org/k8s.io/client-go/tools/cache?tab=doc#SharedInformer

@nightkr
Member

nightkr commented Apr 20, 2020

Ouch, yeah. Feels misleading that they label pkg.go.dev as the official successor when it has traps like that.

@nightkr
Member

nightkr commented May 7, 2020

Kind of spoiled this already on Discord, but I took a stab at reimplementing Kube-rs's runtime as a series of Stream transformers, as well as a Controller on top of that. It's still missing some polish (like debouncing reconciliations), but so far it seems really promising IMO.

Example operator: https://gitlab.com/teozkr/kube-rt/-/blob/master/examples/configmapgen_controller.rs
Controller: https://gitlab.com/teozkr/kube-rt/-/blob/master/src/controller.rs
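
To give a flavour of the shape (this is not the actual kube_rt API, just the general "each layer takes a Stream and returns a Stream" idea, with made-up event types):

use futures::stream::{self, Stream, StreamExt};

#[derive(Debug, Clone)]
enum Event {
    Applied(String),
    Deleted(String),
}

// One transformer layer: raw events in, object names that need reconciling out.
fn to_reconcile_requests(events: impl Stream<Item = Event>) -> impl Stream<Item = String> {
    events.map(|ev| match ev {
        Event::Applied(name) | Event::Deleted(name) => name,
    })
}

#[tokio::main]
async fn main() {
    let events = stream::iter(vec![
        Event::Applied("foo".to_string()),
        Event::Deleted("bar".to_string()),
    ]);

    to_reconcile_requests(events)
        .for_each(|name| async move {
            println!("reconcile {}", name); // stand-in for invoking the reconciler
        })
        .await;
}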

@clux
Member Author

clux commented May 7, 2020

It's so clean. Fantastic work. Going to have a bank holiday field day with it tomorrow :D

@nightkr
Member

nightkr commented May 15, 2020

FWIW, kube_rt::controller now supports event debouncing.

That said.. What do you think about the future of kube_rt? To me there are a few paths forward..

  1. We replace kube::runtime with (the guts of) kube_rt
    • + I believe kube_rt is a significant improvement over kube::runtime in safety, modularity, and ease of use
    • - Very breaking change since kube_rt is a fairly significant restructuring
    • - Some uses of kube::runtime cannot be reproduced with kube_rt (in particular, the resourceVersion is considered very internal in kube_rt::watcher and cannot be queried)
    • - kube and kube_rt have very different error handling philosophies: kube bundles everything into a global Error enum, while kube_rt tries to ensure that all error types are contextual and isolated between the different modules.
  2. kube::runtime lives on (but maybe deprecated), kube_rt is a separate crate
    • + Existing users of kube can migrate at their own pace
    • - Can be unclear for newcomers. I remember initially being very confused about kube vs k8s_openapi when I started doing K8s stuff in Rust. This seems even worse, since they implement roughly the same functionality, and kube_rt depends on kube.
    • - Users of kube_rt have to build kube::runtime too anyway
    • - Somewhat increased maintenance burden
  3. We split kube::runtime into a separate crate that coexists with kube_rt
    • Mostly the same as path 2, but..
    • + Avoids the duplicated implementations for dependees
    • + Doesn't seem quite as blessed as having it be part of kube proper
    • - Requires existing users to sed s/kube::runtime/kube_runtime_legacy/g (or whatever the name ends up being)

/cc #102 since kube_rt also covers that issue

@clux
Member Author

clux commented May 16, 2020

Yeah, that's a good breakdown. Very much agree with what you list as pros/cons here. I am very much in favour of option 1, primarily for the ease of maintenance, particularly early on when the patterns are being explored, but also because it does feel like a significant step forward in a better, more high-level direction.

We also seem to have pretty similar viewpoints of what should be responsible for what code-wise, so would very much appreciate being able to pool our limited resources on the various issues herein.

Now technically, there are a couple of things that I'd like to prod you about for the controller first, but I'll do that on discord.

For the smaller things;
Not too concerned about the error handling philosophy differences, if we were to merge. If you'd rather use snafu then I'm down with migrating what we have towards this - I like how explicit you've been able to be with it within kube-rt, and don't really see us losing any particular functionality by migrating to it AFAICT?

Resource Version, yeah, sure that might be an issue if we break Informer (although you're introducing a watcher instead as a higher level api, so maybe they can temporarily co-exist - though not a big fan of the manual resourceVersion APIs anyway so maybe not).

@nightkr
Member

nightkr commented May 16, 2020

Not too concerned about the error handling philosophy differences, if we were to merge. If you'd rather use snafu then I'm down with migrating what we have towards this - I like how explicit you've been able to be with it within kube-rt, and don't really see us losing any particular functionality by migrating to it AFAIKT?

Yeah, they have basically the same feature set and generate the same Error and Display instances, so users shouldn't be able to see any difference.

Resource Version, yeah, sure that might be an issue if we break Informer (although you're introducing a watcher instead as a higher level api, so maybe they can temporarily co-exist - though not a big fan of the manual resourceVersion APIs anyway so maybe not).

Yeah, I don't really see a use case for mucking about with resourceVersion manually. Prometheus metrics maybe?

This was referenced Jun 22, 2020
@clux clux linked a pull request Jul 17, 2020 that will close this issue
@clux clux closed this as completed in #258 Jul 18, 2020