design doc: Emissary's CRD conversion logic
date: 2022-10-07
This is an update to the older (2022-08-26) Edgissary's CRD conversion logic document in Notion.
This document describes things as of PR #4055. If working with legacy versions that don't include that PR, things will be different.
Emissary takes several different getambassador.io API versions: getambassador.io/v1, getambassador.io/v2, and getambassador.io/v3alpha1. It accepts these both as Kubernetes resources (defined via CRDs), and as YAML annotations on certain other Kubernetes resources:
- For Kubernetes resources, the CRDs point to a converting webhook that we call "apiext" that converts between whichever versions the Kubernetes apiserver needs. Emissary itself asks the apiserver for resources as the latest version, so that the main part of the Emissary code only needs to worry about one version.
  controller-runtime helps with this: it provides the code that handles all the webhook-y stuff, so we only need to provide the conversion functions.
- For annotations, Emissary does have to handle multiple versions. The "watcher" part of Emissary constructs a snapshot that it passes to the rest of the system; this watcher also converts the resources in the snapshot to the latest version, so that the main part of the Emissary code only needs to worry about one version.
  In order to do this, we borrow some of controller-runtime's webhook conversion code (since controller-runtime doesn't publicly expose the parts we need); see borrowed_webhook.go. This is perhaps not the design that would be best on its own, but consistency with controller-runtime is good.
There are 2 upstream conversion mechanisms at play here:
- The core k8s.io/apimachinery/pkg/runtime.Scheme, where you call scheme.AddConversionFunc(srcType, dstType, conversionFunc) and it builds a mesh of possible conversions. These functions are not methods on the resource types, which may be a downside, but it does mean that you can write conversions for 3rd-party types that you can't add methods to. Plus, there's a conversion-gen tool that will mostly write these functions for you, and will also write the functions that make all those scheme.AddConversionFunc calls for you. The big downside is that the scheme doesn't know how to traverse this mesh as a graph; you have to register a separate function for each combination of srcType and dstType, so to do arbitrary conversions you'd have to generate a full mesh of functions.
- sigs.k8s.io/controller-runtime/pkg/conversion attempts to address the tedium of needing N^2 conversion functions, cutting that to 2N, by building a layer on top of the core runtime.Scheme. And by "on top of" I actually mostly mean "missing the point of". It uses the scheme as a simple listing of known versions, ignoring any conversion funcs in the scheme. Instead of supplying N^2 conversion funcs, one between every pair of versions, you just designate one version as the "hub", and all of the other versions ("spokes") implement a ConvertFrom method to convert from the hub to that other version and a ConvertTo method to convert from that other version to the hub. Then the webhook conversion takes the pair of types and, if either is a hub, just calls the appropriate method directly; otherwise it takes the scheme and iterates over it looking for the hub version, then calls the source type's ConvertTo to convert to the hub and then the destination type's ConvertFrom to convert from the hub. Also, IMO this is only POC quality in controller-runtime; it's definitely not prime-time ready. One of the biggest drawbacks is that controller-gen can't help you generate these conversion methods, and it's clear not many folks are using it For Real yet (I submitted a KubeCon talk about it...).
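The hub-and-spoke dispatch described in the second item can be sketched roughly as follows. This is a stdlib-only illustration, not controller-runtime's code: the Hub and Convertible interfaces are local stand-ins shaped like the ones in sigs.k8s.io/controller-runtime/pkg/conversion (the real ones also embed runtime.Object), and the Widget types and the convert function are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// Hub and Convertible are local stand-ins shaped like controller-runtime's
// conversion.Hub and conversion.Convertible (assumption: simplified here,
// the real interfaces also embed runtime.Object).
type Hub interface{ Hub() }

type Convertible interface {
	ConvertTo(dst Hub) error   // spoke -> hub
	ConvertFrom(src Hub) error // hub -> spoke
}

// WidgetV3 is a hypothetical hub version.
type WidgetV3 struct{ Hostname string }

func (*WidgetV3) Hub() {}

// WidgetV1 and WidgetV2 are hypothetical spokes that spell the field differently.
type WidgetV1 struct{ Host string }
type WidgetV2 struct{ HostName string }

func asV3(h Hub) (*WidgetV3, error) {
	hub, ok := h.(*WidgetV3)
	if !ok {
		return nil, errors.New("unexpected hub type")
	}
	return hub, nil
}

func (w *WidgetV1) ConvertTo(dst Hub) error {
	hub, err := asV3(dst)
	if err == nil {
		hub.Hostname = w.Host
	}
	return err
}

func (w *WidgetV1) ConvertFrom(src Hub) error {
	hub, err := asV3(src)
	if err == nil {
		w.Host = hub.Hostname
	}
	return err
}

func (w *WidgetV2) ConvertTo(dst Hub) error {
	hub, err := asV3(dst)
	if err == nil {
		hub.Hostname = w.HostName
	}
	return err
}

func (w *WidgetV2) ConvertFrom(src Hub) error {
	hub, err := asV3(src)
	if err == nil {
		w.HostName = hub.Hostname
	}
	return err
}

// convert dispatches the way the text describes: if either side is the hub,
// call the spoke's method directly; otherwise go spoke -> hub -> spoke.
func convert(src, dst any) error {
	srcHub, srcIsHub := src.(Hub)
	dstHub, dstIsHub := dst.(Hub)
	srcSpoke, srcIsSpoke := src.(Convertible)
	dstSpoke, dstIsSpoke := dst.(Convertible)
	switch {
	case srcIsHub && dstIsSpoke:
		return dstSpoke.ConvertFrom(srcHub)
	case srcIsSpoke && dstIsHub:
		return srcSpoke.ConvertTo(dstHub)
	case srcIsSpoke && dstIsSpoke:
		hub := &WidgetV3{} // the real code finds the hub type by iterating the scheme
		if err := srcSpoke.ConvertTo(hub); err != nil {
			return err
		}
		return dstSpoke.ConvertFrom(hub)
	default:
		return fmt.Errorf("cannot convert %T to %T", src, dst)
	}
}

func main() {
	src := &WidgetV1{Host: "example.com"}
	dst := &WidgetV2{}
	if err := convert(src, dst); err != nil {
		panic(err)
	}
	fmt.Println(dst.HostName)
}
```

Note that each spoke only needs two methods no matter how many other versions exist, which is where the 2N figure comes from.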
(You can ask a scheme what types it knows, but not what conversions it knows, at least not without just trying to do the conversion and checking if it returns an error. I've thought that being able to ask would be a good addition; it would allow controller-runtime to be enhanced to build a graph of what the scheme itself knows how to do, then use basic graph-traversal algorithms instead of needing an explicit hub and all of those spoke methods.)
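To make the graph-traversal idea concrete, here is a minimal sketch. It assumes a hypothetical list of registered conversions is available as (src, dst) pairs, which is exactly the thing runtime.Scheme does not expose today; the edge type and findPath function are invented for illustration.

```go
package main

import "fmt"

// edge records that a direct src -> dst conversion function is registered.
// (Hypothetical: runtime.Scheme does not currently offer such a listing.)
type edge struct{ src, dst string }

// findPath does a breadth-first search over the registered conversions and
// returns the shortest chain of versions from src to dst, or nil if none exists.
func findPath(edges []edge, src, dst string) []string {
	next := map[string][]string{}
	for _, e := range edges {
		next[e.src] = append(next[e.src], e.dst)
	}
	// prev doubles as the "seen" set; the empty string marks the start.
	prev := map[string]string{src: ""}
	queue := []string{src}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		if cur == dst {
			var path []string
			for v := dst; v != ""; v = prev[v] {
				path = append([]string{v}, path...)
			}
			return path
		}
		for _, n := range next[cur] {
			if _, seen := prev[n]; !seen {
				prev[n] = cur
				queue = append(queue, n)
			}
		}
	}
	return nil
}

func main() {
	edges := []edge{
		{"v1", "v2"}, {"v2", "v1"},
		{"v2", "v3alpha1"}, {"v3alpha1", "v2"},
	}
	fmt.Println(findPath(edges, "v1", "v3alpha1")) // [v1 v2 v3alpha1]
}
```

With a lookup like this, no version would need to be singled out as the hub; any pair of versions connected by registered functions could be converted.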
So, what do we do? Both! Emissary uses conversion-gen to generate runtime.Scheme-compatible conversion functions (well, a patched conversion-gen; upstream conversion-gen has turned out to be quite buggy, and is also missing a few highly-convenient features). Then we have awk scripts in generate.mk that generate adapters between the two systems: .Hub(), .ConvertTo(), and .ConvertFrom() methods that call in to the scheme to do the actual conversions. Best of both worlds!
Emissary designates the latest version (currently v3alpha1) as the hub version, and v1 and v2 as the spoke versions; this is done by these lines in generate.mk:
generate-fast/files += $(OSS_HOME)/pkg/api/getambassador.io/v1/zz_generated.conversion.go
generate-fast/files += $(OSS_HOME)/pkg/api/getambassador.io/v1/zz_generated.conversion-spoke.go
generate-fast/files += $(OSS_HOME)/pkg/api/getambassador.io/v2/zz_generated.conversion.go
generate-fast/files += $(OSS_HOME)/pkg/api/getambassador.io/v2/zz_generated.conversion-spoke.go
generate-fast/files += $(OSS_HOME)/pkg/api/getambassador.io/v3alpha1/zz_generated.conversion-hub.go
However, there's a trick here! generate.mk doesn't have conversion-gen generate functions between spoke←→hub; it has conversion-gen generate functions between one version and the next version. So v1 gets functions to convert between it and v2, and v2 gets functions to convert between it and v3alpha1; then the .ConvertTo and .ConvertFrom adapter methods iterate over that daisy-chain of conversions in the scheme in order to implement the spoke←→hub conversion.
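As a rough illustration of the daisy-chain walk, the sketch below converts an object one adjacent version at a time until it reaches the target. It is not the generated adapter code: the object type, the chain and steps tables, and convertTo are hypothetical stand-ins for the typed resources and for the conversion functions that the real adapters look up in the scheme.

```go
package main

import "fmt"

// object is a hypothetical stand-in for a versioned resource; hops counts
// how many adjacent-version conversions it has been through.
type object struct {
	apiVersion string
	hops       int
}

// chain lists the versions oldest-first; the last entry plays the hub.
var chain = []string{"v1", "v2", "v3alpha1"}

// step builds a placeholder conversion function into dst. A real one would
// translate fields between the two versions.
func step(dst string) func(object) object {
	return func(o object) object {
		return object{apiVersion: dst, hops: o.hops + 1}
	}
}

// steps holds only conversions between adjacent versions, in both directions.
var steps = map[[2]string]func(object) object{
	{"v1", "v2"}:       step("v2"),
	{"v2", "v1"}:       step("v1"),
	{"v2", "v3alpha1"}: step("v3alpha1"),
	{"v3alpha1", "v2"}: step("v2"),
}

func indexOf(version string) int {
	for i, v := range chain {
		if v == version {
			return i
		}
	}
	return -1
}

// convertTo walks the chain one link at a time from o's version to target.
func convertTo(o object, target string) (object, error) {
	i, j := indexOf(o.apiVersion), indexOf(target)
	if i < 0 || j < 0 {
		return o, fmt.Errorf("unknown version in %q -> %q", o.apiVersion, target)
	}
	for i != j {
		nxt := i + 1
		if j < i {
			nxt = i - 1
		}
		fn, ok := steps[[2]string{chain[i], chain[nxt]}]
		if !ok {
			return o, fmt.Errorf("no conversion %s -> %s", chain[i], chain[nxt])
		}
		o = fn(o)
		i = nxt
	}
	return o, nil
}

func main() {
	o, err := convertTo(object{apiVersion: "v1"}, "v3alpha1")
	if err != nil {
		panic(err)
	}
	fmt.Println(o.apiVersion, o.hops)
}
```

In this sketch a v1 object reaches v3alpha1 in two hops (v1 to v2, then v2 to v3alpha1), and no function ever converts directly between v1 and v3alpha1.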
So, the questions: Why'd I choose v3alpha1 as the hub version? Why not use the storage version as the hub version? Why have conversion-gen generate this daisy-chain instead of directly generating the spoke←→hub conversions? The answers to these are all sort of intertwined:
- Precedent: How most of Kube's built-in types do this internally is that they convert to an "internal" version as the hub (which is also how the backend code receives it). This version isn't ever exposed to users; sometimes it's a sort of "bleeding-edge" version that might get frozen as the next stable version, and sometimes it has some subtle type changes in it that make it nicer to work with in Go but less nice in YAML. Since we don't have an "internal" version with not-yet-released changes (just "TODO" comments about needing to make those changes), using the latest version (v3alpha1) is closest to what upstream Kube does, which is the best we have right now for what "best practice" is.
- It's about future-readiness:
  - If we designate an old version (v2) as the hub version, that means that in the future when we drop that version, all of the conversion funcs would need to be rewritten for the new hub version. The funcs are mostly generated, but there need to be some hand-written parts too; having to rewrite those all at once is at best a huge amount of churn effort, and at worst a good source of bugs/regressions.
  - If we designate an old version (v2) as the hub version, that means that all of the future conversions will need to deal with weird quirks of v2, and we'll have implementation baggage slowing things down (tech debt) as we move further and further from v2. For example, if we change some semantic from v3alpha1 to v3alpha2, the logical thing to do would be to just code up that conversion from v3alpha1 to v3alpha2, without having to think about "how does this round-trip through v2?"
  - While v3alpha1 is the current latest version, it will eventually be an old version, and the above will apply to it. So do we say "when we introduce a new version, we must immediately make that version be the hub version?"
  - If we always designate the latest version as the hub version, that means we're updating all of the conversions every time we introduce a new version; see that first point about churn and bugs.
  - If we create a new internal-only version that gets to be the hub version (same as how most Kube builtins do), that mostly solves all of the above, but means that every time we want to change it we need to go update all of the conversions. This adds significant friction to making changes, and creates a strong pressure to prune old versions so that they aren't weighing us down. The friction is bad for developers, and over-eagerly removing versions is bad for users.
  - If, instead of just following controller-runtime's hub/spoke model (because, again, I don't think controller-runtime's conversion stuff is terribly thought out or fleshed out at this point and is more of just a POC), we implement our own daisy-chain model, I think this solves all of these problems. An old tail version can be pruned without affecting anything else. You only need to think about the most recent 2 versions when making a change. When adding a new version, the only change to conversions that you need to make is adding functions to 1 version (the now-formerly most recent version).
- So IMO the daisy-chain model is the only maintainable model. And so why use the most-recent-version as the hub? Well, for the chain, you want all links going the same direction, so it's either the lowest version or the highest that doesn't get conversion functions. And I suppose the runtime.Scheme-to-controller-runtime adapters don't need to follow that, and could let any version be the hub; but that'd be more complicated, and I don't really see the point.
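The maintenance argument above can be put in rough numbers. The sketch below is a back-of-the-envelope count under stated assumptions, not a measurement of Emissary: it counts one function per direction per pair of versions, ignores how much of each function is generated versus hand-written, and all function names are made up for illustration.

```go
package main

import "fmt"

// Directed conversion functions needed to support n versions.
func fullMesh(n int) int   { return n * (n - 1) } // one per ordered pair of versions
func hubSpoke(n int) int   { return 2 * (n - 1) } // each spoke: to hub, from hub
func daisyChain(n int) int { return 2 * (n - 1) } // each adjacent pair: up, down

// Functions that must be written or rewritten when one more version is added
// to an existing set of n versions.
func churnMovingHub(n int) int { return 2 * n } // newest becomes hub: all n spokes are redone
func churnDaisyChain(int) int  { return 2 }     // only the previously-latest version gains functions

func main() {
	for _, n := range []int{3, 5} {
		fmt.Printf("n=%d mesh=%d hub=%d chain=%d churn(hub moves)=%d churn(chain)=%d\n",
			n, fullMesh(n), hubSpoke(n), daisyChain(n), churnMovingHub(n), churnDaisyChain(n))
	}
}
```

Under these assumptions the hub/spoke and daisy-chain models need the same number of functions at any given time; the difference is in how many of them have to be touched when a version is added or dropped.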