draft of multicluster phase 2 design #31
base: master
Conversation
Signed-off-by: Alex Leong <[email protected]>
Have you considered using a CRD to handle each "link" against a target cluster? It would provide easy config validation, and target cluster properties would become queryable through `kubectl`.
Signed-off-by: Alex Leong <[email protected]>
@alpeb Thanks for bringing this up, it's a good idea to consider. Can you elaborate a bit more on how handling lifecycle through webhooks would look? After my experience with ServiceProfiles, I have to say I'm not a huge fan of CRDs. client-go handles them using generated code which I've found to be very brittle and extremely difficult to version. Making any changes to the CRD makes it extremely difficult, or in some cases impossible, to upgrade gracefully. In this case specifically, this config is consumed only by the service-mirror-controller and is not involved in any coordination between components. This means that we can have the service-mirror-controller be responsible for parsing and validation. Since we're in total control of the parsing and validation logic, we can make it flexible enough to gracefully handle upgrades and changes to the schema.
I was thinking more of simply having a CRD without the API aggregation, so there's no need to get involved with the client-go autogen code. The controller would still need to be something like our current service-mirror controller, which would subscribe as a webhook listener to all the CRUD events happening to the CRD instances. Upon the CRD creation event, the endpoints and probing would be set up just like in Proposal 1 in this RFC. Upon the CRD deletion event, the endpoints would be torn down (btw, what is the proposed mechanism for teardown in Proposal 1?).

As in the current proposal, using a CRD alleviates us from having to deal with watchers. But we could also choose not to use webhooks and keep on using watchers. Also note this implies sticking with the current single-controller approach.

The main advantage in my view of a CRD vs a ConfigMap is that changes made to it are immediately validated; if the user makes a mistake, the resource will be rejected upon applying it, whereas errors in the ConfigMap will have to be surfaced elsewhere.

As for versioning, I'm out of the loop regarding our current difficulties, and whether that's relevant only to API aggregation. We can have a very minimal OpenAPI schema for the CRD, and the service mirror controller (or a separate controller) can act as a validating webhook that we can evolve as needed. And then …
I like using CRDs when we start having more than one piece of data that requires configuration. As we'll need a couple here, it feels like a good improvement.
Having instant validation via a validating webhook and having more useful data in the `kubectl get` output both sound appealing.

It looks like the service mirror controller could probably use the dynamic client package to get the custom resource as an unstructured object, avoiding the generated client code.

I would still prefer to have a separate service mirror controller handle each target cluster, but it could certainly load its configuration from a custom resource instead of a configmap.

To answer your question about link teardown: teardown involves deleting all of the resources installed by `link`.
But doing that requires a tricky, cumbersome `kubectl delete` command that includes a huge list of resource types. To make this easier, we could bundle this functionality into an `unlink` subcommand.
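For illustration only, here is a rough Go sketch of what such an `unlink` subcommand could do internally: delete, by label, every resource type that `link` creates. The label key, namespace, and resource list below are assumptions for the sketch, not the actual set installed by `link`.

```go
// Sketch of label-based teardown for a hypothetical `unlink` command.
// The label key and the list of resource types are assumptions; the
// real `link` output would determine both.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

// mirroredResources is the (assumed) set of resource types that `link` installs.
var mirroredResources = []schema.GroupVersionResource{
	{Group: "", Version: "v1", Resource: "services"},
	{Group: "", Version: "v1", Resource: "secrets"},
	{Group: "apps", Version: "v1", Resource: "deployments"},
	{Group: "rbac.authorization.k8s.io", Version: "v1", Resource: "roles"},
	{Group: "rbac.authorization.k8s.io", Version: "v1", Resource: "rolebindings"},
}

// unlink deletes everything carrying the (assumed) cluster-name label.
func unlink(ctx context.Context, client dynamic.Interface, ns, clusterName string) error {
	selector := fmt.Sprintf("mirror.linkerd.io/cluster-name=%s", clusterName)
	for _, gvr := range mirroredResources {
		list, err := client.Resource(gvr).Namespace(ns).List(ctx, metav1.ListOptions{LabelSelector: selector})
		if err != nil {
			return fmt.Errorf("listing %s: %w", gvr.Resource, err)
		}
		for _, item := range list.Items {
			if err := client.Resource(gvr).Namespace(ns).Delete(ctx, item.GetName(), metav1.DeleteOptions{}); err != nil {
				return fmt.Errorf("deleting %s/%s: %w", gvr.Resource, item.GetName(), err)
			}
		}
	}
	return nil
}

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := dynamic.NewForConfig(config)
	if err != nil {
		panic(err)
	}
	if err := unlink(context.Background(), client, "linkerd-multicluster", "foobar"); err != nil {
		panic(err)
	}
}
```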
Requests to webhooks are raw bytes, so I was thinking of simply unmarshalling that into some ad-hoc struct, but …
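As a rough illustration of that idea, here is a minimal Go sketch of a validating webhook handler that unmarshals the raw `AdmissionReview` bytes into an ad-hoc struct; the `linkSpec` fields, the served path, and the required-field check are all hypothetical.

```go
// Minimal sketch of a validating webhook handler that unmarshals the raw
// AdmissionReview bytes with an ad-hoc struct, as suggested above.
// The linkSpec fields and the /validate path are illustrative assumptions.
package main

import (
	"encoding/json"
	"io/ioutil"
	"net/http"

	admissionv1 "k8s.io/api/admission/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// linkSpec is an ad-hoc struct holding only the fields we care to validate.
type linkSpec struct {
	TargetClusterName string `json:"targetClusterName"`
	GatewayAddress    string `json:"gatewayAddress"`
}

func validateLink(w http.ResponseWriter, r *http.Request) {
	body, err := ioutil.ReadAll(r.Body)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}

	var review admissionv1.AdmissionReview
	if err := json.Unmarshal(body, &review); err != nil || review.Request == nil {
		http.Error(w, "could not parse AdmissionReview", http.StatusBadRequest)
		return
	}

	// The Link resource arrives as raw JSON inside the admission request.
	var link struct {
		Spec linkSpec `json:"spec"`
	}
	resp := &admissionv1.AdmissionResponse{UID: review.Request.UID, Allowed: true}
	if err := json.Unmarshal(review.Request.Object.Raw, &link); err != nil {
		resp.Allowed = false
		resp.Result = &metav1.Status{Message: "Link is not valid JSON: " + err.Error()}
	} else if link.Spec.TargetClusterName == "" {
		resp.Allowed = false
		resp.Result = &metav1.Status{Message: "spec.targetClusterName is required"}
	}

	review.Response = resp
	out, err := json.Marshal(review)
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	w.Write(out)
}

func main() {
	http.HandleFunc("/validate", validateLink)
	// Admission webhooks must be served over TLS; the cert paths are placeholders.
	http.ListenAndServeTLS(":8443", "tls.crt", "tls.key", nil)
}
```

A handler of this shape would also need a `ValidatingWebhookConfiguration` pointing at the service mirror controller (or a separate controller), as suggested above.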
Signed-off-by: Alex Leong <[email protected]>
```
linkerd --context=source multicluster unlink --cluster-name=foobar | kubectl delete -f -
```
What's the behavior if no name is specified? Are all clusters unlinked? That would be bad 'default' behavior...

Should this require a list of cluster names, e.g. `unlink foobar`?
Just as in the `link` command, the `--cluster-name` flag is mandatory. We also describe an explicit `--all-clusters` flag below.
## Upgrading

To upgrade the service mirror controller, simply run the link step again with a newer version of the CLI:
What are the dependencies between clusters? I.e., if I've got a gateway (target) cluster at version N and a mirror (source) cluster at version M, what are the things that must be common to both versions? Could these change? How will they change?
I also think this is oversimplified. How do you upgrade the CRD, for instance? Can't we "break" existing mirror configurations with changes to this CRD?
It's hard to speculate about dependencies between clusters without knowing in advance WHAT will change. For example, the `link` command extracts certain information from the target cluster, such as the gateway address and the service account token. If the location or format of this information has changed, this will not work.

I suspect that you would have to upgrade the target cluster first in this case by running `linkerd mc install` before running `link` again. But I'm not sure this speculation is very useful without knowing in advance what is changing.
We avoid using generated code to interact with the CRD specifically to avoid issues during upgrades. The service mirror controller will read the Link resource using a dynamic client that gets the contents as an untyped map. This allows us to build in our own backwards compatibility and ensure that, if the format of the CRD changes, the service mirror controller can handle both the old and new formats.
```
linkerd --context=target multicluster link --cluster-name=foobar | kubectl --context=source apply -f -
```
Alternatively, we can add an `update` subcommand to `multicluster` which finds …
upgrade?
yes, thank you
```
kubectl --context=source get links
NAME     NUM SVC   ALIVE   LATENCY_P50   LATENCY_P95   LATENCY_P99
foobar   3         True    275ms         385ms         397ms
```
This might delve a little into the implementation design phase, but I just wanted to dig in a bit and throw out some high-level ideas on how this could be addressed...

As I understand it, extra printer columns are defined so that they show the values of some properties of the CRD instance, retrieved through a jsonpath expression. Things like the target cluster name and the service selector are readily available. But metrics like latencies would need to be actively surfaced into the CRD instance, as status properties I guess (is this what the status subresource is for?). The probe worker itself could update that status as it updates the Prometheus latency metrics.
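Building on that idea, here is a rough Go sketch of a probe worker writing its results into the Link's status subresource via the dynamic client. It assumes the CRD enables the status subresource, and the GVR and status field names are illustrative only.

```go
// Sketch: a probe worker surfacing its latest results into the Link's
// status subresource so they can be exposed as extra printer columns.
// Assumes the Link CRD has the status subresource enabled; the GVR and
// status field names are illustrative, not a settled schema.
package probe

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
)

var linkGVR = schema.GroupVersionResource{
	Group:    "multicluster.linkerd.io", // assumed group
	Version:  "v1alpha1",                // assumed version
	Resource: "links",
}

// UpdateLinkStatus writes the probe results into .status on the Link.
// Because only the status subresource is updated, this cannot clobber
// the user-supplied spec.
func UpdateLinkStatus(ctx context.Context, client dynamic.Interface, ns, name string, alive bool, p50 time.Duration) error {
	links := client.Resource(linkGVR).Namespace(ns)
	link, err := links.Get(ctx, name, metav1.GetOptions{})
	if err != nil {
		return err
	}
	status := map[string]interface{}{
		"alive":      alive,
		"latencyP50": p50.String(),
	}
	if err := unstructured.SetNestedMap(link.Object, status, "status"); err != nil {
		return err
	}
	_, err = links.UpdateStatus(ctx, link, metav1.UpdateOptions{})
	return err
}
```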
This PR moves the service mirror controller from `linkerd mc install` to `linkerd mc link`, as described in linkerd/rfc#31. For fuller context, please see that RFC. Basic multicluster functionality works here, including:

* `linkerd mc install` installs the Link CRD but not any service mirror controllers
* `linkerd mc link` creates a Link resource and installs a service mirror controller which uses that Link
* The service mirror controller creates and manages mirror services, a gateway mirror, and their endpoints.
* The `linkerd mc gateways` command lists all linked target clusters, their liveness, and probe latencies.
* The `linkerd check` multicluster checks have been updated for the new architecture. Several checks have been rendered obsolete by the new architecture and have been removed.

The following are known issues requiring further work:

* The service mirror controller uses the existing `mirror.linkerd.io/gateway-name` and `mirror.linkerd.io/gateway-ns` annotations to select which services to mirror. It does not yet support configuring a label selector.
* An unlink command is needed for removing multicluster links: see #4707
* An mc uninstall command is needed for uninstalling the multicluster addon: see #4708

Signed-off-by: Alex Leong <[email protected]>
This was accidentally closed when we changed the default branch.
Signed-off-by: Alex Leong <[email protected]>