Draft of design for image update automation #5
Changes from 1 commit
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,126 @@ | ||
# Design of image update automation | ||
|
||
The design has three parts, which operate independently but in | ||
sympathy with one another: | ||
|
||
- the job runner (type UpdateJob) | ||
- the image metadata reflector (types ImageRepository, ImagePolicy) | ||
- the automation controller (type ImageAutomation) | ||
|
||
Some tooling comes alongside these parts: | ||
|
||
- the `tk-image` command-line tool lets you create the resource | ||
Review comment: will this evolve into a
Review comment: I'm gesturing at an extension mechanism similar to git, whereby if
Review comment: Sounds good, thanks for the clarification. Might be worthwhile to mention that idea in the doc.
Review comment: The toolkit CLI uses
Review comment: I don't care too much about operating as an extension. The trade-off in the other direction is that all the command-line toolery is centralised in
Review comment: The toolkit repo is where everything comes together: we have all the component installers, CRDs and APIs being assembled, tested and released under a single version. Having tk depend on and use all the current APIs was a good exercise to deal with breaking changes and interoperability between them, for example
Review comment: This isn't really accurate as stated. It's possible, for example, to have things that depend on source-controller types have an option But I think you're going for another point, which is something like "putting everything in tk makes it obvious when the APIs are not interoperable". No argument here -- but I'd be careful not to make every piece need every other piece in order to work. |
||
mentioned above, and can be used as an extension to the gitops | ||
toolkit command-line tool. | ||
- the `image-update` image can be used with `kpt fn`, GitHub Actions, | ||
and `UpdateJob` to update images in resources within a working | ||
directory. | ||
|
||
## Update job controller | ||
|
||
The job controller runs `UpdateJob` resources, each of which specifies | ||
an update operation on a git repository. Each run checks out the given | ||
repo at the specified ref, runs the specified image with the | ||
arguments, then commits and pushes as specified. | ||
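
The run described above (check out, run the image, commit, push) can be sketched as an ordered plan of commands. This is only an illustration: the `UpdateJob` field names, the `plan_run` helper, and the use of `docker run` as a stand-in for however the controller would actually run the image are all assumptions, not part of the design.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class UpdateJob:
    # Hypothetical fields; the design does not fix a schema for UpdateJob.
    repo_url: str
    checkout_ref: str
    image: str
    args: List[str] = field(default_factory=list)
    push_branch: str = ""  # empty means "push back to the checked-out ref"
    commit_message: str = "Automated update"


def plan_run(job: UpdateJob, workdir: str = "/work") -> List[List[str]]:
    """Return the commands for one run, in order, without executing them."""
    # Assumption: when no push branch is given, checkout_ref names a branch.
    push_branch = job.push_branch or job.checkout_ref
    return [
        ["git", "clone", job.repo_url, workdir],
        ["git", "-C", workdir, "checkout", job.checkout_ref],
        # Stand-in for running the update image against the working directory.
        ["docker", "run", "--rm", "-v", f"{workdir}:{workdir}",
         "-w", workdir, job.image, *job.args],
        ["git", "-C", workdir, "commit", "--all", "--message", job.commit_message],
        ["git", "-C", workdir, "push", "origin", f"HEAD:refs/heads/{push_branch}"],
    ]
```

Returning a plan rather than executing it keeps the sketch free of side effects; a real controller would also need to handle credentials and the case where the image makes no changes.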
Review comment: I think we could highlight that the source branch/tag can be different to the destination branch. One workflow that I have in mind is:
|
||
|
||
### Integration | ||
|
||
The automation controller and the command-line tool `tk-image` create | ||
`UpdateJob` resources for making changes within a git repository. | ||
|
||
The jobs created by the automation controller and the command-line | ||
tool use `image-update` as the image to run on the git repo. | ||
Review comment: Have you thought about what to package inside of this image to actually swap out the values?
Review comment: A custom executable that uses kyaml, most likely. I don't think kustomize would be up to it.
Review comment: Right, sounds good. Should be doable essentially with more or less the code written in this commit: weaveworks/wksctl@05eaf54 |
||
|
||
### Design notes | ||
|
||
This could be a more specific `UpdateImageJob`, but it would differ | ||
only in the update being done, so it is a small step from there to a | ||
general job. | ||
|
||
The motivation for making the jobs separate to the automation is that | ||
you can then do ad-hoc updates by creating an `UpdateJob` from | ||
command-line tooling. The downside is that it needs another moving | ||
part, albeit a generally-useful one. | ||
|
||
## Image metadata reflector | ||
|
||
The image metadata reflector reconciles `ImageRepository` and | ||
`ImagePolicy` resources. An `ImageRepository` specifies an image | ||
repository to scan, and an `ImagePolicy` selects a specific image from | ||
a repository according to given rules. The purpose of these is to make | ||
that information available to some other system within the cluster. | ||
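
As an illustration of a policy selecting one image from a scanned repository, the sketch below picks the highest version among a list of tags. The design does not say which rules a policy supports, so the "highest `MAJOR.MINOR.PATCH` tag wins" rule is an assumption, and `parse_version` and `select_latest` are hypothetical names. The parsing is deliberately simplified and ignores pre-release and build suffixes.

```python
import re
from typing import List, Optional, Tuple

# Accepts tags such as "1.2.3" or "v1.2.3"; anything else is ignored.
_VERSION = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def parse_version(tag: str) -> Optional[Tuple[int, ...]]:
    """Return the numeric version for a tag, or None if it is not a version."""
    match = _VERSION.match(tag)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def select_latest(tags: List[str]) -> Optional[str]:
    """Select the tag with the highest version, or None if no tag qualifies."""
    candidates = []
    for tag in tags:
        version = parse_version(tag)
        if version is not None:
            candidates.append((version, tag))
    if not candidates:
        return None
    # Versions compare numerically, so 1.10.0 ranks above 1.9.9.
    return max(candidates)[1]
```

In this picture the list of tags would come from the scan recorded for an `ImageRepository`, and the selected tag is what an `ImagePolicy` would expose to other systems.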
|
||
### Integration | ||
|
||
The automation controller creates these as indicated by information | ||
from its specification and the repository it looks at; and it consults | ||
Review comment: nit: specification |
||
these resources when constructing `UpdateJob` resources to run. | ||
|
||
### Design notes | ||
|
||
The image repository and policy are separated so that different | ||
policies can be derived from the same image repository specification. | ||
Credentials only need to be specified once, for the repository object, | ||
rather than maintained for all the policies. | ||
|
||
An alternative is to _just_ have policies, and infer the repositories | ||
that need to be scanned. This would mean less to do e.g., for the | ||
automation (it could just directly create each policy as it finds | ||
it). It would make the implementation of the controller more | ||
complicated though, since it would need to maintain internal state for | ||
the repository scanning, rather than being able to consult | ||
`ImageRepository` resources. | ||
|
||
## Automation controller | ||
|
||
The automation controller monitors `ImageAutomation` resources, which | ||
specify a git repository on which to run automation and a | ||
specification of how to update the repository. For each of these, it: | ||
|
||
- calculates which image repositories need to be scanned, and the | ||
policies for updating them, according to the specification given in | ||
the `ImageAutomation` and git repository; | ||
- consults the policies it created to determine updates to perform; | ||
- creates and manages `UpdateJob` resources to run the updates. | ||
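
The first step in the list above, working out which image repositories need scanning, might look like the following. It assumes the controller has already collected image references from the manifests in the git repository; `split_image_ref` and `repositories_to_scan` are hypothetical helpers, not names from the design.

```python
from typing import Iterable, List, Tuple


def split_image_ref(ref: str) -> Tuple[str, str]:
    """Split an image reference into (repository, tag); tag is '' if absent."""
    name = ref.split("@", 1)[0]  # drop any digest suffix
    last_slash = name.rfind("/")
    colon = name.rfind(":")
    # A colon before the last slash belongs to a registry port, not a tag.
    if colon > last_slash:
        return name[:colon], name[colon + 1:]
    return name, ""


def repositories_to_scan(image_refs: Iterable[str]) -> List[str]:
    """Return the distinct image repositories named by the references."""
    return sorted({split_image_ref(ref)[0] for ref in image_refs})
```

Each repository returned here would correspond to one `ImageRepository` resource, however many workloads refer to it.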
|
||
### Specification | ||
|
||
To be designed -- see the notes below. | ||
|
||
### Design notes | ||
|
||
There's a large design space for the automation, along various axes: | ||
|
||
- where does the specification for what is automated live -- in git | ||
or in a resource? | ||
- is the specification part of the resource/file it automates, or | ||
separate? | ||
- does the specification name all the things to which it applies; or | ||
does it work with rules or patterns? | ||
|
||
So, for example, one design could lie at this point in the space: an | ||
`ImageAutomation` resource names a git repository, a workload object, | ||
and an `ImagePolicy`; every time the policy selects an image that does | ||
not match what is given for the workload resource in git, a job is | ||
created to update it. | ||
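
The fully explicit design just described reduces to a comparison followed by job creation. A rough sketch, in which the `ImageAutomation` fields and the shape of the returned job description are invented for illustration:

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ImageAutomation:
    # Hypothetical fields: the three things the resource names explicitly.
    git_repo: str
    workload: str
    policy: str


def plan_update(
    automation: ImageAutomation,
    image_in_git: str,
    selected_image: Optional[str],
) -> Optional[Dict[str, str]]:
    """Describe the job to create, or return None when git is up to date."""
    # No job if the policy has not selected anything yet, or nothing changed.
    if selected_image is None or selected_image == image_in_git:
        return None
    return {
        "kind": "UpdateJob",
        "gitRepository": automation.git_repo,
        "workload": automation.workload,
        "image": selected_image,
    }
```

The controller does no inference here, which is what makes the design easy to implement and laborious to use.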
|
||
It is easy to see how to implement this design, since everything is | ||
totally explicit -- you just do what the resources say. But it's not | ||
great for the user, because they have to spend time spelling it all | ||
out for the controller, and do the work of keeping the automation and | ||
policy objects in the cluster up to date with what's in the git repo | ||
(one way to do that is to keep them in the git repo and let them be | ||
synced; but in general, the workloads are not going to run in the same | ||
place as the automation, so it would be fiddly to keep these in the | ||
same place). | ||
|
||
A design in the other direction would be to expect annotations, | ||
similar to those used by Flux v1, on workloads to be automated. The | ||
controller would interpret those to determine which image repositories | ||
and policies are needed. | ||
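
For a sense of what interpreting annotations could involve, the sketch below reads the Flux v1 annotation keys `fluxcd.io/automated` and `fluxcd.io/tag.<container>` from a workload's metadata. Whether this controller would reuse those exact keys is not decided in this document, and both `policies_from_annotations` and the shape of its result are assumptions.

```python
from typing import Dict

AUTOMATED = "fluxcd.io/automated"
TAG_PREFIX = "fluxcd.io/tag."


def policies_from_annotations(annotations: Dict[str, str]) -> Dict[str, Dict[str, str]]:
    """Map container name to a policy description, for automated workloads only."""
    if annotations.get(AUTOMATED) != "true":
        return {}
    policies = {}
    for key, value in annotations.items():
        if key.startswith(TAG_PREFIX):
            container = key[len(TAG_PREFIX):]
            # Values look like "semver:~1.0" or "glob:master-*".
            kind, _, pattern = value.partition(":")
            policies[container] = {"type": kind, "pattern": pattern}
    return policies
```

For example, a workload annotated with `fluxcd.io/automated: "true"` and `fluxcd.io/tag.app: semver:~1.0` yields a single semver policy for the `app` container.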
|
||
This might be tricky when the automation is managed by a different | ||
team to that in charge of the application configuration -- in that | ||
scenario, the annotations would have to be carefully applied after the | ||
fact (perhaps with a kustomization), which may as well mean the | ||
annotations are kept in a separate file. |
Review comment: Are all these controllers different Reconcile loops running within the same controller-runtime Manager binary, acting on CRDs? Trying to understand the context