Draft of design for image update automation #5

Closed
wants to merge 4 commits into from
126 changes: 126 additions & 0 deletions docs/design.md
@@ -0,0 +1,126 @@
# Design of image update automation

The design has three parts, which operate independently but in
sympathy with one another:

Are all these controllers different Reconcile loops running within the same controller-runtime Manager binary, acting on CRDs? Trying to understand the context


- the job runner (type UpdateJob)
- the image metadata reflector (types ImageRepository, ImagePolicy)
- the automation controller (type ImageAutomation)

Some tooling comes alongside these parts:

- the `tk-image` command-line tool lets you create the resource

will this evolve into a tk image subcommand once "graduated"? I guess that's what you mean with "extension to the gitops toolkit CLI"

Member Author

I'm gesturing at an extension mechanism similar to git, whereby if tk-image is in the $PATH, tk will invoke it to handle the subcommand tk image.


Sounds good, thanks for the clarification. Might be worthwhile to mention that idea in the doc.

Member

The toolkit CLI uses `<verb> <kind> <name> <args>`. I personally don't like the idea of plugins; they would break the current UX. Images should fit nicely into our current design, `tk create/get/reconcile/delete/suspend/resume image repository|policy`, the same as sources.

Member Author

I don't care too much about operating as an extension. The trade-off in the other direction is that all the command-line toolery is centralised in tk, which must be changed every time there's a new type or system.

Member

@stefanprodan stefanprodan Jul 18, 2020

The toolkit repo is where everything comes together: all the component installers, CRDs and APIs are assembled, tested and released under a single version. Having tk depend on and use all the current APIs was a good exercise in dealing with breaking changes and interoperability between them, for example tk reconcile kustomization --with-source. If each component comes with its own CLI, the trade-off would be on the UX side: things like --with-source and other cross-api commands wouldn't be possible.

Member Author

> If each component comes with its own CLI, the trade-off would be on the UX side, things like --with-source and other cross-api commands wouldn't be possible.

This isn't really accurate as stated. It's possible, for example, to have things that depend on source-controller types have an option --with-source, without them being part of tk. Assuming there is some layering to the individual components -- e.g., the image stuff here depends on a source-controller type, but nothing in tk depends on the image stuff -- there can be tools that follow the same user interface and don't lose anything by being separate.

But I think you're going for another point, which is something like "putting everything in tk makes it obvious when the APIs are not interoperable". No argument here -- but I'd be careful not to make every piece need every other piece in order to work.

mentioned above, and can be used as an extension to the gitops
toolkit command-line tool.
- the `image-update` image can be used with `kpt fn`, GitHub Actions,
and `UpdateJob` to update images in resources within a working
directory.
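The git-style extension mechanism discussed in the review thread could be sketched roughly as follows. This is only an illustration of the lookup, not how `tk` behaves today: the function names `resolve_extension` and `extension_argv` are hypothetical, and the sketch assumes a POSIX-style `$PATH` search.

```python
import shutil
from typing import List, Optional


def resolve_extension(subcommand: str, search_path: Optional[str] = None) -> Optional[str]:
    """Return the path of the `tk-<subcommand>` executable, or None if absent.

    search_path is an os.pathsep-separated directory list; None means $PATH.
    """
    return shutil.which(f"tk-{subcommand}", path=search_path)


def extension_argv(argv: List[str], search_path: Optional[str] = None) -> Optional[List[str]]:
    """Map `tk image <args>` onto `tk-image <args>` when the extension exists."""
    if len(argv) < 2:
        return None  # no subcommand given
    executable = resolve_extension(argv[1], search_path)
    if executable is None:
        return None  # caller falls back to built-in commands or an error
    return [executable, *argv[2:]]
```

A caller would pass the resulting list to something like `subprocess.run`, and handle a `None` result with its built-in commands.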

## Update job controller

The job controller runs `UpdateJob` resources, each of which specifies
an update operation on a git repository. Each run checks out the given
repo at the specified ref, runs the specified image with the
arguments, then commits and pushes as specified.
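As a rough sketch of what one run involves, the steps can be written out as an ordered list of commands. Every field name in `UpdateJobSpec` is an assumption, since the design does not define the resource's schema, and `docker run` stands in for whatever mechanism the controller would actually use to run the image in the cluster.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class UpdateJobSpec:
    # All field names are hypothetical; the design does not define them.
    repo_url: str
    ref: str                      # branch, tag or commit to check out
    image: str                    # image that performs the update
    args: List[str] = field(default_factory=list)
    push_branch: str = "main"     # may differ from the ref checked out
    commit_message: str = "Automated update"


def plan_run(spec: UpdateJobSpec, workdir: str) -> List[List[str]]:
    """Return the commands for one run, in order, without executing them."""
    return [
        ["git", "clone", spec.repo_url, workdir],
        ["git", "-C", workdir, "checkout", spec.ref],
        # Assumption: the update image operates on the working directory
        # mounted at /work.
        ["docker", "run", "--rm", "-v", f"{workdir}:/work", "-w", "/work",
         spec.image, *spec.args],
        ["git", "-C", workdir, "add", "--all"],
        ["git", "-C", workdir, "commit", "-m", spec.commit_message],
        ["git", "-C", workdir, "push", "origin",
         f"HEAD:refs/heads/{spec.push_branch}"],
    ]
```

Keeping `ref` and `push_branch` separate lets a job read from one ref and push to another.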
Member

I think we could highlight that the source branch/tag can be different to the destination branch. One workflow that I have in mind is:

  • fetch manifests from the latest semver tag
  • commit/push patch to the staging branch
  • GH Action opens a PR from staging into master
  • cluster admin merges the PR and creates a GH release
  • source-controller pulls the new semver tag
  • kustomize/helm reconcilers apply the changes on the cluster


### Integration

The automation controller and the command-line tool `tk-image` create
`UpdateJob` resources for making changes within a git repository.

The jobs created by the automation controller and the command-line
tool use `image-update` as the image to run on the git repo.

Have you thought about what to package inside of this image to actually swap out the values?
Kustomize? A custom kyaml binary?

Member Author

A custom executable that uses kyaml, most likely. I don't think kustomize would be up to it.


Right, sounds good. Should be doable essentially with more or less the code written in this commit: weaveworks/wksctl@05eaf54


### Design notes

This could be a more specific `UpdateImageJob`, but it would differ
only in the update being done, so it is a small step from there to a
general job.

The motivation for making the jobs separate to the automation is that
you can then do ad-hoc updates by creating an `UpdateJob` from
command-line tooling. The downside is that it needs another moving
part, albeit a generally-useful one.

## Image metadata reflector

The image metadata reflector reconciles `ImageRepository` and
`ImagePolicy` resources. An `ImageRepository` specifies an image
repository to scan, and an `ImagePolicy` selects a specific image from
a repository according to given rules. The purpose of these is to make
that information available to some other system within the cluster.
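To make "selects a specific image ... according to given rules" concrete, here is a minimal sketch of one possible rule: pick the highest plain `MAJOR.MINOR.PATCH` tag, optionally pinned to a major version. The rule itself and the name `select_latest` are assumptions; the design leaves the policy rules open. Pre-release and non-version tags are simply ignored in this sketch.

```python
import re
from typing import Iterable, Optional, Tuple

# Matches plain versions such as "1.2.3" or "v1.2.3"; pre-releases are skipped.
_SEMVER = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def _parse(tag: str) -> Optional[Tuple[int, int, int]]:
    match = _SEMVER.match(tag)
    if match is None:
        return None
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch)


def select_latest(tags: Iterable[str], major: Optional[int] = None) -> Optional[str]:
    """Pick the highest MAJOR.MINOR.PATCH tag, optionally pinned to one major."""
    best_tag, best_version = None, None
    for tag in tags:
        version = _parse(tag)
        if version is None:
            continue  # not a plain version tag, e.g. "latest"
        if major is not None and version[0] != major:
            continue
        if best_version is None or version > best_version:
            best_tag, best_version = tag, version
    return best_tag
```

Versions are compared as integer tuples, so `1.10.1` correctly sorts above `1.2.0`.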

### Integration

The automation controller creates these as indicated by information
from its specificaiton and the repository it looks at; and it consults

nit: specification

these resources when constructing `UpdateJob` resources to run.

### Design notes

The image repository and policy are separated so that different
policies can be derived from the same image repository specification.
Credentials only need to be specified once, for the repository object,
rather than maintained for all the policies.
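The separation can be pictured with two small record types. This is a sketch only: the field names (`secret_ref`, `repository_ref`, `rule`) and the rule strings are invented for illustration and are not part of the design.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ImageRepository:
    name: str
    image: str
    secret_ref: str  # hypothetical: credentials are named here, once


@dataclass(frozen=True)
class ImagePolicy:
    name: str
    repository_ref: str  # hypothetical: name of the ImageRepository to consult
    rule: str            # hypothetical rule syntax


def policies_by_repository(policies: List[ImagePolicy]) -> Dict[str, List[str]]:
    """Group policy names under the repository each one refers to."""
    grouped: Dict[str, List[str]] = {}
    for policy in policies:
        grouped.setdefault(policy.repository_ref, []).append(policy.name)
    return grouped
```

Two policies with different rules can then point at the same repository object, and neither carries credentials of its own.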

An alternative is to _just_ have policies, and infer the repositories
that need to be scanned. This would mean less to do e.g., for the
automation (it could just directly create each policy as it finds
it). It would make the implementation of the controller more
complicated though, since it would need to maintain internal state for
the repository scanning, rather than being able to consult
`ImageRepository` resources.

## Automation controller

The automation controller monitors `ImageAutomation` resources, which
specify a git repository on which to run automation and a
specification of how to update the repository. For each of these, it:

- calculates which image repositories need to be scanned, and the
policies for updating them, according to the specification given in
the `ImageAutomation` and git repository;
- consults the policies it created to determine updates to perform;
- creates and manages `UpdateJob` resources to run the updates.
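The first step above, working out which image repositories need scanning, might look something like this once the image references have been collected from the manifests. How the references are collected is not covered here, and the function names are hypothetical.

```python
from typing import Iterable, List


def repository_of(image_ref: str) -> str:
    """Strip the tag or digest from an image reference."""
    name = image_ref.split("@", 1)[0]  # drop "@sha256:..." if present
    last_slash = name.rfind("/")
    last_colon = name.rfind(":")
    # A colon after the last "/" starts a tag; an earlier colon is a
    # registry port, as in "localhost:5000/app".
    if last_colon > last_slash:
        name = name[:last_colon]
    return name


def repositories_to_scan(image_refs: Iterable[str]) -> List[str]:
    """Return the distinct repositories behind a set of image references."""
    return sorted({repository_of(ref) for ref in image_refs})
```

Several workloads using different tags of the same image collapse to a single repository to scan.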

### Specification

To be designed -- see the notes below.

### Design notes

There's a large design space for the automation, along various axes:

- where does the specification for what is automated live -- in git
or in a resource?
- is the specification part of the resource/file it automates, or
separate?
- does the specification name all the things to which it applies; or
does it work with rules or patterns?

So, for example, one design could lie at this point in the space: an
`ImageAutomation` resource names a git repository, a workload object,
and an `ImagePolicy`; every time the policy selects an image that does
not match what is given for the workload resource in git, a job is
created to update it.
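Under that fully explicit design, the controller's decision reduces to a comparison. The following sketch assumes hypothetical field names and a hypothetical `PendingUpdate` record standing in for the `UpdateJob` that would be created.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ImageAutomation:
    # Hypothetical fields mirroring the three things the resource names.
    git_repository: str
    workload: str
    policy: str


@dataclass(frozen=True)
class PendingUpdate:
    # Stand-in for the UpdateJob the controller would create.
    git_repository: str
    workload: str
    new_image: str


def reconcile(automation: ImageAutomation,
              image_in_git: str,
              selected_image: Optional[str]) -> Optional[PendingUpdate]:
    """Return an update when the policy's selection differs from what is in git."""
    if selected_image is None or selected_image == image_in_git:
        return None  # nothing selected yet, or already up to date
    return PendingUpdate(automation.git_repository, automation.workload, selected_image)
```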

It is easy to see how to implement this design, since everything is
totally explicit -- you just do what the resources say. But it's not
great for the user, because they have to spend time spelling it all
out for the controller, and do the work of keeping the automation and
policy objects in the cluster up to date with what's in the git repo
(one way to do that is to keep them in the git repo and let them be
synced; but in general, the workloads are not going to run in the same
place as the automation, so it would be fiddly to keep these in the
same place).

A design in the other direction would be to expect annotations,
similar to those used by Flux v1, on workloads to be automated. The
controller would interpret those to determine which image repositories
and policies are needed.
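As an illustration of how a controller might interpret such annotations, this sketch reads keys modelled on the Flux v1 convention (`fluxcd.io/automated` and `fluxcd.io/tag.<container>`). The keys a new controller would use are undecided, so treat the constants as placeholders; the function name is hypothetical.

```python
from typing import Dict, Tuple

# Placeholder keys, modelled on Flux v1; the new design may use others.
AUTOMATED_KEY = "fluxcd.io/automated"
TAG_PREFIX = "fluxcd.io/tag."


def policies_from_annotations(annotations: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Map container name -> (policy kind, pattern) for an automated workload."""
    if annotations.get(AUTOMATED_KEY) != "true":
        return {}  # workload has not opted in to automation
    policies: Dict[str, Tuple[str, str]] = {}
    for key, value in annotations.items():
        if not key.startswith(TAG_PREFIX):
            continue
        container = key[len(TAG_PREFIX):]
        kind, _, pattern = value.partition(":")  # e.g. "semver:~1.0"
        if container and pattern:
            policies[container] = (kind, pattern)
    return policies
```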

This might be tricky when the automation is managed by a different
team to that in charge of the application configuration -- in that
scenario, the annotations would have to be carefully applied after the
fact (perhaps with a kustomization), which may as well mean the
annotations are kept in a separate file.