Deletion of resources not in repo #738
Comments
How would you want it to work? |
@squaremo I think it'd be fair to say that everything we create based on config that we see in git should be deleted in the case it no longer exists in git. |
One of the discussion points could be |
A very naive solution would be to track the things we create and delete them when they no longer appear in the current set of config. However, that may be tricky... I'd consider looking at the diff between the current and the previous sync to see whether it includes file removals and, if so, try removing the corresponding resources. |
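The diff-based idea in the comment above could be sketched roughly as follows. This is only an illustration, not how Flux works: the function name `resources_to_delete` and the `namespace/kind:name` identifier strings are hypothetical, and the sketch assumes the set of resource IDs defined in git is known for both the previous and the current sync.

```python
from typing import Set


def resources_to_delete(previous_sync: Set[str], current_sync: Set[str]) -> Set[str]:
    """Return resource IDs that were defined in git at the previous sync but are gone now."""
    return previous_sync - current_sync


# Hypothetical resource IDs, for illustration only.
previous = {"default/deployment:helloworld", "default/service:helloworld"}
current = {"default/deployment:helloworld"}
print(sorted(resources_to_delete(previous, current)))
```

Because only resources that appeared in a previous sync can ever be returned, anything that was never in git is left alone. The sketch does not address rewritten git history.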
@rndstr suggested that we could use annotations (which makes sense in any case), and that we could also consider not deleting anything automatically and instead ask the user whether there is anything they'd want to delete. |
That approach probably won't tolerate aggressive history rewrites... but it'd probably be okay to ask for user's attention when we are uncertain about whether to delete something or not. |
Also, we could offer the user an option to export and check in all objects
in their cluster that are not in the repo, and allow them to select things
to exclude (e.g. GKE system pods). We will need to make sure any such lists
are dynamically updated with the user in control, which is a bit tedious but seems
like the right thing... Then we will have a much more concrete idea of which
things deleted from the repo should be deleted from the cluster.
On Wed, 13 Sep 2017, 5:17 am, Michael Bridgen wrote:
How would you want it to work?
|
User feedback: https://weave-community.slack.com/archives/C4U5ATZ9S/p1508346257000335
|
For our use case, we were investigating the usage of Flux as a kind of syncer between a cluster and a monorepo containing a series of resource manifests. Ideally, Flux would ensure that the state of the cluster matches the state of the monorepo (any new resources in the monorepo --> get created in the cluster, any resources removed from the monorepo --> get deleted from the cluster). |
A sketch of a design:
Refinements:
|
Great to see this being discussed. This is a critical feature for us as we consider deprecating our previous jenkins/landscaper/helm based deployer in favour of flux or an in-house equivalent. I like the approach suggested, but I do wonder if it would be appropriate to use in the kube-system namespace (we are moving towards a deployer per customer and would ideally like to treat deployments to kube-system in a similar fashion). The issue is that there is so much present in kube-system after cluster bootstrap (e.g. kops addons) that we'd need flexible ways to exclude resources, ideally a list of label selectors. Using a list of key:value labels would also allow us to exclude the resources installed via helm by adding heritage=Tiller to that list. Even if we move to flux we'd like to support charts (possibly via HelmCRD). Would you be in a position to give any indication of how high a priority this is for you? Ideally a timeframe, if possible, although that may be too much to ask. |
If I understand correctly, you would like to be able to immunise some resources from deletion, where the resources are not necessarily known ahead of time? Yeah, selectors are a good solution for that. We'd have to figure out where the selector goes. If the policy is reasonably slow-changing, maybe just in a configmap (it'd be good to move other stuff into a configmap too). |
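As a rough illustration of the selector idea discussed above, the check below treats each exclusion as a set of key/value label pairs and protects any resource whose labels match all pairs of at least one selector. The function name `is_protected`, the shape of the policy, and the `example.com/keep` label are assumptions made for this sketch; only `heritage=Tiller` comes from the thread.

```python
from typing import Dict, List


def is_protected(labels: Dict[str, str], exclusions: List[Dict[str, str]]) -> bool:
    """True if the labels satisfy every key/value pair of at least one selector."""
    return any(
        # An empty selector is ignored so that it cannot protect everything by accident.
        bool(selector) and all(labels.get(k) == v for k, v in selector.items())
        for selector in exclusions
    )


# Hypothetical exclusion policy, e.g. as it might be loaded from a configmap.
exclusions = [{"heritage": "Tiller"}, {"example.com/keep": "true"}]

print(is_protected({"app": "tiller-release", "heritage": "Tiller"}, exclusions))
print(is_protected({"app": "helloworld"}, exclusions))
```

A list of selectors like this stays short even when the set of resources it covers is large and not known ahead of time, which is the property asked for above.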
Re priorities: this isn't on the radar for weaveworks right at this minute, but may become more important when we look at supporting multiple backing repos. |
Thanks for the update. And yes, we don't know the resource names themselves ahead of time and there are a lot of them. After the cluster is bootstrapped, kube-system already contains a large selection from here: https://github.com/kubernetes/kops/tree/master/addons but it would depend on the cluster config. The selector list is also much more succinct than the resource list. |
Got it, that makes sense. What do you think about this being an item of configuration, i.e., set in a configmap? It would be composed with the per-resource annotations, and look like (off the top of my head):
|
Yes, I like both the use of a configmap and the format. It feels generally useful and flexible. |
Agree with @cptbigglesworth, this is a critical feature. It would give us an incentive to move away from a push based CI/CD system. Is anyone working on this, @squaremo ? |
Ouch, I was getting really excited for Flux as a replacement for our in house git ops solution, but without deletion it would be a large step back. How about annotations on resources that can be automatically deleted if an equivalent can't be found in manifests. This would be a deletion whitelist approach. Anything with A deletion black list approach is tricky because it relies on annotating resources not in the manifest repo, the very things we have the least control/knowledge of. |
If automating the deletion of deployments requires too much work, then one option that I think can be implemented without changing much in the current model is to update the "list controllers" response by adding a field containing the file each controller originates from. The team that I'm working with would be able to use this information to manually handle any removal on our side by:
Then we can also do something like this (on our side):
This would make it possible to somewhat "automate" the removal of Kubernetes objects managed by flux and the only thing that's missing right now is the knowledge of which files are managed by what controllers. |
That is quite nice, but IMO is missing one desirable property: flux should never delete something it didn't create or assert the existence of previously. I am also concerned that a flux mis-configuration, say accidentally pointing flux at an empty git repo, would wipe out the cluster. An alternative approach is to instruct flux to delete workloads via special 'commands' in the git commit, similar to the commands github has for closing PRs, e.g. 'deletes weave/deployment:prometheus'. Then, if flux cannot find the workload in git, and it can find such a commit message after the workload was last in git, then flux would delete the workload. |
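The commit-command proposal above might look something like the following sketch. The command syntax is inferred from the single example in the comment (`deletes weave/deployment:prometheus`); the regular expression and the function names are assumptions, and the sketch simplifies by assuming the caller passes only commit messages made after the workload was last present in git.

```python
import re
from typing import List, Set

# Assumed syntax, modelled on the example in the thread: "deletes <namespace>/<kind>:<name>"
DELETE_COMMAND = re.compile(r"\bdeletes\s+([\w.-]+/[\w.-]+:[\w.-]+)")


def requested_deletions(commit_messages: List[str]) -> Set[str]:
    """Collect every resource ID named in a 'deletes ...' command."""
    found: Set[str] = set()
    for message in commit_messages:
        # Strip a trailing full stop so "deletes a/b:c." still yields "a/b:c".
        found.update(match.rstrip(".") for match in DELETE_COMMAND.findall(message))
    return found


def safe_to_delete(resource_id: str, in_git: Set[str], commit_messages: List[str]) -> bool:
    """Delete only if the workload is absent from git AND a commit asked for it."""
    return resource_id not in in_git and resource_id in requested_deletions(commit_messages)


messages = ["Remove prometheus\n\ndeletes weave/deployment:prometheus", "Bump image"]
print(safe_to_delete("weave/deployment:prometheus", set(), messages))
```

Requiring both conditions means that pointing the tool at an empty repository deletes nothing on its own, since no commit message names the workloads.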
Another approach could be:
|
Step 4 would have to handle cases where a single file contains definitions for multiple workloads. |
This approach is nice because it's simple to understand. Another alternative would be to maintain a list of resources flux has "seen". A resource is "seen" if flux has had some kind of control of it, i.e. it has appeared in a k8s config that flux is syncing. The list would be committed to git. If flux notices a resource in this "seen" list but with no corresponding config, it would delete the resource from the "seen" list and the cluster. This would prevent deletion of resources which flux is not controlling, for example kube-system addons, if they are not in the k8s config flux is configured to look at. Example seen list:
|
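The reconciliation step of the "seen" list proposal could be sketched as below. The original example list is not reproduced here; the identifier format and the function name `reconcile_seen` are hypothetical, and how the list would be stored in git is left out.

```python
from typing import Set, Tuple


def reconcile_seen(seen: Set[str], in_git: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (updated seen list, resources to delete from the cluster)."""
    # Seen before, but no corresponding config any more: delete from the cluster.
    to_delete = seen - in_git
    # Everything that currently has config counts as seen; deleted entries drop off.
    updated_seen = set(in_git)
    return updated_seen, to_delete


# Hypothetical IDs: "a" was synced earlier and has since been removed from git.
seen = {"default/deployment:a", "default/deployment:b"}
in_git = {"default/deployment:b", "default/deployment:c"}
print(reconcile_seen(seen, in_git))
```

A resource that exists in the cluster but has never appeared in the synced config is in neither set, so it is never returned for deletion.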
I don't think it is simple to understand, because it drives the changes to git from the cluster, rather than the other way around, which is supposed to be the case. There are also mechanical problems: do you remove the resource from the cluster first, then the file (or piece of file) from git, or the other way around -- and what happens in either of those cases if you fail in-between the two steps? |
Actually, I've misunderstood Stefan's suggestion I think: the idea in step 1 is that you put an annotation in the manifest file. In that case, my objections are weaker. If you fail to delete the resource in the cluster, you will still have the annotation in git, and will try again next time. Another (weakish) objection is then that deleting something becomes somewhat indirect -- and I still don't like the inverted control, though mechanically it's workable. |
I'm inclined to agree with this principle and the accompanying objection, especially since it would allay many of the concerns expressed above; e.g., it would not remove things from kube-system (unless they had been in git after all).
This has the property (I think?) that you can always tell what to do, given the git repo and the cluster, even if you have never seen either before. A downside is that it is yet another way of doing things :-( And, of course, prone to typos, forgetfulness, and so on. In the absence of a robust implicit mechanism, it might be a goer, though. |
Yes, though perhaps not quite the way I intended. Consider the case where a cluster and git repo meet for the first time, and the git repo used to contain some workload definition and has a subsequent
Yes. Cue empty commits with |
The approach we took was based on these requirements :
To achieve this, the applier (read flux or kube-applier) should label objects that it creates, and we configure this using a list of Key Value labels, conventionally using a single K-V pair with
We can then choose to enable garbage collection and limit the collection to objects which have this label :
As an optimization I believe git diff is used to determine if files have been removed since the last garbage collection run. This model seems easy to reason about, if it suits you. A nice side effect of using labels is you can list all objects under the jurisdiction of a single applier. We've forked kube-applier for now, but I did really like aspects of flux too ! |
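The label-scoped garbage collection described in this comment could be sketched as follows. The label key and value, the dictionary shape of a cluster object, and the function name are all hypothetical; this is not the kube-applier or Flux implementation.

```python
from typing import Dict, List, Set, Tuple


def garbage(
    cluster_objects: List[Dict],
    repo_ids: Set[str],
    applier_label: Tuple[str, str],
) -> List[str]:
    """IDs of objects that carry the applier's label but are no longer in the repo."""
    key, value = applier_label
    return [
        obj["id"]
        for obj in cluster_objects
        if obj["labels"].get(key) == value and obj["id"] not in repo_ids
    ]


# Hypothetical cluster state: only the first two objects were created by this applier.
cluster = [
    {"id": "default/deployment:old", "labels": {"example.com/applier": "team-a"}},
    {"id": "default/deployment:web", "labels": {"example.com/applier": "team-a"}},
    {"id": "kube-system/deployment:dns", "labels": {}},
]
print(garbage(cluster, {"default/deployment:web"}, ("example.com/applier", "team-a")))
```

Objects without the label, such as cluster addons, are never candidates, and the same label can be used to list everything under one applier's jurisdiction.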
Until this is supported I will just create a script in our CI system which parses the git history to see which manifests have been removed, then warns the user (either in CI or slack). Additionally it could suggest a command to remove those resources, but the step would still remain manual. Of course this won't cover all resources, which is why it would need to be manual. |
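A CI script along the lines described above might look like this sketch. It relies on `git diff --diff-filter=D --name-only` to list files deleted between two revisions; the function names, the YAML-extension filter, and the suggested command are assumptions, and nothing is deleted automatically.

```python
import shlex
import subprocess
from typing import List


def manifest_paths(paths: List[str]) -> List[str]:
    """Keep only paths that look like YAML manifests."""
    return [p for p in paths if p.endswith((".yaml", ".yml"))]


def deleted_manifests(old_rev: str, new_rev: str, repo_dir: str = ".") -> List[str]:
    """List manifest files that exist in old_rev but were deleted by new_rev."""
    result = subprocess.run(
        ["git", "diff", "--diff-filter=D", "--name-only", old_rev, new_rev],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    )
    return manifest_paths(result.stdout.splitlines())


def suggested_command(old_rev: str, path: str) -> str:
    """Build a command a human could review and run; nothing is executed here."""
    spec = shlex.quote(f"{old_rev}:{path}")
    return f"git show {spec} | kubectl delete -f -"
```

In CI, one would call `deleted_manifests` with the previous and current revisions and post each `suggested_command` as a warning. Files that git reports as renamed rather than deleted would not show up, which is one reason the step stays manual.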
I have the impression that because of this, a rename of a deployment only creates a new deployment, but does not remove the old one. The problem with that is that we have certain workloads of which only 0 or 1 instances may be running (e.g. a consumer of a queue that needs to keep order). So we really want the rename to also delete the old deployment next to creating a new deployment. We were also experimenting with "Jobs" for database migrations, but it turns out:
On a more fundamental level, this seems to break the invariant that "Flux keeps the state of the cluster in sync with the content of the git repo" (we now need an "out-of-band", manual touch with kubectl to delete the old deployment). And ... that manual out-of-band touch is not audited in Flux? That seems to break another invariant "Flux audit logs all changes to the cluster" ? My experiment (in the
The result of pushing this change now is:
And now we see these two deployments:
|
This approach is nice because you get a record of it happening in git (for both a human asking the thing to be deleted, and the thing actually being deleted). |
It might be good to use Kontena's Pharos Cluster as inspiration. They have an addons system which works really well, propagating changes and the ability to delete resources. Most of this logic can be found in In a nutshell, what it appears to do:
It'd be awesome if this logic could be applied by a feature flag/cli option, repo-wide. |
That is a pretty clean way of going about it. If apply fails for a resource, that resource will get deleted, whereas you'd probably want it to just stay where it is until a human can fix things. Perhaps just the rule "don't delete anything that failed to apply" would avoid that. |
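The rule "don't delete anything that failed to apply" can be expressed as a simple set computation. This sketch assumes a prune step that would otherwise remove every managed object not touched by the latest successful apply; the function and parameter names are hypothetical.

```python
from typing import Set


def prune_candidates(managed: Set[str], applied_ok: Set[str], failed: Set[str]) -> Set[str]:
    """Managed objects that were not re-applied, excluding any whose apply failed."""
    return managed - applied_ok - failed


# Hypothetical run: "b" failed to apply, "c" is no longer defined anywhere.
print(prune_candidates({"a", "b", "c"}, applied_ok={"a"}, failed={"b"}))
```

Here `b` stays in the cluster until a human fixes its manifest, while `c`, which is no longer defined, is still pruned.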
I think it'd be reasonable to apply changes one at a time, grouped by folder and then alphabetically sorted. This would allow users to express order and logical groups by naming files When an error is encountered I assume the application of changes would be aborted and the prune step skipped, allowing a human operator to intervene. Additionally, maybe it would be best to revert the commit if the application failed. This behavior could be opt-in. edit: for clarity, application is referring to applying the yaml files |
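The ordering and abort behaviour suggested above could be sketched like this. The file names are hypothetical examples of a naming scheme, `apply` and `prune` stand in for whatever actually talks to the cluster, and the optional commit revert is not covered.

```python
from pathlib import PurePosixPath
from typing import Callable, Iterable, List, Tuple


def apply_order(paths: Iterable[str]) -> List[str]:
    """Group files by folder, then sort alphabetically within each folder."""
    return sorted(
        paths,
        key=lambda p: (str(PurePosixPath(p).parent), PurePosixPath(p).name),
    )


def sync(
    paths: Iterable[str],
    apply: Callable[[str], None],
    prune: Callable[[], None],
) -> Tuple[List[str], bool]:
    """Apply files one at a time; on the first error, stop and skip the prune step."""
    applied: List[str] = []
    for path in apply_order(paths):
        try:
            apply(path)
        except Exception:
            return applied, False  # aborted: leave the rest for a human operator
        applied.append(path)
    prune()
    return applied, True


print(apply_order(["b/02-app.yaml", "a/01-ns.yaml", "b/01-db.yaml"]))
```

Skipping the prune step whenever an apply fails keeps a broken manifest from causing deletions, at the cost of delaying legitimate ones until the error is fixed.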
@stefanprodan I like your suggestion but would be more inclined to suggest rather than |
@hiddeco is it possible to enable garbage collection in a "dry-run" or "noop" mode, so we can determine whether it will behave as expected rather than accidentally deleting resources we didn't mean it to delete? |
The FAQ still claims deletion is not possible, and links to this issue. Though from my quick scan, it seems like that is out of date. https://docs.fluxcd.io/projects/helm-operator/en/latest/faq.html#i-ve-deleted-a-helmrelease-file-from-git-why-is-the-helm-release-still-running-on-my-cluster |
Might want to open that as a new issue @nickatsegment since this is closed it might be overlooked but I'm here via the FAQ so... |
Arguably docs could be easier to fix with just a PR!
On Fri, 1 Nov 2019, 10:25 pm, Kavanaugh Latiolais wrote:
Might want to open that as a new issue @nickatsegment
<https://github.com/nickatsegment> since this is closed it might be
overlooked but I'm here via the FAQ so...
|
Currently, we don't provide deletion. I'm sure there was a discussion of how we could do it earlier, but I've not found a dedicated issue so I thought we should have one.