House keeper for Pipeline related stuff #64

Closed

vincent-pli opened this issue Aug 7, 2019 · 17 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@vincent-pli
Member

Expected Behavior

When an event is received, the EventListener deployment creates at least one PipelineResource and one PipelineRun, and those resources then fall outside of Triggers' control.
That causes a problem: after a while, hundreds of pipeline-related resources are left behind.

So I think we need to set up a garbage-collection/housekeeping mechanism for this situation.
I saw #48, but it requires deleting the EventListener to clean everything up, and I think the EventListener is designed for long-term running.

Actual Behavior

Currently, there is nothing for this.

@vtereso

vtereso commented Aug 8, 2019

I agree that resource build-up could/would become cumbersome since we are generating resources based on events. Triggers is enabling users to create these resources simply, which only exacerbates this Tekton issue. Maybe an option could be added to the PipelineRuns/TaskRuns, where PipelineResources can be deleted after runs? In terms of handling arbitrary resource deletion, I don't know how much better we can do than EventListener metav1.OwnerReference-based removal (or something of the sort) or having some time-based deletion.
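To make the owner-reference idea concrete, here is a minimal sketch (not actual Triggers code) of stamping an EventListener OwnerReference onto a triggered resource so that deleting the listener cascades to everything it created; the API group/version and the function name are assumptions for illustration.

// Illustrative sketch only: mark a triggered resource as owned by its
// EventListener so the Kubernetes garbage collector removes it when the
// listener is deleted. The apiVersion below is an assumption; adjust it to
// the installed Triggers version.
package example

import (
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
    "k8s.io/apimachinery/pkg/types"
)

// setListenerOwner sets a controller OwnerReference on obj pointing at the
// EventListener identified by listenerName/listenerUID.
func setListenerOwner(obj *unstructured.Unstructured, listenerName string, listenerUID types.UID) {
    controller := true
    obj.SetOwnerReferences([]metav1.OwnerReference{{
        APIVersion: "triggers.tekton.dev/v1alpha1", // assumption, not verified
        Kind:       "EventListener",
        Name:       listenerName,
        UID:        listenerUID,
        Controller: &controller,
    }})
}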

@vincent-pli
Member Author

Could we add a metav1.OwnerReference pointing to a PipelineRun on the other resources (everything except the PipelineRun itself)? I believe there is at least one PipelineRun, right?

So the problem is when to delete the PipelineRun:

  • Use an informer for PipelineRun, monitor the PipelineRun, and delete it when it completes (a rough sketch follows below).
  • Change the Pipeline referenced by the PipelineRun to add a last task that sends an event back to the EventListener; when that event is received, delete the related PipelineRun.
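A rough sketch of the first option, using a dynamic informer from client-go instead of Tekton's generated clients; the GroupVersionResource and the "Succeeded" condition check are assumptions about the PipelineRun API of that time, not verified against it.

// Sketch: watch PipelineRuns and delete each one as soon as its "Succeeded"
// condition is no longer Unknown (i.e. the run finished, pass or fail).
package example

import (
    "context"
    "fmt"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
    "k8s.io/apimachinery/pkg/runtime/schema"
    "k8s.io/client-go/dynamic"
    "k8s.io/client-go/dynamic/dynamicinformer"
    "k8s.io/client-go/tools/cache"
)

// Assumed GVR for PipelineRuns at the time this issue was filed.
var pipelineRunGVR = schema.GroupVersionResource{
    Group:    "tekton.dev",
    Version:  "v1alpha1",
    Resource: "pipelineruns",
}

// watchAndDeleteCompleted blocks until ctx is cancelled, deleting finished runs.
func watchAndDeleteCompleted(ctx context.Context, client dynamic.Interface, namespace string) {
    factory := dynamicinformer.NewFilteredDynamicSharedInformerFactory(client, 0, namespace, nil)
    informer := factory.ForResource(pipelineRunGVR).Informer()

    informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
        UpdateFunc: func(_, newObj interface{}) {
            pr := newObj.(*unstructured.Unstructured)
            if !runFinished(pr) {
                return
            }
            err := client.Resource(pipelineRunGVR).Namespace(pr.GetNamespace()).
                Delete(ctx, pr.GetName(), metav1.DeleteOptions{})
            if err != nil {
                fmt.Printf("failed to delete %s: %v\n", pr.GetName(), err)
            }
        },
    })
    factory.Start(ctx.Done())
    factory.WaitForCacheSync(ctx.Done())
    <-ctx.Done()
}

// runFinished inspects status.conditions for a "Succeeded" condition whose
// status has settled to True or False.
func runFinished(pr *unstructured.Unstructured) bool {
    conditions, _, _ := unstructured.NestedSlice(pr.Object, "status", "conditions")
    for _, c := range conditions {
        cond, ok := c.(map[string]interface{})
        if !ok {
            continue
        }
        if cond["type"] == "Succeeded" && cond["status"] != "Unknown" {
            return true
        }
    }
    return false
}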

@vtereso

vtereso commented Aug 9, 2019

PipelineResources are created ahead of time compared to PipelineRuns (since they do not support being embedded currently?), and likely only some of them should have owner references. If a resource doesn't have a specific SHA/revision/tag, it's potentially reusable, so deleting it might be bad? That aside, I guess it is possible to add a metav1.OwnerReference to referenced PipelineResources?

@vtereso

vtereso commented Aug 9, 2019

This has also been discussed here: tektoncd/pipeline#544

@vincent-pli
Member Author

Proposal:

  1. Design a new kind: TriggerExecutor
apiVersion: tekton.dev/v1alpha1
kind: TriggerExecutor
metadata:
  name: some-executor
spec:
  pipelineruns: []   # []corev1.ObjectReference, one entry per created PipelineRun

A TriggerExecutor will be created by the EventListener controller, with a PipelineRun reference added for every PipelineRun created for the received event.

  2. Enhance EventTemplate
type TriggerResourceTemplate struct {
    DeleteAfterComplete bool `json:"deleteAfterComplete,omitempty"`
    json.RawMessage     `json:",inline"`
}

At creation time, every resource in the EventTemplate should:

  • Have its owner set to the EventListener.
  • If DeleteAfterComplete is true, also have its owner set to the TriggerExecutor.

The TriggerExecutor controller will watch both TriggerExecutor and PipelineRun resources (a rough sketch of this check follows below):

  • Check the status of each PipelineRun; when it is done, delete it.
  • Check the list of PipelineRuns in the TriggerExecutor spec; if every PipelineRun is complete (deleted), delete the TriggerExecutor itself.
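For illustration only, a hypothetical sketch of the TriggerExecutor reconcile check described above; TriggerExecutor does not exist in Triggers, so the type and helper below are invented for this proposal.

// Hypothetical types for the proposed TriggerExecutor CRD and the check its
// controller would run on each reconcile.
package example

import (
    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// TriggerExecutor records the PipelineRuns created for one received event.
type TriggerExecutor struct {
    metav1.TypeMeta   `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`
    Spec              TriggerExecutorSpec `json:"spec"`
}

type TriggerExecutorSpec struct {
    PipelineRuns []corev1.ObjectReference `json:"pipelineruns,omitempty"`
}

// allRunsGone reports whether every referenced PipelineRun has already been
// deleted; stillExists stands in for whatever lister/lookup the controller
// would use. When it returns true, the controller can delete the
// TriggerExecutor itself, which cascades to anything it still owns.
func allRunsGone(te *TriggerExecutor, stillExists func(corev1.ObjectReference) bool) bool {
    for _, ref := range te.Spec.PipelineRuns {
        if stillExists(ref) {
            return false
        }
    }
    return true
}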

@vtereso

vtereso commented Aug 26, 2019

I believe the next step being taken to address the resource proliferation is to embed PipelineResources within PipelineRuns, which will implicitly set the owner reference. I see here that job controllers have taken a TTL approach to deletion as well. @bobcatfish seems like it would be relatively simple?
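For reference, this is the Job TTL mechanism mentioned above (the ttlSecondsAfterFinished field handled by Kubernetes' TTL-after-finished controller, alpha at the time this comment was written); a similar field on PipelineRun is only an idea here, not an existing API.

// A plain Kubernetes Job that the TTL-after-finished controller deletes one
// hour after it completes; shown only to illustrate the mechanism referenced
// in the comment above.
package example

import (
    batchv1 "k8s.io/api/batch/v1"
    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func jobWithTTL() *batchv1.Job {
    ttl := int32(3600)
    return &batchv1.Job{
        ObjectMeta: metav1.ObjectMeta{Name: "one-shot"},
        Spec: batchv1.JobSpec{
            TTLSecondsAfterFinished: &ttl, // cleaned up 1h after completion
            Template: corev1.PodTemplateSpec{
                Spec: corev1.PodSpec{
                    RestartPolicy: corev1.RestartPolicyNever,
                    Containers: []corev1.Container{{
                        Name:    "work",
                        Image:   "busybox",
                        Command: []string{"echo", "done"},
                    }},
                },
            },
        },
    }
}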

@vincent-pli
Member Author

Yes, but besides the embedded case, Pipelines also supports the original (referenced) way,
and we also need to handle arbitrary resource deletion.

This is a very rough draft of a solution, but I think we need to consider the issue now; suggestions are always welcome.

@vtereso

vtereso commented Aug 26, 2019

For generic garbage collection, the Kubernetes documentation only seems to mention owner references and cron job based deletion.
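A hedged sketch of the cron-style option: a small Go program, which could be packaged as a Kubernetes CronJob, that lists PipelineRuns older than a cutoff and deletes them. The GroupVersionResource and the age-only criterion are simplifying assumptions.

// Sketch: prune PipelineRuns created more than maxAge ago; completion status
// is deliberately ignored here to keep the example short.
package example

import (
    "context"
    "time"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/runtime/schema"
    "k8s.io/client-go/dynamic"
)

var prGVR = schema.GroupVersionResource{Group: "tekton.dev", Version: "v1alpha1", Resource: "pipelineruns"}

func pruneOldRuns(ctx context.Context, client dynamic.Interface, namespace string, maxAge time.Duration) error {
    list, err := client.Resource(prGVR).Namespace(namespace).List(ctx, metav1.ListOptions{})
    if err != nil {
        return err
    }
    cutoff := time.Now().Add(-maxAge)
    for _, pr := range list.Items {
        if pr.GetCreationTimestamp().Time.Before(cutoff) {
            if err := client.Resource(prGVR).Namespace(namespace).Delete(ctx, pr.GetName(), metav1.DeleteOptions{}); err != nil {
                return err
            }
        }
    }
    return nil
}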

@vincent-pli
Member Author

Yes, the proposal above is based on owner references.
As for the cron job, hmm, to be honest, it is an option, but not an attractive one...

@ncskier
Member

ncskier commented Sep 9, 2019

I agree that we don't want to use the EventListener as the OwnerReference, because the EventListener is supposed to persist through multiple triggerings of resource creation.

So, I think that one cleanup mechanism could be to give every triggered resource a label using the $(uid) or an "event id." Then when a user wants to delete all of these resources, they can delete all resources with the $(uid) or "event id" label.

Another weirder option which might be a little heavy would be to create a new Event CustomResource. This Event resource would be created with all of the triggered resources, and would be the owner of all the triggered resources. So, if a user wanted to clean up all of the resources, they could delete the Event resource.

@vincent-pli do you think either of these cleanup options are adequate?

@vincent-pli
Member Author

@ncskier Thanks for your comments.

Please take a look at my previous comments: #64 (comment)
It's similar to your second option.
I want to summarize all the solutions here for further discussion:

  1. Add a label containing the $(uid) or an event id to the resources, then let the user delete resources by label. It's simple to implement and use, but neither $(uid) nor the event id is a good choice for a label:
  • $(uid) is totally random, so a regex cannot be used for batch deletion.
  • The event id depends on the event generator and may repeat.
  2. Set the owner reference to the EventListener.
    The use case would be: the user wants to recreate the EventListener and clear everything related to it.
    Its disadvantage is that the EventListener is designed for long-term running and supports multiple triggerings of resource creation, so it cannot be used for runtime garbage collection.
  3. Create a new CRD, maybe named TriggerExecutor, and make it the owner of all resources created for the received event; it is then deleted at some point, either:
  • manually, or
  • by monitoring the status of the PipelineRuns and deleting the TriggerExecutor when they are done.

I think the three options do not conflict; actually I think we can implement options 1 and 2 and then document them.
For option 3, to be honest, I like this one; as @ncskier said, it is a little heavy, but it could really solve the housekeeping problem.
BTW, I think we can make it an option, to let the user make the decision.

@vtereso

vtereso commented Sep 10, 2019

@vincent-pli Good point. One thing that may help, however, is that all labels cascade from the EventListener, in addition to the extra label set on the reconciled resources: labels["app"] = el.Name.
This way, any of these labels could be used for deletion.

@ncskier
Member

ncskier commented Sep 10, 2019

Good point @vtereso, you could just use the label app=eventlistenername to delete all of the resources associated with that EventListener 👍
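A sketch of that label-based cleanup using client-go's dynamic DeleteCollection; the resource list and the exact label key/value are assumptions based on the discussion above.

// Sketch: delete every resource of the listed kinds carrying the
// app=<eventlistener name> label in a single call per kind.
package example

import (
    "context"
    "fmt"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/runtime/schema"
    "k8s.io/client-go/dynamic"
)

func deleteByListenerLabel(ctx context.Context, client dynamic.Interface, namespace, listenerName string) error {
    // Kinds to clean up; illustrative, extend as needed.
    gvrs := []schema.GroupVersionResource{
        {Group: "tekton.dev", Version: "v1alpha1", Resource: "pipelineruns"},
        {Group: "tekton.dev", Version: "v1alpha1", Resource: "pipelineresources"},
    }
    selector := fmt.Sprintf("app=%s", listenerName)
    for _, gvr := range gvrs {
        err := client.Resource(gvr).Namespace(namespace).DeleteCollection(
            ctx, metav1.DeleteOptions{}, metav1.ListOptions{LabelSelector: selector})
        if err != nil {
            return err
        }
    }
    return nil
}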

@vincent-pli
Member Author

@vtereso I think it's a good idea; I will update #96.
For option 3, I think we need further discussion.

@imjasonh
Member

imjasonh commented Sep 17, 2019

I think we want to consider some solution that persists the state/history of runs off-cluster, at which point it's safe to delete it from the cluster (or mark it for eventual later deletion).

The system for persistent off-cluster storage/indexing isn't designed yet (tektoncd/pipeline#453), but I believe it would solve this and many other problems in a CI/CD system, like tracking the success of a continuous build over time, searching for provenance of an image or deployment, etc.

This doesn't seem like a triggering-specific problem; PipelineRuns shouldn't be able to continuously accrete on a cluster indefinitely, regardless of whether they were triggered or manually run. I'd like to move this to the Pipelines repo since this is an issue with PipelineRuns/TaskRuns.

(However, that being said, if triggers are going to be capable of creating arbitrary K8s resources on-demand -- e.g., Jobs, ConfigMaps -- then triggering will have to come up with some way of reaping those when it can determine they're unneeded. I don't think we can solve that the same way we can for PipelineRuns, since we know a lot more about the semantics of PipelineRuns.)

@tekton-robot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.

/lifecycle stale

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 12, 2020
@tekton-robot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Sep 11, 2020
@dibyom dibyom closed this as completed Dec 1, 2020