Experiment Proposal: Tekton Gates #144

Closed
iancoffey opened this issue Jul 8, 2020 · 15 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@iancoffey
Member

iancoffey commented Jul 8, 2020

This is a proposal for a new experimental component that introduces gates: points in a pipeline where execution pauses until some criteria are met. Complex pipelines often need to stop and wait for certain criteria before proceeding. Unlike conditions, gates are expected to wait for at least a minimum duration before proceeding; e.g., before a promotion step, verify that a metric has stayed at or above threshold X for Y hours.

Making gates a dedicated concept makes logical time accounting possible. Time a pipeline spends waiting at a gate should not count against its timeouts the way time spent in steps or conditions does. A gate is configured to wait and gather metrics for at least a minimum duration, so it does not make sense to count that time against the pipeline or step timeout.

Like a step or condition, a gate is an interface that any image can implement, but my test implementation will use my prometheus-gate experiment image. This will allow any pipeline to wait on the results of any valid Prometheus range query for a period of time before proceeding. In this way, any service that exposes metrics via Prometheus can make Pipelines wait until defined min/max/equality conditions are met.

@bobcatfish
Contributor

Hey @iancoffey! This sounds very relevant to the custom task work that @imjasonh has been working on: https://github.com/tektoncd/community/blob/master/teps/0002-custom-tasks.md. For example, this use case sounds very similar (waiting until some condition is met). I wonder if your proposal could make use of custom tasks?

@iancoffey
Member Author

@bobcatfish Oh awesome! I had not yet noticed these custom tasks, but I will check into how this concept fits. I think the sharp edge will be around time accounting and managing timeouts correctly when a pause/gate is implemented.

@imjasonh
Member

imjasonh commented Jul 8, 2020

@iancoffey This definitely seems like something we should support using Custom Tasks.

I agree that PipelineRun timeout enforcement is a bit of an unknown at the moment. One easy way to address this today is to set a Pipeline's timeout to something really long, like a week or more; there's probably still some point after which a PipelineRun that is still waiting for a gate to unlock should be considered expired/invalid. We could also allow timeout-less Pipelines that "run" indefinitely, which becomes a lot easier to imagine when we're talking about "runs" that aren't just container executions.

Anyway, looking forward to hearing more feedback and ideas. Let me know if you'd like to walk through some of the demo tasks I've written to get ideas flowing.

@iancoffey
Member Author

iancoffey commented Jul 9, 2020

I'm going to contribute this as a catalog Task to start, and also see about creating a custom task based on it.

@imjasonh I would def like to hear about the demo custom tasks you've written; that would be 💯

@imjasonh
Member

imjasonh commented Jul 9, 2020

> @imjasonh I would def like to hear about the demo custom tasks youve written, that would be 💯

The two that are most stable (i.e., least complex) so far are wait-task, which takes a duration param and simply waits that long before succeeding, and cel-task, which takes an expression param and tries to evaluate it, succeeding if the result of evaluation is true. cel-task will be especially useful once we support custom tasks as conditions.

I'm working on a very basic approval-task that blocks until someone POSTs to some endpoint, and a gcb-task that runs a remote Google Cloud Build with the build config specified in a CRD and a parameterized source context. Those are a bit more complex, so they'll probably take a bit longer to stabilize.

@tekton-robot
Contributor

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

@tekton-robot
Contributor

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

/close

@tekton-robot tekton-robot added the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Aug 15, 2020
@tekton-robot
Contributor

@tekton-robot: Closing this issue.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@vdemeester
Member

/remove-lifecycle rotten
/remove-lifecycle stale
/reopen

@tekton-robot
Contributor

@vdemeester: Reopened this issue.

@tekton-robot tekton-robot reopened this Aug 17, 2020
@tekton-robot tekton-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Aug 17, 2020
@tekton-robot
Contributor

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.

/lifecycle stale

@tekton-robot tekton-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 15, 2020
@tekton-robot
Contributor

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle rotten

@tekton-robot tekton-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 15, 2020
@vdemeester
Member

/remove-lifecycle rotten

@bobcatfish
Contributor

@vdemeester I'm wondering if it makes sense to let this close? I'm not sure @iancoffey is still working on it, and it seems like custom tasks are the way we want to go forward here.

@iancoffey
Member Author

iancoffey commented Dec 15, 2020

I still plan to create a custom task for this at some point, but it hasn't happened yet. I did implement similar, more narrowly scoped blocking logic as a catalog task, though! https://github.com/tektoncd/catalog/tree/master/task/prometheus-gate/0.1

I will close this, and bring it back up when I get around to the custom task. Thanks!

dlorenc pushed a commit to dlorenc/community that referenced this issue Oct 27, 2022
Signed-off-by: Hayden Blauzvern
5 participants