Optimize Tekton Pipeline #5452

Open
5 tasks
pritidesai opened this issue Sep 7, 2022 · 8 comments
Labels
area/performance Issues or PRs that are related to performance aspects. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.

Comments

@pritidesai
Member

pritidesai commented Sep 7, 2022

Tekton Pipelines has matured since its inception, yet the project is still under active development. Many organizations have adopted Tekton Pipelines for various use cases. For a project at this level of maturity and use, reliability must be maintained. Users should be able to upgrade their pipelines to the latest release without running into any performance degradation.

We have noticed a couple of reported issues with a similar concern around efficiency: the webhook timing out, or the cluster becoming unresponsive for a pipeline with a large number of tasks.

Today, we have no record of how long a given pipeline takes to execute with the latest release compared to the previous N releases.

We have had a couple of PRs in the past trying to introduce some form of performance test:

Let's start writing performance tests to report the execution time. The performance tests can be scheduled to execute every night. We collect the execution time in logs for now until we come up with a better way of storing these numbers.
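
One possible shape for "report the execution time in logs" is sketched below. It assumes the run object carries `status.startTime` and `status.completionTime` as RFC 3339 UTC timestamps; the `key=value` log layout, the helper names, and the sample object are hypothetical and only illustrate the idea.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_k8s_time(ts: str) -> datetime:
    # Assumes the "2022-09-07T12:00:00Z" form (RFC 3339, UTC, whole seconds).
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def execution_seconds(run: dict) -> Optional[float]:
    """Return completionTime - startTime in seconds, or None if unfinished."""
    status = run.get("status", {})
    start, end = status.get("startTime"), status.get("completionTime")
    if not start or not end:
        return None
    return (parse_k8s_time(end) - parse_k8s_time(start)).total_seconds()


def perf_log_line(run: dict, release: str) -> str:
    # The key=value layout is an arbitrary choice made for this sketch.
    secs = execution_seconds(run)
    name = run["metadata"]["name"]
    value = "unfinished" if secs is None else f"{secs:.0f}"
    return f"perf release={release} pipelinerun={name} seconds={value}"


# Hand-written sample object; not output from a real cluster.
sample = {
    "metadata": {"name": "complex-pipeline-run"},
    "status": {
        "startTime": "2022-09-07T12:00:00Z",
        "completionTime": "2022-09-07T12:01:35Z",
    },
}
print(perf_log_line(sample, release="nightly"))
```

A nightly job could emit one such line per run, which keeps the numbers greppable until a better store is chosen.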

As a performance measure, we could also avoid validating task/pipeline spec in every iteration - #4562.
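
To make the idea concrete, here is a minimal sketch of skipping validation when a spec has not changed. It is written in Python for illustration only (the actual controller is Go) and is not what #4562 proposes. It assumes that `metadata.generation` changes whenever the spec changes; `ValidationCache` and `validate_spec` are hypothetical names.

```python
class ValidationCache:
    """Remember the last generation that passed validation for each object."""

    def __init__(self, validate):
        self._validate = validate  # callable that raises on an invalid spec
        self._validated = {}       # uid -> generation that last passed

    def validate_if_changed(self, obj: dict) -> bool:
        """Return True if validation ran, False if it was skipped."""
        meta = obj["metadata"]
        uid, generation = meta["uid"], meta["generation"]
        if self._validated.get(uid) == generation:
            return False  # spec unchanged since the last successful validation
        self._validate(obj["spec"])  # an exception here leaves the cache untouched
        self._validated[uid] = generation
        return True


calls = []


def validate_spec(spec: dict) -> None:
    # Stand-in for the real (much more involved) spec validation.
    calls.append(spec)
    if not spec.get("tasks"):
        raise ValueError("pipeline has no tasks")


cache = ValidationCache(validate_spec)
run = {
    "metadata": {"uid": "abc-123", "generation": 1},
    "spec": {"tasks": [{"name": "build"}]},
}
first = cache.validate_if_changed(run)    # first reconcile: validates
second = cache.validate_if_changed(run)   # same generation: skipped
run["metadata"]["generation"] = 2
third = cache.validate_if_changed(run)    # spec changed: validates again
print(first, second, third)
```

One caveat with this approach: a Task referenced through `taskRef` can change without the referencing object's generation changing, so a real implementation would need to account for resolved references as well.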

  • Determine if we can avoid validating specifications every reconcile cycle.
  • Create a test with a complex pipeline to log time taken to validate.
  • Write a test to create a pipelineRun with a complex pipeline (multiple tasks using both taskRef and taskSpec, along with many when expressions).
  • Create multiple taskRuns in parallel - something similar to RFC: Basic performance test #4378
  • Create multiple pipelineRuns in parallel and log timing.
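
A "complex pipeline" for these tests could be generated rather than hand-written. The sketch below builds a PipelineRun manifest with an embedded pipeline whose tasks alternate between `taskRef` and `taskSpec`, each guarded by a `when` expression. The `tekton.dev/v1beta1` API version, the referenced Task name `echo`, the `busybox` image, and the parameter name are assumptions chosen for illustration.

```python
import json


def build_pipelinerun(name: str, num_tasks: int) -> dict:
    tasks = []
    for i in range(num_tasks):
        task = {
            "name": f"task-{i}",
            # Every task is guarded so that when-expression evaluation is exercised.
            "when": [
                {"input": "$(params.enabled)", "operator": "in", "values": ["true"]}
            ],
        }
        if i % 2 == 0:
            # Hypothetical Task assumed to already exist in the namespace.
            task["taskRef"] = {"name": "echo"}
        else:
            task["taskSpec"] = {
                "steps": [
                    {"name": "echo", "image": "busybox", "script": "echo hello"}
                ]
            }
        tasks.append(task)
    return {
        "apiVersion": "tekton.dev/v1beta1",  # assumption; adjust to the installed version
        "kind": "PipelineRun",
        "metadata": {"name": name},
        "spec": {
            "params": [{"name": "enabled", "value": "true"}],
            "pipelineSpec": {
                "params": [{"name": "enabled", "type": "string"}],
                "tasks": tasks,
            },
        },
    }


manifest = build_pipelinerun("complex-pipeline-run", num_tasks=50)
print(json.dumps(manifest, indent=2)[:200])  # preview only
```

Because `kubectl` accepts JSON manifests, the full `json.dumps(manifest)` output can be written to a file and applied directly; varying `num_tasks` gives a simple knob for scaling the test.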

@pritidesai added the area/performance label Sep 7, 2022
@dibyom
Member

dibyom commented Sep 14, 2022

This seems similar to tektoncd/community#602

@JeromeJu
Member

/assign

@afrittoli
Member

From the pipeline WG - it would be good to break this down into smaller items that we can target to milestones.

@JeromeJu JeromeJu removed their assignment Jan 17, 2023
@lbernick
Member

It seems like this issue is largely scoped to benchmarking. I did a bit of benchmarking recently to test out a feature I was working on, and I want to share my progress here.

I wrote some scripts that generate N copies of a PipelineRun, wait until all N are complete, and write timing info to a file. If the script is cancelled, it will cancel any currently running PipelineRuns and report on all of them regardless of whether they have completed. This could be a good starting point for anyone who wants to implement benchmarking. Code changes are on the branch https://github.com/lbernick/pipeline/tree/perftest.
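
For readers who want the general shape without opening the branch, the flow described above (create N runs, poll until all finish, cancel on interrupt, always write a report) might look roughly like the sketch below. This is not the code on the `perftest` branch; the `client` interface, the `FakeClient` stand-in, the run names, and the TSV report format are all hypothetical.

```python
import os
import tempfile
import time


class FakeClient:
    """In-memory stand-in for a cluster client (hypothetical interface)."""

    def __init__(self, polls_to_finish: int):
        self._polls = polls_to_finish
        self.remaining = {}
        self.cancelled = set()

    def create(self, name: str) -> None:
        self.remaining[name] = self._polls

    def is_done(self, name: str) -> bool:
        self.remaining[name] -= 1
        return self.remaining[name] <= 0

    def cancel(self, name: str) -> None:
        self.cancelled.add(name)


def run_benchmark(client, n: int, out_path: str, poll_interval: float = 0.0):
    started, finished = {}, {}
    try:
        for i in range(n):
            name = f"perf-run-{i}"
            client.create(name)
            started[name] = time.monotonic()
        pending = set(started)
        while pending:
            for name in sorted(pending):  # sorted() copies, so removal is safe
                if client.is_done(name):
                    finished[name] = time.monotonic()
                    pending.remove(name)
            if pending:
                time.sleep(poll_interval)
    except KeyboardInterrupt:
        # Cancel whatever is still running, then fall through to the report.
        for name in started:
            if name not in finished:
                client.cancel(name)
    now = time.monotonic()
    rows = [
        (name, "completed" if name in finished else "cancelled",
         finished.get(name, now) - t0)
        for name, t0 in started.items()
    ]
    with open(out_path, "w") as f:
        for name, state, secs in rows:
            f.write(f"{name}\t{state}\t{secs:.3f}\n")
    return rows


out_path = os.path.join(tempfile.mkdtemp(), "timings.tsv")
rows = run_benchmark(FakeClient(polls_to_finish=3), n=5, out_path=out_path)
print(len(rows), "runs reported to", out_path)
```

Keeping the cluster access behind a small client object means the same loop can be exercised locally with a fake and pointed at a real cluster later.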

Some things that still need to be figured out:

  • Where would we run this? We wouldn't want to generate lots of runs on our CI cluster, and I'm not sure if a kind cluster could handle a large number of PipelineRuns.
  • What's the best way to output perf data so that it can be stored over time and referred to easily? We might be able to run tests using Tekton and store results using Tekton Results; we could also use Prometheus, but I'm not sure how we'd easily separate out metrics related to benchmarking.
  • What metrics do we care about specifically? I think this is what Start measuring Tekton Pipelines performance #540 is trying to tackle (also some more detail in https://docs.google.com/document/d/1Rme6UQ0i03W_Fg3pefJ8aJ9G73IBnijUISTL2R9XmzU/ -- thanks @pritidesai!)
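
On the storage question, one low-tech option is an append-only JSON Lines file keyed by release, which also makes release-to-release comparison straightforward. The sketch below is only one possibility under stated assumptions: the record fields, the release labels, the 10% tolerance, and the sample timings are all made up for illustration.

```python
import json
import os
import statistics
import tempfile


def append_result(path: str, release: str, pipeline: str, seconds: float) -> None:
    record = {"release": release, "pipeline": pipeline, "seconds": seconds}
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")


def median_by_release(path: str, pipeline: str) -> dict:
    samples = {}
    with open(path) as f:
        for line in f:
            rec = json.loads(line)
            if rec["pipeline"] == pipeline:
                samples.setdefault(rec["release"], []).append(rec["seconds"])
    return {rel: statistics.median(vals) for rel, vals in samples.items()}


def regressed(medians: dict, latest: str, baseline: list, tolerance: float = 0.10) -> bool:
    """True if the latest median exceeds the baseline median by more than tolerance."""
    base = statistics.median([medians[r] for r in baseline])
    return medians[latest] > base * (1 + tolerance)


path = os.path.join(tempfile.mkdtemp(), "perf.jsonl")
# Made-up timings for three hypothetical releases.
for release, secs in [
    ("release-a", 100.0), ("release-a", 102.0),
    ("release-b", 101.0), ("release-b", 99.0),
    ("release-c", 130.0), ("release-c", 128.0),
]:
    append_result(path, release, "complex-pipeline", secs)

medians = median_by_release(path, "complex-pipeline")
print(medians)
print(regressed(medians, "release-c", baseline=["release-a", "release-b"]))
```

Using the median rather than the mean keeps a single slow nightly run from triggering a false alarm, though the right statistic and threshold would need to be settled alongside the choice of metrics.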

Some other tools we can look into:

@tekton-robot
Collaborator

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 11, 2023
@tekton-robot
Collaborator

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 10, 2023
@vdemeester
Member

/lifecycle frozen

@tekton-robot tekton-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. labels Aug 10, 2023
@afrittoli
Member

@pritidesai - we marked this as "nice to have" for v1 - please let us know if you disagree.


8 participants