Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Basic performance test #4378

Closed
wants to merge 1 commit into from

Conversation

guillaumerose
Copy link
Contributor

I am trying to write performance test for Tekton. I am looking at finding good metrics to follow and good test scenario.
Here is one: I create N taskRuns per second and observe how long it takes for the controller to create the pod related to each taskRun.

At the beginning, I thought it will be done in few seconds but even at a small rate (2-3 taskRuns per second), after 3min, the controller takes 30 seconds between taskRun creation and pod creation. It seems to be better if I tune QPS and burst of the k8s client of the controller.

I hope I can turn this into a real test that can run in the CI. Let me know what you think, what I should improve!

@tekton-robot
Copy link
Collaborator

@guillaumerose: Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tekton-robot tekton-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Nov 17, 2021
@tekton-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign afrittoli
You can assign the PR to them by writing /assign @afrittoli in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tekton-robot tekton-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 17, 2021
@imjasonh
Copy link
Member

@guillaumerose Thanks for picking this up and especially for your interest in formalizing and automating the tests. I've definitely done similar load tests in the past without observing the slowness you're seeing, so we must have had a regression at some point. Running these tests regularly would help prevent those in the future. 👍

@vdemeester
Copy link
Member

/kind feature

@tekton-robot tekton-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Nov 17, 2021
Copy link
Member

@vdemeester vdemeester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @imjasonh, I think it would be nice to have those running alongside with the nightly build for example. We can start with this very simple test and grow it iteratively.

@bobcatfish
Copy link
Collaborator

I have a couple of requests:

  1. That we prioritize setting up the automation to run these tests on a nightly basis (and make results visible) - if we're not able to do that I wouldn't want to add tests to the codebase that we don't exercise (this is why we didnt end up merging Initialize load testing #3554 )
  2. I think it would be good to identify the scenarios we want to exercise in the tests up and agree up front before merging, TEP-0036 tries to start the process of defining what we need, @guillaumerose if you're at all interested in picking up that TEP and making a proposal around it you are more than welcome :D

@guillaumerose
Copy link
Contributor Author

Thanks all!

@bobcatfish I will add my thoughts in a PR in this TEP, then we can come back to this 👍

@vdemeester
Copy link
Member

  1. That we prioritize setting up the automation to run these tests on a nightly basis (and make results visible) - if we're not able to do that I wouldn't want to add tests to the codebase that we don't exercise (this is why we didnt end up merging Initialize load testing #3554 )

@bobcatfish this is a bit chicken-and-egg problem here. We need something to execute on to setup the nightly runs, hence having this in would be first step, 2nd step would be to setup a nightly in dogfooding, and the steps after would be to refine and enhance.

rateLimitedCfg.RateLimiter = flowcontrol.NewTokenBucketRateLimiter(qps, burst)
rateLimitedKubeClient, rateLimitedCS := clientSets(t, rateLimitedCfg)

wg := sync.WaitGroup{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know there's some ongoing discussion about what exactly to benchmark, but I recently discovered testing.B and I'm wondering if it could be helpful here? It looks like some of timing + waitgroup logic could be replaced with b.runParallel.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I though @guillaumerose used testing.B.., but turns out not. Yes, maybe using it might make sense indeed.

@tekton-robot
Copy link
Collaborator

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 20, 2022
@tekton-robot
Copy link
Collaborator

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Mar 22, 2022
@tekton-robot
Copy link
Collaborator

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen with a justification.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/close

Send feedback to tektoncd/plumbing.

@tekton-robot
Copy link
Collaborator

@tekton-robot: Closed this PR.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen with a justification.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/close

Send feedback to tektoncd/plumbing.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/feature Categorizes issue or PR as related to a new feature. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants