Parameterizing sdn to ovn migration timeout #42672
[REHEARSALNOTIFIER]
A total of 67 jobs have been affected by this change. The above listing is non-exhaustive and limited to 35 jobs. A full list of affected jobs can be found here.
Interacting with pj-rehearse
Comment: Once you are satisfied with the results of the rehearsals, comment:
/pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.12-amd64-nightly-gcp-ipi-sdn-migration-ovn-f14
/assign @jtaleric
/pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.12-amd64-nightly-gcp-ipi-sdn-migration-ovn-f14
/joke
@vishnuchalla: What's the best thing about elevator jokes? They work on so many levels.
@vishnuchalla: The following test failed, say
/assign @jluhrsen
@vishnuchalla I guess you want to increase the timeout when running this script on a large cluster. May I ask how large the cluster is, and at which step the current timeout is too short?
We have a cluster with 120 worker nodes on which we plan to run some large-scale tests.
/pj-rehearse ack
/lgtm
Wouldn't it be better to count the number of nodes at the top of the script and then adjust the timeouts based on that? (eg
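For illustration, the node-count approach suggested here could look roughly like the sketch below. The helper name `scaled_timeout` and the scaling rule (one extra multiple of the base timeout per 50 nodes) are assumptions made for this example; neither comes from the PR.

```shell
# Hypothetical helper: scale a base timeout (in seconds) by cluster size.
# The "one extra multiple of the base per 50 nodes" rule is an illustrative
# assumption, not a value taken from the migration script.
scaled_timeout() {
  local base_seconds=$1
  local node_count=$2
  local nodes_per_step=50
  echo $(( base_seconds * (1 + node_count / nodes_per_step) ))
}

# In a real job the count could be read from the cluster, for example:
#   node_count=$(oc get nodes --no-headers | wc -l)
scaled_timeout 300 120   # prints 900
```

The fixed `nodes_per_step` constant is exactly the kind of value that would need revisiting whenever a new cluster shape breaks the assumption.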
That might work in some cases, but we cannot guarantee it will be stable across all types of runs, and we would likely have to revisit this part of the code to change the fixed value whenever something breaks. Instead, I think we should expose a variable, as in this PR, so that users have the flexibility to experiment and decide on the right number for their use case. That way the code changes stay minimal and users can set their desired timeout value themselves.
If this variable is set to some non-zero value, then that is the timeout used for all steps, whereas before the That could really blow up on us if there is an issue. Would it be overkill to add specific timeouts for each
Yes, the defaults will stay in place. This change gives users the flexibility to modify the timeout according to their use case. If they set a lower value, any resulting failure is their own doing; if they set a greater one, it will be beneficial. I also don't think a user would set this timeout value without being fully aware of where and how it is used. And if someone reaches out with a failure caused by a timeout, we can simply redirect them to adjust
Gotcha, that makes sense. I like the idea. My only concern is that I would assume that if someone is going
Yes. Even if the value is larger, a timeout is just a timeout. A process need not wait until the timeout to finish; since it is only a limit on execution time, I think it should be fine.
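The point that a timeout is only an upper bound can be shown with a small sketch. It assumes GNU coreutils `timeout` is available; the migration script itself may enforce its limits through a different mechanism.

```shell
# A generous limit does not slow down a command that finishes early.
start=$(date +%s)
timeout 60s sleep 1          # returns after about 1 second, not 60
status=$?
elapsed=$(( $(date +%s) - start ))
echo "exit status: ${status}, elapsed: ${elapsed}s"

# A limit that is too short terminates the command instead.
timed_out_status=0
timeout 1s sleep 5 || timed_out_status=$?
echo "exit status on timeout: ${timed_out_status}"   # GNU timeout reports 124
```

A larger value therefore only matters when a step hangs, in which case the failure is reported later than it would be with the default.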
/lgtm
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: jluhrsen, jtaleric, pliurh, vishnuchalla
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Adding a generic timeout for the ovn-sdn-migration ref, as the existing timeouts are resulting in some job failures. More context here.
This option gives us the flexibility to set a generic timeout for all the steps involved. If it is not set, the steps run with their default values.
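A minimal sketch of the described behavior, assuming the override is read from an environment variable: when it is set to a non-zero value it applies to every step, otherwise each step keeps its own default. The variable name `MIGRATION_TIMEOUT_OVERRIDE`, the helper `resolve_timeout`, and the example values are hypothetical and are not the names used in the PR.

```shell
# MIGRATION_TIMEOUT_OVERRIDE is a hypothetical variable name for this sketch.
# Prints the override when it is set and non-zero, else the step's default.
resolve_timeout() {
  local step_default=$1
  local override=${MIGRATION_TIMEOUT_OVERRIDE:-0}
  if [ "$override" != "0" ]; then
    echo "$override"
  else
    echo "$step_default"
  fi
}

# Unset (or zero): each step keeps its own default.
( unset MIGRATION_TIMEOUT_OVERRIDE; resolve_timeout 300s )          # prints 300s

# Set: the same value is used for every step.
( MIGRATION_TIMEOUT_OVERRIDE=2700s; resolve_timeout 300s )          # prints 2700s
```

Each step would then call the helper with its existing default, so a job that never sets the variable behaves exactly as before.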