The documentation of kubectl wait is a bit misleading #754

Closed
eranreshef opened this issue Nov 3, 2019 · 19 comments
Assignees
Labels
area/kubectl
kind/bug Categorizes issue or PR as related to a bug.
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.
priority/backlog Higher priority than priority/awaiting-more-evidence.
sig/cli Categorizes an issue or PR as relevant to SIG CLI.

Comments

@eranreshef

The --timeout option for kubectl wait says:

"The length of time to wait before giving up".

I personally understood this as the timeout for the entire command, but after experimenting with it a bit I realised that the value of this option applies per resource. For example, if I am waiting for a set of 10 pods to be Ready and used --timeout=60s, I might end up waiting for 10 minutes before the command exits, and not just the 1 minute I assumed.

So as I see it, there are two possible solutions here:

  1. Improve the documentation.
  2. Change the implementation so that the --timeout value applies to the entire command duration (the preferable solution, in my opinion).
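A possible workaround in the meantime is to bound the whole invocation externally with GNU coreutils timeout, which kills the command after a fixed wall-clock time regardless of how many resources match. This is a sketch, not a definitive fix, and the label selector is a placeholder:

# Sketch: the outer timeout limits total wall-clock time to 60s,
# no matter how many pods the (hypothetical) selector app=myapp matches.
timeout 60s kubectl wait --for=condition=Ready pod -l app=myapp --timeout=60s

If the outer timeout fires, GNU timeout exits with status 124, which a calling script can distinguish from kubectl's own failure.
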
@zhouya0
Contributor

zhouya0 commented Nov 7, 2019

Actually, kubectl wait makes a watch request for every single resource, and it issues those requests one by one:

GET https://10.6.192.3:6443/api/v1/namespaces/default/pods?fieldSelector=metadata.name%3Dmy-busybox&resourceVersion=5594455&watch=true

But a watch request is an open-ended connection, so every request gets its own timeout (the default in wait is 30s).

So the only way to implement your second solution would be to split the timeout evenly across the resources, which doesn't seem like a good way to do it.
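
In other words, the observed behavior is roughly equivalent to the following shell loop (illustrative only, not the actual kubectl source, with placeholder pod names):

# Each resource is waited on in turn with its own --timeout budget,
# so the worst case is timeout * number of resources (here up to 180s).
for pod in pod-a pod-b pod-c; do
  kubectl wait --for=condition=Ready "pod/$pod" --timeout=60s
done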

@eranreshef
Author

I implemented sort of a wrapper for internal use with the following logic:

  1. run kubectl wait <resources> --timeout=0
  2. parse the output from step 1 to get a list of unready_pods
  3. while (command_elapsed_time < command_timeout and len(unready_pods) > 0)
    a. run kubectl wait <unready_pods> --timeout=0
    b. parse the output from step 3.a to get a list of unready_pods
  4. if len(unready_pods) > 0
    a. raise a command_timeout exception.

IMHO, this logic fits better with the use cases Kubernetes users encounter in their day-to-day work, but again, that's just my opinion. I'm not a Go developer so I can't submit a PR with this logic, but I would definitely like to hear more opinions about it.
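
A rough, untested shell sketch of that wrapper logic follows; the label selector, overall timeout, and polling interval are all assumptions, and it relies on --timeout=0 meaning "check once and don't wait" plus the "timed out waiting for the condition on pods/<name>" error format seen elsewhere in this thread:

#!/usr/bin/env bash
# Hypothetical wrapper: enforce one overall timeout across all pods.
SELECTOR="-l app=myapp"      # assumption: pods selected by label
COMMAND_TIMEOUT=600          # overall budget in seconds
POLL_INTERVAL=5
start=$(date +%s)

unready_pods() {
  # --timeout=0 checks once; the pods that are not Ready are reported
  # on stderr, so route stderr into the pipe and discard stdout.
  kubectl wait --for=condition=Ready pod $SELECTOR --timeout=0 2>&1 >/dev/null \
    | sed -n 's|.*timed out waiting for the condition on pods/||p'
}

pending=$(unready_pods)
while [ -n "$pending" ] && (( $(date +%s) - start < COMMAND_TIMEOUT )); do
  sleep "$POLL_INTERVAL"
  pending=$(unready_pods)
done

if [ -n "$pending" ]; then
  echo "command timeout; still not Ready:" $pending >&2
  exit 1
fi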

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 5, 2020
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Mar 6, 2020
@pjh

pjh commented Mar 31, 2020

/remove-lifecycle rotten

I just experienced exactly the issue described in the OP, and found it very confusing. I noticed these log messages from my e2e test:

+ kubectl wait --for=condition=ready pod -l prepull-test-images=e2e --timeout 30m
timed out waiting for the condition on pods/prepull-test-containers-gxjhh
timed out waiting for the condition on pods/prepull-test-containers-mqf4j
timed out waiting for the condition on pods/prepull-test-containers-r96tp
+ kubectl get pods -o wide
NAME                            READY   STATUS             RESTARTS   AGE   IP          NODE                                           NOMINATED NODE   READINESS GATES
prepull-test-containers-gxjhh   14/17   CrashLoopBackOff   85         92m   10.64.2.3   e2e-726b009feb-d872a-windows-node-group-ltb5   <none>           <none>
prepull-test-containers-mqf4j   14/17   CrashLoopBackOff   84         92m   10.64.1.3   e2e-726b009feb-d872a-windows-node-group-0qdq   <none>           <none>
prepull-test-containers-r96tp   14/17   CrashLoopBackOff   82         92m   10.64.3.3   e2e-726b009feb-d872a-windows-node-group-lc19   <none>           <none>

The age of the pods is 90 minutes rather than the 30 minutes I expected, because kubectl wait applies the timeout to each selected pod sequentially. This is very unintuitive to me. If this default behavior can't be changed, perhaps a flag could be added to enforce the timeout across all selected pods/conditions.
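
One mitigation, applicable only when the pods are owned by a single parent object such as a Deployment, is to wait on that one object so the timeout applies exactly once; the deployment name below is hypothetical:

# Hypothetical: wait on the owning Deployment instead of its many pods,
# so the 30m --timeout is applied once rather than once per pod.
kubectl wait --for=condition=Available deployment/prepull-test --timeout=30m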

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Mar 31, 2020
benmoss pushed a commit to benmoss/windows-testing that referenced this issue Mar 31, 2020
We discovered that kubectl wait will actually time out only after
timeout * (number of objects being waited on). So if you wait on 3 pods
specifying a timeout of 10m, it will actually wait 30m.

`kubectl wait --timeout -1s` will wait a week, so plausibly "forever".

kubernetes-sigs#158
kubernetes/kubectl#754
@brianpursley
Member

/kind bug

@seans3
Contributor

seans3 commented Jul 22, 2020

/area kubectl
/sig cli

@k8s-ci-robot k8s-ci-robot added area/kubectl sig/cli Categorizes an issue or PR as relevant to SIG CLI. labels Jul 22, 2020
@eddiezane
Member

/priority backlog

@k8s-ci-robot k8s-ci-robot added the priority/backlog Higher priority than priority/awaiting-more-evidence. label Jul 22, 2020
@JabusKotze

/assign

@carlossg

carlossg commented Sep 28, 2020

This is easy to see with a small timeout if you have more than one resource.

For example, with a 2s timeout it takes approximately (number of pods) * 2s to time out, not 2s.

Also note that there is no output until all of them time out, which adds to the confusion:

$ kubectl get pods -l release=myrelease -o name | wc -l
      17

$ time bash -c "kubectl wait --for=delete pods -l release=myrelease  --timeout=2s 2>&1 | ts '[%Y-%m-%d %H:%M:%S]'"
[2020-09-28 20:40:15] timed out waiting for the condition on pods/myrelease-c66pr
[2020-09-28 20:40:15] timed out waiting for the condition on pods/myrelease-8tgq6
[2020-09-28 20:40:15] timed out waiting for the condition on pods/myrelease-gx5mn
[2020-09-28 20:40:15] timed out waiting for the condition on pods/myrelease-6xb9b
[2020-09-28 20:40:15] timed out waiting for the condition on pods/myrelease-s9cbv
[2020-09-28 20:40:15] timed out waiting for the condition on pods/myrelease-qtwx7
[2020-09-28 20:40:15] timed out waiting for the condition on pods/myrelease-qk842
[2020-09-28 20:40:15] timed out waiting for the condition on pods/myrelease-l6wq2
[2020-09-28 20:40:15] timed out waiting for the condition on pods/myrelease-zv25k
[2020-09-28 20:40:15] timed out waiting for the condition on pods/myrelease-diss0
[2020-09-28 20:40:15] timed out waiting for the condition on pods/myrelease-mh8h2
[2020-09-28 20:40:15] timed out waiting for the condition on pods/myrelease-cb2zn
[2020-09-28 20:40:15] timed out waiting for the condition on pods/myrelease-q879p
[2020-09-28 20:40:15] timed out waiting for the condition on pods/myrelease-skggg
[2020-09-28 20:40:15] timed out waiting for the condition on pods/myrelease-4mklb
[2020-09-28 20:40:15] timed out waiting for the condition on pods/myrelease-pn94r
[2020-09-28 20:40:15] timed out waiting for the condition on pods/myrelease-r2226
kubectl wait --for=delete pods -l release=myrelease --timeout=2s  0.23s user 0.09s system 0% cpu 40.267 total

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 27, 2020
@eranreshef
Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 27, 2020
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 27, 2021
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 26, 2021
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@mgabeler-lee-6rs

/reopen

This seemed to be a confirmed bug, and is definitely still present, quite confusing, and makes it basically impossible to use kubectl wait for more than one resource in any useful fashion

@k8s-ci-robot
Contributor

@mgabeler-lee-6rs: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

This seemed to be a confirmed bug, and is definitely still present, quite confusing, and makes it basically impossible to use kubectl wait for more than one resource in any useful fashion

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@mgabeler-lee-6rs

ok, fine, #1219
