fix: add retries to find running web pod #1787

Merged (1 commit) on Mar 27, 2024

kurokobo
Contributor

SUMMARY

Closes #1784

This PR adds retries to the task that looks for a running web pod. I'm not sure which values are best for retries and delay, but I believe 2 minutes is enough time for the web pod to start running.
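To illustrate the idea, here is a minimal Python sketch of a poll-with-retries loop. It is not the Ansible task changed by this PR. The function name `find_running_web_pod`, the `list_pods` callable, and the values `attempts=60` and `delay=2.0` are assumptions chosen only so that the total wait comes to roughly 2 minutes.

```python
import time
from typing import Callable, Iterable, Optional


def find_running_web_pod(
    list_pods: Callable[[], Iterable[dict]],
    attempts: int = 60,    # hypothetical value, not taken from the PR
    delay: float = 2.0,    # hypothetical value; 60 attempts * 2 s is about 2 minutes
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Poll list_pods() until a pod whose phase is "Running" shows up.

    list_pods is a hypothetical callable that returns pod dicts shaped like
    Kubernetes Pod objects (only status.phase is inspected here).
    Returns the first running pod, or None if every attempt came up empty.
    """
    for attempt in range(1, attempts + 1):
        for pod in list_pods():
            if pod.get("status", {}).get("phase") == "Running":
                return pod
        # Wait before the next attempt, but not after the last one.
        if attempt < attempts:
            sleep(delay)
    return None
```

Passing `sleep` as a parameter keeps the loop testable without real waiting. A pod that is still in ContainerCreating reports the phase `Pending`, so the loop keeps polling until the phase changes or the attempts run out.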

ISSUE TYPE

  • Bug, Docs Fix or other nominal change

ADDITIONAL INFORMATION

Tested locally by deploying the following minimal AWX:

---
apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  namespace: awx
  name: awx-demo
spec:
  service_type: nodeport

Without this PR (i.e. with 2.13.1 as is), finding the running web pod fails because there is no wait after the web pod is deployed. At that point the web pod may still be in ContainerCreating, for example because images are still being pulled.

$ kubectl -n awx logs deployments/awx-operator-controller-manager | grep -E "^PLAY RECAP" -A 1
PLAY RECAP *********************************************************************
localhost                  : ok=91   changed=19   unreachable=0    failed=1    skipped=46   rescued=0    ignored=0   👈👈👈
--
PLAY RECAP *********************************************************************
localhost                  : ok=63   changed=0    unreachable=0    failed=1    skipped=72   rescued=0    ignored=0   👈👈👈
--
PLAY RECAP *********************************************************************
localhost                  : ok=90   changed=14   unreachable=0    failed=0    skipped=81   rescued=0    ignored=2   
--
PLAY RECAP *********************************************************************
localhost                  : ok=88   changed=0    unreachable=0    failed=0    skipped=83   rescued=0    ignored=1   
--
PLAY RECAP *********************************************************************
localhost                  : ok=88   changed=0    unreachable=0    failed=0    skipped=83   rescued=0    ignored=1 

Deploying a custom Operator that includes this PR:

IMG=registry.example.com/ansible/awx-operator:wait_web BUILD_ARGS="--build-arg DEFAULT_AWX_VERSION=24.0.0" make docker-build docker-push deploy

Deployment of the above minimal AWX completes in the first loop:

$ kubectl -n awx logs deployments/awx-operator-controller-manager | grep -E "^PLAY RECAP" -A 1
PLAY RECAP *********************************************************************
localhost                  : ok=118  changed=33   unreachable=0    failed=0    skipped=56   rescued=0    ignored=2   
--
PLAY RECAP *********************************************************************
localhost                  : ok=88   changed=0    unreachable=0    failed=0    skipped=84   rescued=0    ignored=1 

Also tested with a CR that sets web_readiness_period and web_liveness_period:

---
apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  namespace: awx
  name: awx-demo
spec:
  service_type: nodeport
  web_readiness_period: 15
  web_liveness_period: 15

With an Operator that includes both this PR and #1786, the CR can be deployed without any failure.

$ kubectl -n awx logs deployments/awx-operator-controller-manager | grep -E "^PLAY RECAP" -A 1
PLAY RECAP *********************************************************************
localhost                  : ok=118  changed=33   unreachable=0    failed=0    skipped=56   rescued=0    ignored=2   
--
PLAY RECAP *********************************************************************
localhost                  : ok=88   changed=0    unreachable=0    failed=0    skipped=84   rescued=0    ignored=1   

Note: with the implementation in this PR (and with the one in place before the wait was removed), the task succeeds and moves on to the next task as soon as any one of the three containers in the web pod is running.
If we want to make sure that all three containers are strictly running, we will need to implement additional logic.
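As a rough illustration of the difference between the two checks, the Python sketch below compares an "any container running" test with an "all containers running" test. It is not part of this PR. It assumes pod dicts shaped like Kubernetes Pod objects, where each entry of `status.containerStatuses` has a `state` with a `running` key while that container runs. The function names and the container names in the sample data are hypothetical.

```python
from typing import List


def _container_states(pod: dict) -> List[dict]:
    """Return the per-container state dicts of a pod (empty list if absent)."""
    statuses = pod.get("status", {}).get("containerStatuses", [])
    return [status.get("state", {}) for status in statuses]


def any_container_running(pod: dict) -> bool:
    """Lenient check: true as soon as at least one container is running."""
    return any("running" in state for state in _container_states(pod))


def all_containers_running(pod: dict) -> bool:
    """Strict check: true only if the pod reports containers and all are running."""
    states = _container_states(pod)
    return bool(states) and all("running" in state for state in states)


# Sample pod with hypothetical container names: one running, two still waiting.
partially_started_pod = {
    "status": {
        "containerStatuses": [
            {"name": "container-a", "state": {"running": {}}},
            {"name": "container-b", "state": {"waiting": {"reason": "ContainerCreating"}}},
            {"name": "container-c", "state": {"waiting": {"reason": "ContainerCreating"}}},
        ]
    }
}
```

For `partially_started_pod`, the lenient check passes while the strict check does not, which is the gap described in the note above.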

@rooftopcellist rooftopcellist merged commit 07b8120 into ansible:devel Mar 27, 2024
6 checks passed
@kurokobo kurokobo deleted the wait_web branch April 1, 2024 12:34