Add liveness/readiness probes to web/task - fixes #414 #1188
Hi @shanemcd, how can we advance this PR?
LGTM
Can we also consider a liveness probe for the awx-task container? A command like this:
We are still discussing it on Matrix with @TheRealHaoLiu, also to find a solution for rolling back the connection with Postgres.
Added for task. What do you think about the defaults?
I'm not sure about the period: the command needs a few seconds to run (2 or 3), so I think for the task we can use something like 10 to 15 seconds.
- /bin/bash
- -c
- |
  awx-manage run_dispatcher --running | grep '\[\]'
This will result in exit code 1 whenever jobs are running, as shown below:
bash-5.1# awx-manage run_dispatcher --running
2023-02-01 11:40:24,449 WARNING [-] awx.main.dispatch checking dispatcher running for awx-8495cbdf8d-ph8gp
['02b0dcb3-3d12-4f37-9ce8-28ffb61bee6e', '9abcb670-4140-4a4f-b7df-5c68017ecbd3', 'aa18e470-14f5-4735-9a62-4d7383b3946e']
bash-5.1# awx-manage run_dispatcher --running | grep '\[\]'
2023-02-01 11:40:29,592 WARNING [-] awx.main.dispatch checking dispatcher running for awx-8495cbdf8d-ph8gp
bash-5.1# echo $?
1
So the probe will fail.
Maybe we should rely on the status command instead: awx-manage run_dispatcher --status
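The failure mode of the grep-based check can be reproduced without an AWX installation. The sketch below feeds grep two hard-coded strings that stand in for the dispatcher output (an empty list and a list with one job ID taken from the log above); the function name probe_check is made up for illustration.

```shell
# Stand-ins for the output of `awx-manage run_dispatcher --running`.
idle_output='[]'
busy_output="['02b0dcb3-3d12-4f37-9ce8-28ffb61bee6e']"

probe_check() {
  # Exits 0 only if the text contains a literal "[]", i.e. no running jobs.
  printf '%s\n' "$1" | grep -q '\[\]'
}

probe_check "$idle_output"; idle_status=$?
probe_check "$busy_output"; busy_status=$?

# A busy but healthy dispatcher yields a non-zero status, which a probe treats as a failure.
echo "idle=$idle_status busy=$busy_status"
```

So the check answers "is the dispatcher idle?" rather than "is the dispatcher alive?", which is why it is a poor fit for a liveness probe.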
Tested "awx-manage run_dispatcher --status" and it seems to work.
It returns exit status 1 when the connection to the database is refused.
@tanganellilore can you explain how you simulated a failover?
I did it by redirecting all database traffic on the pod to localhost using iptables.
I have an external HA Postgres with a VIP address on the leader. I simply stop the leader and wait for a replica to be elected (5 to 10 seconds).
This means that the connection is really dropped and awx-task has to open a new one.
I'm currently trying to add reconnection logic to AWX, but it is not very simple.
So every time an election occurs, awx-task needs to reload?
Anyway, it seems we are not really sure which command to use in the liveness probe.
Maybe we need to combine them?
Yes, we can combine both commands with an OR condition: if either of them returns 1, we count the container as broken.
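A minimal sketch of that OR condition, not the script that was eventually merged: the helper name run_checks is hypothetical, and `true` and `false` stand in for the real checks so the snippet runs anywhere. In an actual probe the arguments would be the commands discussed in this thread, such as "awx-manage run_dispatcher --status".

```shell
# Run every check passed as an argument; fail as soon as one of them fails.
run_checks() {
  for check in "$@"; do
    # Each argument is a shell command string; a non-zero exit marks the container as broken.
    if ! sh -c "$check" >/dev/null 2>&1; then
      echo "check failed: $check"
      return 1
    fi
  done
  echo "all checks passed"
  return 0
}

run_checks "true" "true"; healthy_status=$?
run_checks "true" "false"; broken_status=$?
echo "healthy=$healthy_status broken=$broken_status"
```

Stopping at the first failure keeps the probe short, which matters given that each awx-manage call was reported to take a few seconds.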
To avoid Molecule destroying the environment, run:
@erz4 from the community meeting
I will help troubleshoot the CI failure.
@TheRealHaoLiu
@erz4 Re: nesting the variables, currently it shows like this:
We are hoping to nest these variables to declutter the AWX CR a bit.
When testing this out, it fails on the "Apply deployment resources" task, presumably because the probe timed out. The timeout may be too low. The timeout is 10 seconds and the database migrations take much longer than that to run. Probably 60-70 seconds if I had to guess.
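A rough way to size the probe budget, assuming the failure really is the probe giving up before the migrations finish. Only the 10-second timeout and the 60-70 second estimate come from this thread; the 10-second period is an assumed value, and failureThreshold refers to the standard Kubernetes probe field.

```shell
# Upper end of the guessed migration time, in seconds.
migration_seconds=70
# Assumed probe period in seconds (not a value taken from the PR).
period_seconds=10

# Round up: number of consecutive failed probes that must be tolerated
# so the container is not restarted while migrations are still running.
failure_threshold=$(( (migration_seconds + period_seconds - 1) / period_seconds ))
echo "failureThreshold >= $failure_threshold"
```

Raising initialDelaySeconds by a comparable amount would be another way to cover the same window.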
Hi @erz4, we are prioritizing getting this in next. Due to the recent change to the AWX deployment (web-task split), the PR needs some heavy rebasing and updating. Would you be able to get to this?
There is an open PR actively being worked on here to implement this:
This feature has been merged as part of #1674.
SUMMARY
Added liveness & readiness probes to the awx-web container.
fixes #414
ISSUE TYPE
ADDITIONAL INFORMATION