Sleep 650 job throwing error within 5 minutes of its launch #13161
Comments
Hello - where is your Kubernetes running? Azure by chance? See #12530 (comment). In addition to the suggestions in that comment, we're also currently working on a PR to Receptor that might help: ansible/receptor#683
This might be related, for context: #12530 (comment)
@TheRealHaoLiu @shanemcd Thank you!! I'm running it on AKS, yes! The issue you mentioned matches the pattern I'm seeing. Another strange issue, though, is that the job fails randomly after executing for 1.5 hours or so, without any proper error log, and stdout stops updating in a similar fashion.
"job would fail randomly after executing for 1.5 hours or so"
This sounds like a log rotation issue. It could be mitigated by increasing the max container log size in your k8s configuration; see this comment on how I did it on minikube, and you can follow a similar solution. This should also be addressed by ansible/receptor#683.
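As a rough illustration of the log-size suggestion: on clusters where you control the kubelet configuration file, container log rotation is governed by the `containerLogMaxSize` and `containerLogMaxFiles` fields of `KubeletConfiguration`. The values below are placeholders chosen for illustration, not settings taken from this thread, and managed services such as AKS may expose these options through their own node pool configuration rather than through this file.

```yaml
# Sketch of a kubelet config fragment; the values are illustrative assumptions.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Maximum size of a container log file before it is rotated (kubelet default: 10Mi).
containerLogMaxSize: 500Mi
# Maximum number of log files kept per container (kubelet default: 5).
containerLogMaxFiles: 5
```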
Hi @fosterseth, thank you for the suggestion. However, I investigated further and found that log rotation wasn't actually the roadblock for the AWX job; it's still the 5-minute timeout somewhere (per earlier comments, the Konnectivity module in AKS, which I'm still trying to find a workaround for).
For the logs, I have created a new node pool with a larger container log size, but what do we do about this 5-minute timeout? It's getting really frustrating for users.
@deep7861 |
I have the latest version of AWX, 21.12.0, running in OKE and I have the same problem. What is the procedure to implement the solution proposed in ansible/ansible-runner#1187?
Please confirm the following
Bug Summary
With the default job timeout of 0 and a job idle timeout of 0, a task running 'sleep 650' makes the job fail without much explanation.
An idle timeout of 0 should mean a 600-second timeout, but this job fails within 5 minutes or so. Here is the stdout:
PLAY [Play to test sleep function] *********************************************
TASK [date] ********************************************************************
changed: [localhost]
TASK [debug] *******************************************************************
ok: [localhost] => {
"date_before.stdout": "Sun Nov 6 17:50:11 UTC 2022"
}
TASK [sleep] *******************************************************************
API showing this error reason:
AWX version
21.2.0
Select the relevant components
Installation method
kubernetes
Modifications
no
Ansible version
2.12
Operating system
No response
Web browser
Chrome
Steps to reproduce
A simple playbook with a single task, 'command: sleep 650'.
Launch it and watch it error out within 300-350 seconds. Error reason in the API: Job terminated due to error.
Stdout gets stuck at the sleep task; that is, it never shows the playbook summary results.
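For reference, a playbook along these lines would produce the stdout shown in the bug summary. It is a sketch reconstructed from the play name, task names, and registered variable visible in that output, not the reporter's actual file; `hosts: localhost` and `gather_facts: false` are assumptions inferred from the output (a single localhost target and no fact-gathering task).

```yaml
# Reconstructed sketch; names are taken from the stdout, the rest is assumed.
- name: Play to test sleep function
  hosts: localhost        # assumption: the output only shows [localhost]
  gather_facts: false     # assumption: no "Gathering Facts" task in the output
  tasks:
    - name: date
      command: date
      register: date_before   # variable name as seen in the debug output

    - name: debug
      debug:
        var: date_before.stdout

    - name: sleep
      command: sleep 650      # runs longer than the ~5 minutes at which the job fails
```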
Expected results
The task and playbook should run smoothly and fail with an idle timeout error, showing all logs meaningfully.
Actual results
The playbook errors out with no logs.
Additional information
No response