Failed to JSON parse a line from worker stream due to unexpected EOF(b'') #14693
Comments
Hello, we'll need some further information. Can you please set the following settings: AWX_CLEANUP_PATHS = False and RECEPTOR_RELEASE_WORK = False. Then check the /tmp/awx_<job_id>_/artifacts/<job_id>/job_events directory and confirm all the files there are valid JSON. If one isn't, please report back which one(s). This should all be done in the EE container in the task pod. Please also share the logs for the job pods; we're trying to confirm how far the job was able to run before it failed.
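A minimal sketch of that check, assuming python3 is available in the EE image (the pod name, container name, and job id are placeholders):

```
# loop over the job_events files and flag any that fail to parse as JSON
kubectl exec -it <awx-task-pod> -c <ee-container> -- bash -c '
for f in /tmp/awx_<job_id>_*/artifacts/<job_id>/job_events/*; do
  python3 -m json.tool "$f" > /dev/null 2>&1 || echo "invalid JSON: $f"
done'
```
|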
Can you also provide the receptor log and let us know what kind of Kubernetes you are using? |
Can you retry with the latest EE image for the control plane EE? You can do this by changing the imagePullPolicy in AWX to Always and then switching it back to IfNotPresent.
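If AWX is managed by the operator, one hedged way to do this is via the image_pull_policy field of the AWX custom resource (field name assumed from the awx-operator spec):

```
spec:
  image_pull_policy: Always   # temporarily force a re-pull; switch back to IfNotPresent afterwards
```
|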
Kubernetes v1.24.2, self-managed with kubeadm
I achieved the desired configuration by editing the AWX object. Specifically, I added the extra settings in the YAML file under the spec section. The adjustments were as follows:

```
extra_settings:
  - setting: RECEPTOR_RELEASE_WORK
    value: "False"
  - setting: AWX_CLEANUP_PATHS
    value: "False"
```
|
The directory is empty:

```
kubectl exec -it awx-task-78cbf7c589-bzgd8 -c awx-task -- bash
bash-5.1# ls tmp/awx_7976_m5_ejtdl/artifacts/7976/job_events/
bash-5.1#
```

Here you also have the logs from the job pod: automation-job-7976-z6vnr.log |
Your provided job output log looks good at first glance. Does the UI job output stdout page show all of those events? Where are you seeing the error? Can you provide a screenshot of it? |
In the provided screenshot, line 6950 appears as the final line visible within the user interface. As a next step, I plan to run an experiment by deploying the same AWX setup on EKS instead of my self-managed Kubernetes cluster. By the way, I found two more users with the same issue: https://www.reddit.com/r/awx/comments/176za7y/issue_with_json_parsing_error_in_awx_2320_on/ |
Same issue. Cluster version v1.24.12+rke2r1, AWX 23.3.1. The error occurs randomly; yesterday everything worked on custom Execution Environments. UPD: Hope this helps you. |
Hello @aryklein, thank you for providing those additional screenshots. Could you go to Settings > Troubleshooting Settings > Edit, and from there turn off temp dir cleanup and receptor release work. Then, in the control plane EE container (located in the task pod), get the artifacts directory and the stdout file for the work unit. You should be able to get the work unit ID from the API for that job run. Please provide us with the artifacts directory and the stdout file.
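A sketch of one way to retrieve those, assuming the job detail endpoint exposes a work_unit_id field and that receptor keeps work unit data under /tmp/receptor (both the field name and the paths are assumptions; placeholders are illustrative):

```
# 1) read the work unit id from the job detail endpoint
curl -sk -u admin:$AWX_PASSWORD https://<awx-host>/api/v2/jobs/<job_id>/ \
  | python3 -c 'import json,sys; print(json.load(sys.stdin).get("work_unit_id"))'

# 2) inspect the work unit directory inside the EE container of the task pod
kubectl exec -it <awx-task-pod> -c <ee-container> -- \
  ls -R /tmp/receptor/<node-name>/<work_unit_id>/
```
|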
@djyasin I think I did that here, right? |
I got this error as well: "Failed to JSON parse a line from worker stream. Error: Expecting value: line 1 column 1 (char 0) Line with invalid JSON data: b''" on AWX 23.4.0. I have no idea what is causing this; the logs don't tell me anything, and it occurs on different jobs. If anyone has an idea on how to troubleshoot this, please advise me. Thanks. |
+1 My details: the logs are the same in my case. It looks like the tasks are completed, but AWX fails when reading the status from the execution environment. Also, when Verbosity is set to debug, tasks are successful, but not always (~70% of the time). |
I recently migrated my AWX deployment to EKS, with Kubernetes version 1.25, and the issue has completely vanished. |
Same issue on k3s v1.28. Edit: it seems to be resolved after updating the inotify limits:
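The exact values weren't captured above; a commonly used increase looks like this (illustrative values, not necessarily the poster's):

```
# /etc/sysctl.d/99-inotify.conf
fs.inotify.max_user_instances = 8192
fs.inotify.max_user_watches = 524288
```

Apply with sysctl --system (or reboot).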
|
Same issue with: |
I'm experiencing the same issue in 23.3.0 and K8s 1.24. I will dig deeper to troubleshoot more and capture some logs after the holidays. |
Thanks! I was on 1.27 and inotify increases didn't change a thing. Updating to 1.28 helped. |
Hi, I have the same error on AWX 23.5.1. Template OUTPUT: I tried several different EE releases, but the problem is the same. |
@mattiaoui Upgrade your K3s to 1.28. Then follow this tip: |
Many thanks, after the upgrade all templates work 🤟🤟. |
In my case, a temporary "fix" was setting Debug (3) Verbosity on all my templates. After this change, all scheduled tasks were successful for about a week. But during the upgrade to k8s 1.29, I noticed there was an issue with kube-proxy on the node with AWX, because it was still in a CrashLoopBackOff state. It is basically the same as @marek1712 mentioned, but with other values. |
I think I have found the issue. All task controller pods run four containers: redis, task (awx image), rsyslog (awx image), and ee (awx-ee image). I noticed the ee container in the pods was running awx-ee:latest. I changed the deployment to use the same awx-ee version as the awx deployment, and that resolved the problem for me. I think it is something with the latest image running as a controller. I did not make any changes to my sysctl inodes or anything else. I am running version 23.2.0 and everything is working perfectly after matching the ee container to my running awx version. I suspect it might be a bug introduced in ansible/awx-ee; I see it was not updated since June, and in November and December it had some updates.
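A hedged sketch of pinning that container, assuming the awx-operator's control_plane_ee_image field (the tag shown is the poster's version; adjust to match your deployment):

```
spec:
  control_plane_ee_image: quay.io/ansible/awx-ee:23.2.0   # pin instead of :latest
```
|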
@chinochao it seems like a bug in ansible-runner (a Python package AWX uses). By the way, which awx-ee version do you use? |
I am using 23.2.0 for awx and awx-ee in the deployment. For the EE in the AWX UI, I have latest configured. It seems the issue is using awx-ee:latest in the controller task containers. |
I set control_plane_ee_image and the Execution Environment EE image to the same version, and tried both 23.2.0 and 23.3.1. The execution still fails with the JSON parse error. The error is more likely to occur when the job contains a lot of hosts. |
Can you provide the output from kubectl to make sure the awx-ee image is not using latest? Something like kubectl describe on one of the task pods.
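A minimal sketch of checking which image and tag each container in the task pod runs (pod name and namespace are placeholders):

```
kubectl get pod <awx-task-pod> -n <namespace> \
  -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.image}{"\n"}{end}'
```
|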
Looks like this issue is resolved for us as well, thanks @TheRealHaoLiu ❤️ |
Hi @TheRealHaoLiu, my AWX
I use a custom control plane EE image built from https://github.com/ansible/awx-ee
|
I set it and it fixed the issue for roughly a day, then it came back? Strange. |
I can replicate this "Failed to JSON parse a line from worker stream." issue very consistently when running job templates, with an awx-operator managed install of AWX, on EKS v1.22.17 (I know, it's old). I also have problems syncing inventory from a Git managed project -- some binascii / base64 padding errors. Both of these problems go away when I change nothing other than deploying to a local Kind cluster running v1.29.1 |
@mcapra enabling RECEPTOR_KUBE_SUPPORT_RECONNECT only works on these Kubernetes versions:
Make sure NOT to enable that if you are on 1.22.17.
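For those on a supported version, a hedged sketch of enabling it through the same extra_settings mechanism shown earlier in the thread (the quoting of the string value is an assumption about how the operator injects settings):

```
extra_settings:
  - setting: RECEPTOR_KUBE_SUPPORT_RECONNECT
    value: "'enabled'"
```
|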
One more note from our side, just in case you would like to get rid of all instances of this error message or you'd like to inform the user about this problem with a different message. |
If you have the error and are using a custom AWX-EE image: we can easily reproduce this issue, but only during fact gathering when we have multiple endpoints to connect to. |
I'm not sure if I'm doing something wrong, but I still get this error.
This: EDIT: I just ran this: Any idea what to do next? |
@marek1712, it's possible that the deployment isn't updating. A few items as food for thought:
I'm also running k3s, although 1.26.13 |
I am having the same problem even after replacing the "latest" tag with a version tag (23.8.1). I am using an RKE2 cluster and have set up AWX with the Helm chart. Kubernetes version: 1.29.2 |
@Satyam777-git, did you enable RECEPTOR_KUBE_SUPPORT_RECONNECT? What's your container-log-max-size? (The latter should only be a workaround.)
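For k3s/RKE2, the kubelet log rotation size can be raised via kubelet-arg in the distribution's config file; the values below are illustrative, not a confirmed fix:

```
# /etc/rancher/rke2/config.yaml  (k3s: /etc/rancher/k3s/config.yaml)
kubelet-arg:
  - "container-log-max-size=100Mi"
  - "container-log-max-files=5"
```
|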
we gotta start pinning awx-ee image in the release... |
ansible/awx-operator#1740 |
Just out of curiosity, are there plans to change the "latest" reference from awx-ee to receptor as well? |
Understanding that not everybody's environment is the same, is it fair to say that the baseline intended resolution for those experiencing this issue is to:
And that if one were to start fresh with a new AWX environment, images would now be bound to the DEFAULT_AWX_VERSION as opposed to latest, and enabling RECEPTOR_KUBE_SUPPORT_RECONNECT would still be required as part of the fix (due to the reasons described here: #11805 (comment) and here: ansible/awx-operator#1484), but should only be enabled on a case-by-case basis? We have a playbook that is responsible for moving many terabytes of data around from time to time, and while it doesn't fail after 4 hours as mentioned in #11805, it does generate a lot of output if we don't disable logging from rclone with the no-log flag, which caused us to run into this issue. We are running k3s v1.28.5, with AWX 23.5.1 and AWX operator 2.9.0. Thank you! |
Set RECEPTOR_KUBE_SUPPORT_RECONNECT to true, but I'm still getting the error. Do I need to specify all of the latest images? Does anyone have steps on how to view the full error log? Thanks. |
@mxs-weixiong please provide the result_traceback from /api/v2/jobs/<job_id>/ of the failed job |
Setting web and task replicas to 1 fixed this issue for me.
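If AWX is operator-managed, a hedged way to do that is via the replica fields on the AWX custom resource (field names assumed from the awx-operator spec):

```
spec:
  web_replicas: 1
  task_replicas: 1
```
|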
It could be a memory consumption issue: #15273 |
I have applied this solution but there are no changes. If this is a bug, is there any other solution, or do we have an older stable version? |
It could be a memory consumption issue; after applying this config, the issue was gone:
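The exact values weren't captured above; an illustrative example of the kind of change meant here, using the awx-operator's resource requirement fields (field names assumed from the operator spec, values hypothetical):

```
spec:
  task_resource_requirements:
    requests:
      memory: "2Gi"
    limits:
      memory: "4Gi"
```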
|
@kevrrnet nice to hear it works for you. Do you know what the defaults are for the values you changed here? |
The implementation in the linked ServerFault answer resolved this issue for me! |
Update on my previous post: |
Any update on this issue? Any suggestions? |
+1 |
Bug Summary
I've encountered an issue with some of my jobs in AWX 23.3.1, and I’m hoping to gather insights or solutions from anyone who might have faced a similar problem.
Error Message
The execution fails with an error message on the details tab:
Environment
AWX Version: 23.3.1
Deployment Platform: Kubernetes
Operator Version: 2.7.2
I'm inclined to think that this issue stems from a UI error during the process of logging data from the pod, although I'm not entirely certain.
Any ideas?
AWX version
23.3.1
Select the relevant components
Installation method
kubernetes
Modifications
no
Ansible version
ansible [core 2.16.0]
Operating system
Linux
Web browser
Firefox, Chrome
Steps to reproduce
Occurs randomly when executing some templates
Expected results
Finish the Ansible playbook execution without errors
Actual results
The jobs don't fail; they just finish the execution with the error:
Additional information
I tried using different Ansible EE versions (including the latest one)