
Failed to JSON parse a line from worker stream due to unexpected EOF(b'') #14693

aryklein opened this issue Nov 29, 2023 · 130 comments

@aryklein

aryklein commented Nov 29, 2023

Please confirm the following

  • I agree to follow this project's code of conduct.
  • I have checked the current issues for duplicates.
  • I understand that AWX is open source software provided for free and that I might not receive a timely response.
  • I am NOT reporting a (potential) security vulnerability. (These should be emailed to [email protected] instead.)

Bug Summary

I've encountered an issue with some of my jobs in AWX 23.3.1, and I’m hoping to gather insights or solutions from anyone who might have faced a similar problem.

Error Message

The execution fails with an error message on the details tab:

Failed to JSON parse a line from worker stream. Error: Expecting value: line 1 column 1 (char 0) Line with invalid JSON data: b'

Environment

AWX Version: 23.3.1
Deployment Platform: Kubernetes
Operator Version: 2.7.2

I'm inclined to think that this issue stems from a UI error during the process of logging data from the pod, although I'm not entirely certain.

Any ideas?

AWX version

23.3.1

Select the relevant components

  • UI
  • UI (tech preview)
  • API
  • Docs
  • Collection
  • CLI
  • Other

Installation method

kubernetes

Modifications

no

Ansible version

ansible [core 2.16.0]

Operating system

Linux

Web browser

Firefox, Chrome

Steps to reproduce

It happens randomly when executing some templates.

Expected results

Finish the Ansible playbook execution without errors

Actual results

The jobs don't fail; they just finish the execution with the error:

Failed to JSON parse a line from worker stream. Error: Expecting value: line 1 column 1 (char 0) Line with invalid JSON data: b'

Additional information

I tried using different Ansible EE versions (including the latest one).

@jessicamack
Member

Hello, we'll need some further information. Can you please set the following settings: AWX_CLEANUP_PATHS = False and RECEPTOR_RELEASE_WORK = False? Then check the /tmp/awx_<job_id>_/artifacts/<job_id>/job_events directory and confirm all the files there are JSON. If one isn't, please report back which one(s). This should all be done in the EE container in the task pod. Please also share the logs for the job pods. We're trying to confirm how far the job was able to run before failure.
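For anyone following along, a minimal sketch of that check from outside the cluster (the deployment, container, and namespace names match what appears later in this thread; the job ID 7976 is just an example and the paths may need adjusting):

kubectl -n awx exec -it deployment/awx-task -c awx-ee -- bash -c '
  cd /tmp/awx_7976_*/artifacts/7976/job_events &&
  for f in *; do python3 -m json.tool "$f" > /dev/null || echo "not valid JSON: $f"; done'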

@TheRealHaoLiu
Member

Can you also provide the receptor log, and what kind of Kubernetes are you using?

@TheRealHaoLiu
Member

Can you retry with the latest EE image for the control plane EE? You can do this by changing the imagePullPolicy in AWX to Always, then switching it back to IfNotPresent.

@aryklein
Author

aryklein commented Dec 1, 2023

Can you also provide the receptor log, and what kind of Kubernetes are you using?

Kubernetes v1.24.2, self-managed by kubeadm.
How can I get the receptor log?

Apologies for my limited knowledge on this subject. I attempted to include AWX_CLEANUP_PATHS=False and RECEPTOR_RELEASE_WORK=False in the configMap, but couldn't locate the appropriate section for their addition. Consequently, I chose to input them in the web UI, specifically under Settings, within the Job section, in Extra Environment Variables. Could you please confirm if this approach is correct?

I achieved the desired configuration by editing the AWX object. Specifically, I added the extra settings in the YAML file under the 'spec' section. The adjustments were as follows:

extra_settings:
   - setting: RECEPTOR_RELEASE_WORK
     value: "False"
   - setting: AWX_CLEANUP_PATHS
     value: "False"

@aryklein
Author

aryklein commented Dec 1, 2023

@jessicamack,

Then check the /tmp/awx_<job_id>_/artifacts/<job_id>/job_events directory. Confirm all the files there are JSON.

The directory is empty:

kubectl exec -it awx-task-78cbf7c589-bzgd8 -c awx-task  -- bash                                                                                                                                                                               
bash-5.1# ls tmp/awx_7976_m5_ejtdl/artifacts/7976/job_events/
bash-5.1# 

Here you also have the logs from job pod: automation-job-7976-z6vnr.log

@fosterseth
Member

your provided job output log looks good at first glance. Does the UI job output stdout page show all of those events?

where are you seeing Failed to JSON parse a line from worker stream. Error: Expecting value: line 1 column 1 (char 0) Line with invalid JSON data: b' exactly in the UI?

can you provide a screenshot of it?

@aryklein
Author

aryklein commented Dec 6, 2023

In the provided screenshot, line 6950 appears as the final line visible within the user interface. As a next step, I plan to conduct an experiment by deploying the same AWX setup on EKS, as opposed to using my self-managed Kubernetes cluster.

BTW, I found 2 more users with the same issue: https://www.reddit.com/r/awx/comments/176za7y/issue_with_json_parsing_error_in_awx_2320_on/

(screenshots: 2023-12-06_13-54-01, 2023-12-06_13-55-04)

@Dodexq

Dodexq commented Dec 8, 2023

Same issue. Cluster version v1.24.12+rke2r1, AWX 23.3.1. The error occurs randomly; yesterday everything worked on custom Execution Environments.

UPD:
When I set the env from ansible/receptor#683:
ee_extra_env: |
  - name: RECEPTOR_KUBE_SUPPORT_RECONNECT
    value: disable
the error became complete: Error: Expecting value: line 1 column 1 (char 0) Line with invalid JSON data: b'failed to create fsnotify watcher: too many open files'. After that I set the inotify values to 3x the defaults, following https://serverfault.com/questions/1137211/failed-to-create-fsnotify-watcher-too-many-open-files, and everything worked :)

Hope this helps you

@djyasin
Member

djyasin commented Dec 13, 2023

Hello @aryklein thank you for providing those additional screenshots.

Could you go to Settings > Troubleshooting Settings > Edit, and from there turn off temp dir cleanup and receptor release work?

Then, in the control plane EE container (this is located in the task pod), get the /tmp/awx_<job_id>_<"*"> directory

and also get /tmp/receptor/<node name>/<work unit id>

You should be able to get the work unit id from the API for that job run.

Please provide us with the artifacts directory and the stdout file.
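A rough sketch of collecting those with kubectl (namespace and container names as used elsewhere in this thread; the <...> placeholders stay placeholders):

kubectl -n awx get pods | grep awx-task                                # find the task pod name
kubectl -n awx exec -it <awx-task-pod> -c awx-ee -- ls /tmp            # locate the awx_<job_id>_* dir
kubectl -n awx exec -it <awx-task-pod> -c awx-ee -- ls /tmp/receptor   # node name / work unit dirs
kubectl -n awx cp -c awx-ee <awx-task-pod>:/tmp/receptor/<node name>/<work unit id> ./receptor-workunit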

@aryklein
Author

@djyasin I think I already did that here, right?
#14693 (comment)

@mxs-weixiong

mxs-weixiong commented Dec 15, 2023

I got this error as well.

"Failed to JSON parse a line from worker stream. Error: Expecting value: line 1 column 1 (char 0) Line with invalid JSON data: b''

AWX 23.4.0
Execution Environment: 23.0.0

I have no idea what is causing this. The logs don't tell me anything. It occurs on different jobs.

If anyone has an idea of how to troubleshoot this, please advise.

Thanks.

@kuba2915

+1

My details:
Kubeadm cluster 1.28 on proxmox
AWX-Operator : 2.9.0 (helm, installed by argo with default values)
AWX definition with only spec for ingress.

The logs are the same in my case. It looks like the tasks complete, but AWX fails when reading status from the execution environment.

Also, when verbosity is set to Debug, tasks are successful, but not always (~70% of the time).

@aryklein
Author

I recently migrated my AWX deployment to EKS, with Kubernetes version 1.25, and the issue has completely vanished

@mis4s

mis4s commented Dec 18, 2023

Same issue on k3s v1.28

Edit: it seems to be resolved after updating inotify:

fs.inotify.max_user_instances = 8192
fs.inotify.max_user_watches = 524288
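For anyone else hitting this, applying and persisting those values on the node looks roughly like this (the file name under /etc/sysctl.d is arbitrary):

sudo sysctl -w fs.inotify.max_user_instances=8192
sudo sysctl -w fs.inotify.max_user_watches=524288
printf 'fs.inotify.max_user_instances = 8192\nfs.inotify.max_user_watches = 524288\n' | sudo tee /etc/sysctl.d/99-inotify.conf
sudo sysctl --system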

@JSGUYOT

JSGUYOT commented Dec 20, 2023

Same issue with :
AWX 23.5.1
AWX OPERATOR : 2.9.0
K3S : v1.27.2+k3s1

@rchaud
Contributor

rchaud commented Dec 23, 2023

I'm experiencing the same issue in 23.3.0 and K8s 1.24. I will dig deeper to troubleshoot more and capture some logs after the holidays.

@marek1712

Same issue on k3s v1.28

Thanks!

I was on 1.27 and inotify increases didn't change a thing. Updating to 1.28 helped.

@mattiaoui

Hi, I have the same error on AWX 23.5.1.
I restored the DB from an old k3s server; project sync works without problems, but all templates terminate with the error:
Failed to JSON parse a line from worker stream. Error: Expecting value: line 1 column 1 (char 0) Line with invalid JSON data: b''

Template OUTPUT:
Worker output:
": "/tmp/awx_3386_b795lk3f", "JOB_ID": "3386", "INVENTORY_ID": "3", "PROJECT_REVISION": "e6d9f8f52adb8de7d052019a05326020c0d1cc4a", "ANSIBLE_RETRY_FILES_ENABLED": "False", "MAX_EVENT_RES": "700000", "AWX_HOST": "https://awx.xxx.xxx.loc", "ANSIBLE_SSH_CONTROL_PATH_DIR": "/runner/cp", "ANSIBLE_COLLECTIONS_PATHS": "/runner/requirements_collections:collections:/.ansible/collections:/usr/share/ansible/collections", "ANSIBLE_ROLES_PATH": "/runner/requirements_roles:/.ansible/roles:/usr/share/ansible/roles:/etc/ansible/roles", "ANSIBLE_COLLECTIONS_PATH": "/runner/requirements_collections:~/.ansible/collections:/usr/share/ansible/collections", "ANSIBLE_CALLBACK_PLUGINS": "/usr/local/lib/python3.9/site-packages/ansible_runner/display_callback/callback", "ANSIBLE_STDOUT_CALLBACK": "awx_display", "AWX_ISOLATED_DATA_DIR": "/runner/artifacts/3386", "RUNNER_OMIT_EVENTS": "False", "RUNNER_ONLY_FAILED_EVENTS": "False"}, "cwd": "/runner/project"}
{"status": "running", "runner_ident": "3386"}.

I tried several different EE releases, but the problem is the same.
Any ideas?
Bye
Mattia

@marek1712

@mattiaoui Upgrade your K3s to 1.28. Then follow this tip:
#14693 (comment)

@mattiaoui

@mattiaoui Upgrade your K3s to 1.28. Then follow this tip:

#14693 (comment)

Many thanks, after the upgrade all templates work 🤟🤟.
Happy Holidays and New Year!

@kuba2915

In my case, a temporary "fix" was setting Debug (3) verbosity on all my templates. After this change, all scheduled tasks were successful for about a week.

But during the upgrade to k8s 1.29 I noticed an issue with kube-proxy on the node running AWX, because it was stuck in a CrashLoopBackOff state.
I applied the fix from kubernetes-sigs/kind#2744 (comment) and it fixed my problem.

It is basically the same as what @marek1712 mentioned, but with different values.

@rchaud
Contributor

rchaud commented Jan 1, 2024

I think I have found the issue. It seems that all task controller pods are running 4 containers: redis, task (awx image), rsyslog (awx image), and ee (awx-ee image).

I did notice that the ee container in those pods was running awx-ee:latest. I changed that deployment to use the same awx-ee version as the awx deployment and it resolved the problem for me.

I think it is something with the latest image running as a controller.

I did not make any changes to my sysctl/inotify settings or anything else.

I am running version 23.2.0 and everything is working perfectly after matching the ee container to my running awx version.

I suspect it might be a bug introduced in ansible/awx-ee. I see it had not been updated since June, and then had some updates in November and December.
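For anyone who would rather pin this declaratively than edit the deployment by hand, something like the following in the AWX CR spec should have the same effect (a sketch; the field names are the awx-operator ones mentioned later in this thread, and the tag is just an example):

spec:
  image_version: 23.2.0
  control_plane_ee_image: quay.io/ansible/awx-ee:23.2.0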

@ngsin

ngsin commented Jan 3, 2024

I think I have found the issue. It seems that all tasks controller pods are running 4 containers which are redis, task(awx image), rsyslog(awx image) and ee( awx-ee image).

I did notice that ee container in the pods were running awx-ee:latest. I changed that deployment to use the same awx-ee version as the awx deployment and it resolved the problem for me.

i think it is something with the latest image running as a controller.

I did not make any changes to my sysctl inodes or anything else.

I am running version 23.2.0 and everything is working perfectly after matching the ee container to my running awx version.

I suspect it might be a bug introduced in the ansible/awx-ee . I do see it was not updated since Jun and in Nov and Dec they had some updates.

@chinochao it seems like a bug in ansible-runner (a Python package AWX uses). BTW, which awx-ee version do you use?

@rchaud
Contributor

rchaud commented Jan 3, 2024

@chinochao it seems like a bug in ansible-runner (a Python package AWX uses). BTW, which awx-ee version do you use?

I am using 23.2.0 for awx and awx-ee in the deployment. For the EE in the AWX UI, I have latest configured. It seems the issue is using awx-ee:latest in the controller task containers.

@ngsin

ngsin commented Jan 3, 2024

I am using 23.2.0 for awx and awx-ee in the deployment. For the EE in the AWX UI, I have latest configured. It seems the issue is using awx-ee:latest in the controller task containers.

I set control_plane_ee_image and the Execution Environment image to the same version, and tried both 23.2.0 and 23.3.1. The execution still fails with the JSON parse error.

The error is more likely to occur when a job contains a lot of hosts.

@rchaud
Contributor

rchaud commented Jan 3, 2024

I set control_plane_ee_image and the Execution Environment image to the same version, and tried both 23.2.0 and 23.3.1. The execution still fails with the JSON parse error.

Can you provide output from kubectl to confirm the awx-ee image is not using latest? Something like kubectl describe on one of the task pods.
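Something along these lines should show it (namespace and deployment names as used earlier in the thread):

kubectl -n awx describe deployment awx-task | grep -i 'image:'
kubectl -n awx get pods -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].image}{"\n"}{end}'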

@2and3makes23

Ok PRs merged and the fixes should be in the latest awx-ee

Looks like this issue is resolved for us as well, thanks @TheRealHaoLiu ❤️

@ngsin

ngsin commented Feb 16, 2024

anyone else have any other unique combination of

"job_explanation": "Failed to JSON parse a line from worker stream. Error: Expecting value: line 1 column 1 (char 0) Line with invalid JSON data: b''",

and

result_traceback

so far we addressed

Fixed a bug that caused enabling RECEPTOR_KUBE_SUPPORT_RECONNECT to be very "expensive" ansible/receptor#939

and in a lot of cases, for jobs with large log size, upping the containerLogMaxSize or better yet setting RECEPTOR_KUBE_SUPPORT_RECONNECT = enabled (better than just upping containerLogMaxSize since it adds resiliency against other disconnection problems)

Hi @TheRealHaoLiu, my AWX is running with:

  • k8s 1.22
  • quay.io/ansible/awx:23.3.1
  • not set the receptor reconnect feature

I use a custom control plane EE image built from https://github.com/ansible/awx-ee
and still encounter

"result_traceback": "Receptor detail:\nFinished",

@David-Igou

David-Igou commented Feb 17, 2024

I set image_pull_policy: Always in the AWX CR and rebuilt my custom EE based on the latest build of awx-ee.

It fixed the issue for roughly a day, then it came back? Strange.

@ilbarone87

ilbarone87 commented Feb 18, 2024

Setting image_pull_policy: Always didn't fix it for me, even after removing (rmi) the awx-ee image and restarting awx-task.
What fixed it was setting gather_facts: no

EDIT: Retracting what I said above, it is still happening, but just at the task level now

(screenshot)

@mcapra

mcapra commented Feb 20, 2024

I can replicate this "Failed to JSON parse a line from worker stream." issue very consistently when running job templates, with an awx-operator managed install of AWX, on EKS v1.22.17 (I know, it's old). I also have problems syncing inventory from a Git managed project -- some binascii / base64 padding errors.

Both of these problems go away when I change nothing other than deploying to a local Kind cluster running v1.29.1

@fosterseth
Member

fosterseth commented Feb 20, 2024

@mcapra enabling RECEPTOR_KUBE_SUPPORT_RECONNECT only works on these kubernetes versions:

>= 1.23.14
>= 1.24.8
>= 1.25.4

make sure to NOT enable that if you are on 1.22.17
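A quick way to check which side of that cutoff a cluster is on (using jq here is just one option):

kubectl version -o json | jq -r '.serverVersion.gitVersion'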

@2and3makes23

One more note from our side:

The message Failed to JSON parse a line from worker stream. Error: Expecting value: line 1 column 1 (char 0) Line with invalid JSON data: b'' still shows up within AWX on an Image Pull Backoff (e.g. in case of an invalid image path).

Just in case you would like to get rid of all instances of this error message, or you'd like to inform the user about this problem with a different message.

@lbrigman124

If you have the error and are using a custom AWX-EE image, does that image need to include the receptor package?

We can easily reproduce this issue, but only during fact gathering when we have multiple endpoints to connect to
and AWX is not running on a single node.

@marek1712

marek1712 commented Feb 27, 2024

I'm not sure if I'm doing something wrong, but I still get this error.
What I tried:

  • configure:
    fs.inotify.max_user_instances = 8192
    fs.inotify.max_user_watches = 524288
  • update AWX to 23.8.1
  • followed @kurokobo's suggestion to enable RECEPTOR_KUBE_SUPPORT_RECONNECT (I hope I did that in the correct place - by editing awx-on-k3s/base/awx.yaml):
    (screenshot)
  • used k3s rmi to remove quay.io/ansible/awx-ee from the cache (my setting in the GUI is Pull: Always, though deployment says: image_pull_policy: IfNotPresent).
  • on top of that I'm already running Rancher (v1.28.6+k3s2) with the following:
    (screenshot)

This:
kubectl -n awx exec -it automation-job-30469-fqbbq -- env | grep RECEPTOR_KUBE_SUPPORT_RECONNECT
doesn't return anything.

EDIT - just ran this:
kubectl -n awx exec -it deployment/awx-task -c awx-ee -- env | grep RECEPTOR_KUBE_SUPPORT_RECONNECT
and get:
RECEPTOR_KUBE_SUPPORT_RECONNECT=enabled

Any idea what to do next?

@jon-nfc

jon-nfc commented Feb 27, 2024

@marek1712, it's possible that the deployment isn't updating. A few items as food for thought:

  • don't use any container image with a tag of latest; always specify a version tag or the container sha256
  • inspect the image (crictl inspecti <name>) to ensure that it's updated to the desired one
  • check that the deployment did in fact update (kubectl describe deployment; check the times)
  • check that the node the automation is running on doesn't have errors (kubectl describe no $HOSTNAME) (one of my nodes had a weird issue of reporting invalid drive size, which was causing the node to become not ready and evict pods)
  • check kubectl events for other issues

I'm also running k3s, although 1.26.13

@marek1712

marek1712 commented Feb 27, 2024

Thank you!

* don't use any container image with a tag of `latest`, always specify a version tag or specify the container sha256

Just switched to 23.8.1 tag. Let's see how it goes.

* inspect the image `crictl inspecti <name>` to ensure that it's updated to the desired one

It's updated now.

* check that the deployment did infact update `kubectl describe deployment` (check times)

Event Age confirms that it did. I also see this:
(screenshot)

* check that the node that the automation is running on doesn't have errors `kubectl describe no $HOSTNAME` (one of my nodes had a wierd issue of reporting `invalid drive size` which was causing the node to become not ready and evicting pods)

Looks OK:
Normal NodeAllocatableEnforced 19m kubelet Updated Node Allocatable limit across pods
Normal NodeHasSufficientMemory 19m kubelet Node $HOSTNAME status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 19m kubelet Node $HOSTNAME status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 19m kubelet Node $HOSTNAME status is now: NodeHasSufficientPID

* check `kubectl events` for other issues

I actually encountered an OOM event (with 10 forks of the Cisco.IOS module)... I just asked my colleagues to raise the amount of RAM (8 GB as of now; nothing else runs on the VM).

EDIT: no resources have been added yet, but the task has been running for 4 hours now and has ~13k lines of output.

@Satyam777-git

Satyam777-git commented Feb 27, 2024

I am having the same problem even after replacing the "latest" tag with a version tag (23.8.1).

I am using an rke2 cluster and have set up AWX with the Helm chart.

kubernetes version - 1.29.2
rke2 version - 1.26
AWX Operator - 2.12.1
AWX Web - 23.8.1

@marek1712

@Satyam777-git - did you enable RECEPTOR_KUBE_SUPPORT_RECONNECT? What's your container-log-max-size (the latter should be only a workaround)?
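For reference, on k3s the container-log-max-size workaround is a kubelet argument; roughly like this, assuming k3s reads /etc/rancher/k3s/config.yaml (the values are only examples; restart k3s afterwards):

# /etc/rancher/k3s/config.yaml
kubelet-arg:
  - "container-log-max-files=4"
  - "container-log-max-size=100Mi"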

@TheRealHaoLiu
Member

we gotta start pinning awx-ee image in the release...

@TheRealHaoLiu
Member

ansible/awx-operator#1740
it won't make it into the next release... but hopefully the one after that

@2and3makes23

ansible/awx-operator#1740 it wont make it into the next release... but next next release hopefully

Just out of curiosity, are there plans to change the "latest" reference from awx-ee to receptor as well?

@oherma01

Understanding that not everybody's environment is the same, is it fair to say that the baseline intended resolution for those experiencing this issue is to:

  1. Upgrade the awx image and the awx-operator image to >= 23.8.1 / 2.12.1 respectively
  • Run kubectl edit awx ...
  • Modify the awx image_version field under the spec block
  • Run kubectl edit deployment ...
  • Modify the awx-operator image field under the spec block
  2. Add name: RECEPTOR_KUBE_SUPPORT_RECONNECT, value: enabled under the ee_extra_env block in the AWX Custom Resource definition, adding the block if it doesn't exist already (see the sketch after this list).
  • Run kubectl edit awx ...
  3. Inspect the images present on the system, and confirm that new images have been downloaded for awx and awx-operator:
  • Run crictl images
  • Grep for whichever pair of versions you installed
  4. Check that quay.io/ansible/awx-ee is showing as using the intended version
  • Run crictl inspecti quay.io/ansible/awx-ee
  5. Confirm that the awx deployment has been updated to use the new version, and shows that RECEPTOR_KUBE_SUPPORT_RECONNECT has been enabled
  • Run kubectl get deployment ...
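The ee_extra_env sketch referenced in step 2, as it would sit in the AWX CR spec (format taken from the earlier comments in this thread):

spec:
  ee_extra_env: |
    - name: RECEPTOR_KUBE_SUPPORT_RECONNECT
      value: enabled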

And that if one were to start fresh with a new AWX environment, images would now be bound to the DEFAULT_AWX_VERSION as opposed to latest, and enabling RECEPTOR_KUBE_SUPPORT_RECONNECT would still be required as part of the fix (due to the reasons described here: #11805 (comment) and here: ansible/awx-operator#1484), but should only be enabled on a case-by-case basis?

We have a playbook that is responsible for moving many terabytes of data around from time to time, and while it doesn't fail after 4 hours as mentioned in #11805, it does generate a lot of output if we don't disable logging from rclone with the no-log flag, which caused us to run into this issue. We are running k3s v1.28.5, with AWX 23.5.1 and AWX operator 2.9.0.

Thank you!

@mxs-weixiong

I added the RECEPTOR setting (set to true) but am still getting the error.

Do I need to specify all of the latest images?
I'm running AWX 23.8.1 with operator 2.12.1.

Does anyone have steps on how to view the full error log?

Thanks.

@TheRealHaoLiu
Member

@mxs-weixiong please provide the result_traceback from /api/v2/jobs/<job_id>/ of the failed job
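For anyone unsure how to pull that out of the API, a minimal sketch (host, credentials, and job ID are placeholders):

curl -s -u admin:"$AWX_PASSWORD" https://<awx-host>/api/v2/jobs/<job_id>/ \
  | python3 -c 'import json,sys; j=json.load(sys.stdin); print(j.get("job_explanation"), j.get("result_traceback"), sep="\n")'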

@David-Igou

Setting web and task replicas to 1 fixed this issue for me

@kzinas-adv

It could be a memory consumption issue: #15273

@akakshuki

I got this error as well.

"Failed to JSON parse a line from worker stream. Error: Expecting value: line 1 column 1 (char 0) Line with invalid JSON data: b''

AWX 23.4.0 Execution Environment: 23.0.0

I no idea what is causing this. Logs doesn't tell me anything. It will occur at different job.

If anyone got idea on how to troubleshoot this , please advise me.

Thanks.

I have applied this solution but there is no change. If this is a bug, is there any other solution, or do we have an older stable version?

@kevrrnet

kevrrnet commented Sep 15, 2024

It could be a memory consumption issue; after applying this config, the issue was gone.

  task_resource_requirements:
    requests:
      cpu: 1000m
      memory: 1000Mi
    limits:
      cpu: 2000m
      memory: 4Gi
  web_resource_requirements:
    requests:
      cpu: 1000m
      memory: 1000Mi
    limits:
      cpu: 2000m
      memory: 4Gi
  ee_resource_requirements:
    requests:
      cpu: 2000m
      memory: 2000Mi
    limits:
      cpu: 8000m
      memory: 16Gi
  redis_resource_requirements:
    requests:
      cpu: 1000m
      memory: 1000Mi
    limits:
      cpu: 2000m
      memory: 4Gi
  rsyslog_resource_requirements:
    requests:
      cpu: 1000m
      memory: 2000Mi
    limits:
      cpu: 2000m
      memory: 4Gi
  init_container_resource_requirements:
    requests:
      cpu: 1000m
      memory: 2000Mi
    limits:
      cpu: 2000m
      memory: 4Gi
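In case it's not obvious where this goes: these *_resource_requirements keys sit under spec: in the AWX custom resource, so they can be added with something like the following (the resource name "awx" is an assumption), after which the operator redeploys the pods:

kubectl -n awx edit awx awx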

@HenriWahl

@kevrrnet nice to hear it works for you. Do you know what the defaults are for the values you changed here?

@sskvulcan

Same issue. Cluster version v1.24.12+rke2r1, AWX 23.3.1. The error occurs randomly; yesterday everything worked on custom Execution Environments.

UPD: When I set the env from ansible/receptor#683 (ee_extra_env: | - name: RECEPTOR_KUBE_SUPPORT_RECONNECT value: disable) the error became complete: Error: Expecting value: line 1 column 1 (char 0) Line with invalid JSON data: b'failed to create fsnotify watcher: too many open files'. After that I set the inotify values to 3x the defaults, following https://serverfault.com/questions/1137211/failed-to-create-fsnotify-watcher-too-many-open-files, and everything worked :)

Hope this helps you

The fix from the linked serverfault answer resolved this issue for me!

@kevrrnet

kevrrnet commented Oct 16, 2024

Update on my previous post:
It seems it is not only the memory/CPU limitation issue; I still see a few errors in the last few days. It is an intermittent error and it is hard to reproduce.
AWX version that I use: 2.19.1
Kubernetes server version: v1.24.17

@kevrrnet

kevrrnet commented Oct 25, 2024

Any update on this issue? Any suggestions?
@HenriWahl

@fgardelli

+1
