
Jobs are killed after 4 hours #14870

Closed
6 of 11 tasks
benapetr opened this issue Feb 13, 2024 · 9 comments

Comments

@benapetr

Please confirm the following

  • I agree to follow this project's code of conduct.
  • I have checked the current issues for duplicates.
  • I understand that AWX is open source software provided for free and that I might not receive a timely response.
  • I am NOT reporting a (potential) security vulnerability. (These should be emailed to [email protected] instead.)

Bug Summary

It seems that this bug is back - #11805

Every job that runs longer than 4 hours gets killed by AWX exactly when the 4-hour "limit" is reached.

AWX version

23.7.0

Select the relevant components

  • UI
  • UI (tech preview)
  • API
  • Docs
  • Collection
  • CLI
  • Other

Installation method

kubernetes

Modifications

no

Ansible version

No response

Operating system

Oracle Linux

Web browser

No response

Steps to reproduce

Start a job that runs longer than 4 hours; it gets killed.

Expected results

Jobs don't get killed

Actual results

Job gets killed

Additional information

No response

@benapetr
Author

The jobs end up in Error status with this information:

Failed to JSON parse a line from worker stream. Error: Expecting value: line 1 column 1 (char 0) Line with invalid JSON data: b''

@TheRealHaoLiu
Member

This is due to the kube-apiserver connection time limit and can be fixed by setting:

ee_extra_env: |
  - name: RECEPTOR_KUBE_SUPPORT_RECONNECT
    value: enabled

Please refer to ansible/receptor#683 for further details.
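
For context, a minimal sketch of where this setting lives in the AWX custom resource managed by the AWX Operator (the resource and namespace names below are assumptions, not taken from this thread):

# Illustrative AWX custom resource with the reconnect flag enabled;
# the "awx" name/namespace are placeholders, adjust to your install.
apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  name: awx
  namespace: awx
spec:
  # ee_extra_env is passed through as extra environment variables
  # for the execution environment container
  ee_extra_env: |
    - name: RECEPTOR_KUBE_SUPPORT_RECONNECT
      value: enabled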

@benapetr
Author

OK, but why did it start happening only recently? Older versions of AWX didn't have this problem. I will try to add it to the kustomize manifests that install AWX, but I am surprised that the linked receptor issue is merged and marked resolved, yet it still affects AWX.
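
For anyone taking the same kustomize route, a rough sketch of a kustomization.yaml that layers this setting onto an existing AWX custom resource (the file names and the resource name "awx" are assumptions):

# kustomization.yaml -- patches an existing AWX CR defined in awx.yaml (names are illustrative)
resources:
  - awx.yaml
patches:
  - target:
      kind: AWX
      name: awx
    patch: |-
      apiVersion: awx.ansible.com/v1beta1
      kind: AWX
      metadata:
        name: awx
      spec:
        ee_extra_env: |
          - name: RECEPTOR_KUBE_SUPPORT_RECONNECT
            value: enabled

Applying the overlay (for example with kubectl apply -k) should cause the operator to reconcile and roll out pods with the new environment variable.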

@fosterseth
Member

@benapetr the feature landed, but users still need to manually enable the flag in the AWX spec file to apply the fix. Eventually we will be able to default to having this flag enabled, once all users/customers are on the prerequisite k8s version.

OK, but why did it start happening only recently?

Is it possible that your jobs did not run for 4 hours before?

@Commifreak

Commifreak commented Feb 20, 2024

I also observed this "new" behavior after the latest one or two updates of AWX/Operator, and the job also ran for more than 4 hrs before:
[screenshot]

(that was with):

ansible-playbook [core 2.15.9]
  python version = 3.9.18 (main, Jan  4 2024, 00:00:00) [GCC 11.4.1 20231218 (Red Hat 11.4.1-3)] (/usr/bin/python3)
  jinja version = 3.1.3
  libyaml = True

and kube 1.28.3

But don't ask me which AWX version that was. Some recent one.

Strange. I will also try the ee_extra_env.

@benapetr
Author

Yes, exactly. These jobs always ran for over 10 hours with no problems; unfortunately we only run them about once every month or two. Now they suddenly started having problems. We did OS and AWX updates in the meantime, so I can't track down /when/ it started happening, but I know for sure it worked in the past and now it doesn't by default.

The fix mentioned by @TheRealHaoLiu definitely resolves it, though.

@fosterseth
Member

@Commifreak enabling RECEPTOR_KUBE_SUPPORT_RECONNECT is certainly recommended; let us know if it helps your long-running jobs.

@Commifreak

Guess what? Without setting the env var, it's working again!

[screenshot]

What changed in the meantime? I updated (a regular update) to AWX 23.8.1. I don't know if that helped, but I guess it's not bad to set this env var anyway.

@TheRealHaoLiu
Member

Confused Hao is confused... We recently flipped the default behavior for reconnect to true, since we bumped the required kube version.

Also, there were a couple of bugs we fixed that were caused by some receptor refactoring.

Closing this issue...
