kubernetes worker pods restart forever #2132

Closed
benclifford opened this issue Oct 1, 2021 · 1 comment

@benclifford
Collaborator

Describe the bug
Kubernetes worker pods, perhaps only those that did not register properly, accumulate forever.
The worker process exits "normally", with no indication of an error in kubectl logs, but Kubernetes restarts that worker, which then fails again.
Nothing causes these workers to go away.
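
Kubernetes decides whether to restart a container from the pod's restartPolicy, not from whether the exit looked like an error. Under the default policy, Always, even an exit code of 0 is followed by a restart, and repeated restarts are spaced by an exponential back-off (10s, 20s, 40s, ..., capped at five minutes), which is what the CrashLoopBackOff status reports. The sketch below is a simplified model for illustration only, assuming the funcX worker pods run with the default Always policy; the function names are hypothetical, and the model ignores the back-off reset that happens after a container runs cleanly for ten minutes.

```python
from enum import Enum


class RestartPolicy(Enum):
    ALWAYS = "Always"  # Kubernetes default for pods
    ON_FAILURE = "OnFailure"
    NEVER = "Never"


def should_restart(policy: RestartPolicy, exit_code: int) -> bool:
    """Simplified model: is a container that exited with exit_code restarted?"""
    if policy is RestartPolicy.ALWAYS:
        return True  # restarted even after a clean exit (code 0)
    if policy is RestartPolicy.ON_FAILURE:
        return exit_code != 0
    return False  # NEVER


def backoff_delays(restarts: int, base: float = 10.0, cap: float = 300.0) -> list[float]:
    """Delay in seconds before each of the first `restarts` restarts.

    Doubles from `base` and is capped at `cap` (5 minutes). Simplification:
    ignores the reset after a container runs cleanly for 10 minutes.
    """
    return [min(base * 2 ** i, cap) for i in range(restarts)]


if __name__ == "__main__":
    # A worker that exits "normally" is still restarted under the default policy.
    print(should_restart(RestartPolicy.ALWAYS, 0))      # True
    print(should_restart(RestartPolicy.ON_FAILURE, 0))  # False
    print(backoff_delays(7))  # [10.0, 20.0, 40.0, 80.0, 160.0, 300.0, 300.0]
```

Under this model a worker that can never start successfully is retried indefinitely, roughly once every five minutes once the cap is reached.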

For example, here is one that has been restarted 1607 times since it was first launched eight days ago.

root@amber:~# minikube kubectl get pods
NAME                                            READY   STATUS             RESTARTS           AGE
funcx-1632329996841                             0/1     CrashLoopBackOff   1607 (4m31s ago)   8d
[...]
Collecting funcx-endpoint>=0.2.0
  Downloading funcx_endpoint-0.3.3-py3-none-any.whl (91 kB)

...

Installing collected packages: pycparser, cffi, zipp, urllib3, typing-extensions, six, pyjwt, idna, cryptography, charset-normalizer, certifi, requests, pynacl, importlib-metadata, bcrypt, typeguard, tblib, pyzmq, pyrsistent, pyparsing, psutil, paramiko, lockfile, globus-sdk, docutils, dill, click, attrs, websockets, typer, texttable, python-daemon, py, parsl, packaging, jsonschema, fair-research-login, decorator, configobj, retry, funcx, funcx-endpoint
Successfully installed attrs-21.2.0 bcrypt-3.2.0 certifi-2021.5.30 cffi-1.14.6 charset-normalizer-2.0.6 click-8.0.1 configobj-5.0.6 cryptography-35.0.0 decorator-5.1.0 dill-0.3.4 docutils-0.17.1 fair-research-login-0.2.3 funcx-0.3.3 funcx-endpoint-0.3.3 globus-sdk-2.0.1 idna-3.2 importlib-metadata-4.8.1 jsonschema-4.0.1 lockfile-0.12.2 packaging-21.0 paramiko-2.7.2 parsl-1.1.0 psutil-5.8.0 py-1.10.0 pycparser-2.20 pyjwt-1.7.1 pynacl-1.4.0 pyparsing-2.4.7 pyrsistent-0.18.0 python-daemon-2.3.0 pyzmq-22.3.0 requests-2.26.0 retry-0.9.2 six-1.16.0 tblib-1.7.0 texttable-1.6.4 typeguard-2.12.1 typer-0.4.0 typing-extensions-3.10.0.2 urllib3-1.26.7 websockets-9.1 zipp-3.6.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
PROCESS_WORKER_POOL main event loop exiting normally

To Reproduce
Start a worker with incorrectly configured Python versions.

Expected behavior
Broken worker pods should not accumulate without bound.
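
One way to spot the accumulating pods is to scan the `kubectl get pods` listing for CrashLoopBackOff entries with a high restart count. The sketch below parses the default column layout shown in the output above (NAME, READY, STATUS, RESTARTS, AGE). The function names and the threshold of 100 restarts are hypothetical choices, and a real cleanup mechanism in funcX or Parsl would more likely query the Kubernetes API than parse text.

```python
from typing import NamedTuple


class PodStatus(NamedTuple):
    name: str
    status: str
    restarts: int


def parse_pods(listing: str) -> list[PodStatus]:
    """Parse the default `kubectl get pods` table (NAME READY STATUS RESTARTS AGE)."""
    pods = []
    for line in listing.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 5:
            continue  # not a pod row (e.g. an elision marker such as "[...]")
        name, status, restarts = fields[0], fields[2], fields[3]
        # "1607 (4m31s ago)" splits on whitespace, so fields[3] is just "1607".
        if restarts.isdigit():
            pods.append(PodStatus(name, status, int(restarts)))
    return pods


def crash_looping_pods(listing: str, max_restarts: int = 100) -> list[str]:
    """Hypothetical policy: names of pods in CrashLoopBackOff beyond max_restarts."""
    return [
        p.name
        for p in parse_pods(listing)
        if p.status == "CrashLoopBackOff" and p.restarts > max_restarts
    ]


if __name__ == "__main__":
    sample = """\
NAME                   READY   STATUS             RESTARTS           AGE
funcx-1632329996841    0/1     CrashLoopBackOff   1607 (4m31s ago)   8d
"""
    print(crash_looping_pods(sample))  # ['funcx-1632329996841']
```

A threshold rather than "any CrashLoopBackOff" avoids flagging a pod that is only briefly failing while, for example, a dependency comes up.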

Environment
My minikube environment on Ubuntu.

benclifford added the bug label Oct 1, 2021
@benclifford
Collaborator Author

Oops, wrong repo. This is potentially a bug in the Parsl Kubernetes code too.
