zombie processes from dvc pull failures #3744

Closed
ychou85 opened this issue May 5, 2020 · 7 comments
Labels
awaiting response we are waiting for your reply, please respond! :)

Comments

@ychou85

ychou85 commented May 5, 2020

Please provide information about your setup
DVC version (i.e. dvc --version), platform and method of installation (pip, Homebrew, pkg (Mac), exe (Windows), DEB (Linux), RPM (Linux))

DVC version 0.93.0
Platform: DGX-2 Kubernetes cluster; installed with pip

The base image we use:

FROM nvcr.io/nvidia/pytorch:19.10-py3

When we call dvc pull and the pull fails for any reason (missing the right credentials, for one, or when we press Ctrl-C), we see a lot of zombie processes spawning on our pod. They accumulate until they tie up all free resources on the cluster and grind things to a halt. Can someone look into this, please? It's affecting a pilot team that is promoting DVC usage at a major healthcare corporation.

@triage-new-issues triage-new-issues bot added the triage Needs to be triaged label May 5, 2020
@efiop
Contributor

efiop commented May 5, 2020

Hi @ychou85 !

Would be interesting to see more details or a reproduction script.

But as it stands, I would suspect the analytics worker, which dvc spawns on each CLI command. You should be able to spot it in the process list by "dvc daemon analytics" in its command line. Could you check that, please?
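The process-list check suggested above can be sketched in Python for a Linux pod with /proc mounted: it reads each /proc/<pid>/cmdline and keeps the processes whose arguments contain dvc followed by daemon and a worker name such as analytics. The exact argv layout of the worker and the helper names (parse_cmdline, is_dvc_daemon, find_dvc_daemons) are assumptions for illustration, not part of DVC's API.

```python
import os
from typing import List, Tuple


def parse_cmdline(raw: bytes) -> List[str]:
    """Split the NUL-separated contents of /proc/<pid>/cmdline into argv."""
    return [part.decode(errors="replace") for part in raw.split(b"\0") if part]


def is_dvc_daemon(argv: List[str], kind: str = "analytics") -> bool:
    """Hypothetical heuristic: a dvc command line containing "daemon"
    followed (possibly after flags) by the worker name, e.g. "analytics"."""
    if not any(os.path.basename(arg) == "dvc" for arg in argv):
        return False
    if "daemon" not in argv:
        return False
    return kind in argv[argv.index("daemon") + 1:]


def find_dvc_daemons(kind: str = "analytics",
                     proc: str = "/proc") -> List[Tuple[int, List[str]]]:
    """Return (pid, argv) for every visible process that looks like a dvc daemon."""
    hits = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-PID entries such as /proc/self or /proc/meminfo
        try:
            with open(os.path.join(proc, entry, "cmdline"), "rb") as f:
                argv = parse_cmdline(f.read())
        except OSError:
            continue  # the process exited meanwhile, or access was denied
        if is_dvc_daemon(argv, kind):
            hits.append((int(entry), argv))
    return hits
```

find_dvc_daemons("updater") applies the same check to the updater daemon. Note that this only finds live daemons: a zombie's cmdline file is empty, so zombies themselves never match.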

@efiop efiop added the awaiting response we are waiting for your reply, please respond! :) label May 5, 2020
@triage-new-issues triage-new-issues bot removed the triage Needs to be triaged label May 5, 2020
@ychou85
Author

ychou85 commented May 7, 2020

Here is the top output for my user:
[screenshot of top output]

@efiop
Contributor

efiop commented May 7, 2020

@ychou85 I'm not sure which version of top you are using, but that output is not very helpful, as the COMMAND column is missing the arguments. If you can't make top show them, you could manually run cat /proc/$PID/cmdline for each dvc PID that you see in the table above.
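One caveat when following that suggestion: per the proc(5) man page, /proc/$PID/cmdline is empty for a zombie, so for the zombies themselves the state and parent PID in /proc/$PID/stat say more. A minimal Linux-only sketch; parse_stat and describe are hypothetical helper names.

```python
import os
from typing import Optional, Tuple


def parse_stat(text: str) -> Tuple[int, str, str, int]:
    """Parse the leading fields of /proc/<pid>/stat: (pid, comm, state, ppid).

    comm is wrapped in parentheses and may itself contain spaces or ")",
    so split on the last ") " rather than on plain whitespace.
    """
    pid_part, rest = text.split(" (", 1)
    comm, after = rest.rsplit(") ", 1)
    fields = after.split()
    return int(pid_part), comm, fields[0], int(fields[1])


def describe(pid: int, proc: str = "/proc") -> Optional[str]:
    """Return a one-line summary for pid, or None if it has already gone away."""
    try:
        with open(os.path.join(proc, str(pid), "stat")) as f:
            _, comm, state, ppid = parse_stat(f.read())
        with open(os.path.join(proc, str(pid), "cmdline"), "rb") as f:
            cmdline = f.read().replace(b"\0", b" ").decode(errors="replace").strip()
    except OSError:
        return None
    if state == "Z":
        # Zombie: cmdline is empty; the parent PID is the process that has
        # not yet wait()ed on it.
        return "{} zombie ({}), parent {}".format(pid, comm, ppid)
    return "{} state {}, parent {}: {}".format(pid, state, ppid, cmdline or comm)
```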

@efiop
Contributor

efiop commented May 7, 2020

@ychou85 Also, I'm not sure how you are launching those dvc processes in the first place. Are you doing that directly from the Dockerfile, or from your own wrapper? If the latter, please double-check that you are properly collecting (aka wait()ing on) the processes you spawn.
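For a Python wrapper, the collecting described above might look roughly like this: always wait() on the child, including when the wrapper itself is interrupted with Ctrl-C or hits a timeout. This is a hedged, POSIX-only sketch; run_and_reap and reap_exited_children are hypothetical helper names, and the command passed in (for example dvc pull) is up to the caller.

```python
import os
import subprocess


def run_and_reap(cmd, timeout=None):
    """Run cmd and always wait() on it, so it cannot linger as a zombie.

    Returns the exit code; re-raises KeyboardInterrupt or TimeoutExpired
    after killing and reaping the child.
    """
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=timeout)
    except BaseException:
        # Ctrl-C, timeout, or any other error in the wrapper: stop the child
        # and collect its exit status before propagating the exception.
        proc.kill()
        proc.wait()
        raise


def reap_exited_children():
    """Collect any children that have already exited and return their PIDs.

    Relevant if the wrapper runs as PID 1 in the container, since orphaned
    descendants are re-parented to it. This also reaps children owned by
    Popen objects, which then lose their real exit status.
    """
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # no children at all
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append(pid)
    return reaped
```

For example, run_and_reap(["dvc", "pull"]) returns dvc's exit code. In current CPython, subprocess.run applies a similar kill-then-wait on exceptions, so preferring it over a bare Popen gives much the same guarantee.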

@shcheklein
Member

@ychou85 could you please try opting out of analytics as described here: https://dvc.org/doc/user-guide/analytics ?

@efiop could it also be the updater daemon?
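The analytics page linked above describes opting out through the core.analytics config option. A hedged sketch of doing that from a Python setup step in the pod, assuming dvc is on PATH and the installed version supports that option; disable_dvc_analytics is a hypothetical helper, and its run parameter exists only so the command can be checked without invoking dvc.

```python
import subprocess


def disable_dvc_analytics(scope="--global", run=subprocess.run):
    """Opt out of DVC usage analytics by setting core.analytics to false.

    scope is passed through to "dvc config" (e.g. "--global"); pass None
    to write to the current project's config instead.
    """
    cmd = ["dvc", "config"]
    if scope:
        cmd.append(scope)
    cmd += ["core.analytics", "false"]
    # With subprocess.run, check=True raises CalledProcessError on failure.
    run(cmd, check=True)
    return cmd
```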

@shcheklein
Member

@ychou85 could you please run ps -aef | grep dvc or something similar to get the full dvc commands with arguments, so we can see what is actually running right now?
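A hedged Python variant of that check, assuming the procps ps found in most Linux images: it lists every process with its parent PID and state, keeps the ones mentioning dvc, and counts zombies (STAT starting with Z) per parent PID, since the parent is the process that is not wait()ing on them. The function names are hypothetical.

```python
import subprocess
from collections import Counter


def parse_ps(text):
    """Parse `ps -eo pid,ppid,stat,args` output into a list of dicts."""
    rows = []
    for line in text.splitlines()[1:]:  # the first line is the header
        parts = line.split(None, 3)  # args may contain spaces; keep it whole
        if len(parts) < 3:
            continue
        pid, ppid, stat = parts[:3]
        args = parts[3] if len(parts) > 3 else ""
        rows.append({"pid": int(pid), "ppid": int(ppid), "stat": stat, "args": args})
    return rows


def snapshot():
    """Run ps once and return the parsed rows (Python 3.6 compatible)."""
    out = subprocess.run(
        ["ps", "-eo", "pid,ppid,stat,args"],
        stdout=subprocess.PIPE, universal_newlines=True, check=True,
    ).stdout
    return parse_ps(out)


def dvc_rows(rows):
    """Rows whose command line mentions dvc (zombies typically show "[dvc] <defunct>")."""
    return [r for r in rows if "dvc" in r["args"]]


def zombie_parents(rows):
    """Count zombie processes (STAT starting with "Z") per parent PID."""
    return Counter(r["ppid"] for r in rows if r["stat"].startswith("Z"))
```

Calling zombie_parents(snapshot()) on the affected pod would show which PID is accumulating zombies, which answers the wrapper question asked earlier in the thread.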

@efiop
Copy link
Contributor

efiop commented Jun 3, 2020

Closing as stale

@efiop efiop closed this as completed Jun 3, 2020

3 participants