This repository has been archived by the owner on Jun 6, 2024. It is now read-only.

PAI doesn't clean job container #2354

Closed
mzmssg opened this issue Mar 19, 2019 · 7 comments

@mzmssg
Member

mzmssg commented Mar 19, 2019

Organization Name:

Microsoft

Short summary about the issue/question:

PAI doesn't clean up job containers.

OpenPAI Environment:

  • OpenPAI version: master

Anything else we need to know:
Root cause:
In #1108, we removed --rm when launching the job container, so the docker daemon no longer removes containers after they exit. Confirmed with @wangdian that the cleaner doesn't clean these containers either. So there is currently no container cleanup mechanism.
Residual container examples:

42af191d4a5c        openpai/pai.example.tensorflow                        "/bin/bash /pai/boot…"   About an hour ago   Exited (0) About an hour ago                                                    admin-test-sleep-container_e44_1552954684228_0103_01_000002
4fe903811bca        openpai/pai.example.tensorflow                        "/bin/bash /pai/boot…"   About an hour ago   Exited (143) About an hour ago                                                  admin-test-kill-job3-container_e44_1552954684228_0100_01_000002
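
For reading the STATUS column above: by the usual convention, an exit code above 128 means the main process was terminated by signal (code - 128), so Exited (143) on the killed job corresponds to SIGTERM. A minimal Python sketch of that decoding follows; decode_exit_status is a hypothetical helper name, not part of PAI.

```python
import re
import signal


def decode_exit_status(status):
    """Parse a `docker ps` STATUS string such as 'Exited (143) About an hour ago'.

    Returns (exit_code, signal_name_or_None), or None if the string
    does not describe an exited container.
    """
    match = re.match(r"Exited \((\d+)\)", status)
    if match is None:
        return None
    code = int(match.group(1))
    # Convention: codes above 128 mean "terminated by signal (code - 128)".
    if code > 128:
        try:
            return code, signal.Signals(code - 128).name
        except ValueError:
            # No signal with that number on this platform.
            return code, None
    return code, None
```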

@mzmssg
Member Author

mzmssg commented Mar 19, 2019

It's a blocking issue

@mzmssg mzmssg added the bug label Mar 19, 2019
@fanyangCS
Contributor

Why was --rm removed?

@mzmssg
Member Author

mzmssg commented Mar 19, 2019

@fanyangCS
That PR needs docker inspect to get the container's OOM status, and docker inspect only works while the exited container still exists.
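
As a rough illustration of why the container has to outlive its exit, here is a Python sketch that reads State.OOMKilled from docker inspect output. This is an assumption about how such a check could look, not the code from #1108; oom_killed and inspect_container are hypothetical names.

```python
import json
import subprocess


def oom_killed(inspect_json):
    """Return State.OOMKilled from the JSON printed by `docker inspect <container>`."""
    entries = json.loads(inspect_json)  # docker inspect prints a JSON array
    return bool(entries[0]["State"]["OOMKilled"])


def inspect_container(name):
    """Run `docker inspect` on one container and return its raw JSON output.

    Raises CalledProcessError if the container no longer exists,
    which is what happens once a container started with --rm has exited.
    """
    result = subprocess.run(
        ["docker", "inspect", name],
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```

Usage would be something like oom_killed(inspect_container(name)), run after the container has exited.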

@fanyangCS
Contributor

Ok. After the inspection, we could rm it. Right? Please also check whether the debug mode is affected.

@mzmssg
Member Author

mzmssg commented Mar 19, 2019

@fanyangCS
No. Any command running in the YARN container might be interrupted, so any cleanup action performed inside the YARN container is unreliable.
We need either dockerd (with --rm) or a cleaner (implemented by ourselves) to manage the residual containers from outside the YARN container.
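
A minimal sketch of what such an external cleaner could look like, in Python. It is an illustration under assumptions, not the fix that landed in #2355: the "-container_e" name marker is guessed from the residual container names in this issue, and list_exited, is_job_container and clean are hypothetical names. The docker invocations (docker ps -a --filter status=exited --format ..., docker rm) are standard CLI calls.

```python
import subprocess


def list_exited(run=subprocess.run):
    """Return (container_id, name) pairs for all exited containers."""
    result = run(
        ["docker", "ps", "-a", "--filter", "status=exited",
         "--format", "{{.ID}} {{.Names}}"],
        check=True, capture_output=True, text=True,
    )
    return [tuple(line.split(None, 1))
            for line in result.stdout.splitlines() if line.strip()]


def is_job_container(name, marker="-container_e"):
    # Assumption: job containers embed the YARN container id in their name,
    # as in admin-test-sleep-container_e44_..._000002.
    return marker in name


def clean(run=subprocess.run):
    """Remove exited job containers; return the names that were removed."""
    removed = []
    for container_id, name in list_exited(run):
        if is_job_container(name):
            run(["docker", "rm", container_id],
                check=True, capture_output=True, text=True)
            removed.append(name)
    return removed
```

Because this runs outside the YARN container, it is not affected when the job's own commands are interrupted. A real cleaner would probably also need a grace period so that docker inspect can still read the OOM status before the container is removed.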

@fanyangCS
Contributor

fanyangCS commented Mar 20, 2019

#2355
#1793
#1658

@mzmssg
Member Author

mzmssg commented Mar 20, 2019

Fixed in #2355

@mzmssg mzmssg closed this as completed Mar 20, 2019