Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stop closed plugins that will be removed #89

Merged
merged 1 commit into from
Jul 11, 2024

Conversation

Iceber
Copy link
Member

@Iceber Iceber commented Jun 7, 2024

Most of Adaptation's methods call removeClosedPlugins to remove closed plugins, which are moved out of the r.plugins list and never operated on again.

It may remain a zombie process on the system, and using some tools it will look like the plugin is running normally.

# pstree
...
        ├─containerd─┬─03-logger
        │            └─15*[{containerd}]

This pr causes removeClosedPlugins to call plugin.stop when removing closed plugins, which avoids these zombie processes.

if len(closed) != 0 {
go func() {
for _, plugin := range closed {
plugin.stop()
Copy link
Member

@klihub klihub Jun 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changes slightly the semantics of setting an on-close handler on the plugin side with stub.WithOnClose(), although I am not sure if for pre-launched plugins the current behavior should be considered a feature or a bug.

Anyway, the current semantics both for external and pre-launched plugins is that if the connection is closed, the stub internally os.Exit(0)s if the plugin has not set an OnClose handler of its own. But if the plugin has set a handler, the stub calls it and does not exit, (perhaps naively) trusting the plugin that it will do the right thing, IOW do any extra stuff it wants/needs and then always exit if it was a pre-launched plugin.

With this change in place in it's current form, we don't do a single thing to give the stub a chance to call any potential OnClose handler of the plugin when it is a pre-launched one. If we had to chose the lesser evil, IOW accumulating zombies vs. butchering the plugin, I think this is the right thing to do.

But I was thinking that maybe we could give the plugin some slack here before we stop/kill it, since we're running that asynchronously anyway. That still would not guarantee that plugin's handler would be run, but it would give it a better chance. Unfortunately I think if we want to do any better than that, then we'd need to make this thing explicitly controllable through the API (define a default timeout for pre-launched plugins to let their OnClose handler run and make it configurable with a new stub.Option)... but it would also require extra communication (probably in one of the plugin registration or configuration messages) since the option is set on the stub side but should take effect in the adaptation on the runtime side. So I would not go there now, unless we know for a fact that there are plugins used as pre-launched ones which heavily rely on the current (implicit) semantics...

Now I assume that you came upon this observed zombie behavior, because you have a plugin which sets an OnClose handler but then it does not exit, right ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggest test case(s) for the example plugins validating a plugin's on close func ...

agree with your point, @klihub, that this should be configurable for each plugin... Maybe add a ctx with timeout created based on config (or a default for now say two seconds grace), you'd create that timeout ctx in the WithOnClose func and pass the ctx into their onClose func that they requested..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe WithOnCloseExit and deprecate the older one add a notice about the issue...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry I overlooked WithOnClose, we do need to tolerate the behavior of the plugin when it's closed.

This pr may need to be updated, but even with WithOnClose, the zombie process still exists because the adaptation doesn't perform a reclaim operation(process.Wait) after the plugin exits.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry I overlooked WithOnClose, we do need to tolerate the behavior of the plugin when it's closed.

This pr may need to be updated, but even with WithOnClose, the zombie process still exists because the adaptation doesn't perform a reclaim operation(process.Wait) after the plugin exits.

True, we can take care of giving the plugin some slack in another PR.

@klihub klihub self-requested a review July 8, 2024 08:07
Copy link
Member

@klihub klihub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Let's take care of killing-after-some-slack in another PR.

@klihub klihub requested review from mikebrow and fuweid July 8, 2024 08:59
@fuweid fuweid merged commit 5a4c86a into containerd:main Jul 11, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants