-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cli/plugins: use same pgid + skip signal forwarding when attached to a TTY #4792
cli/plugins: use same pgid + skip signal forwarding when attached to a TTY #4792
Conversation
|
This reverts commit ef5e5fa. Running new plugins under a new pgid isn't a viable solution due to it causing issues with plugin processes attempting to read from the TTY (see: moby/moby#47073). Signed-off-by: Laura Brehm <[email protected]>
In order to solve the "double notification" issue (see: docker@ef5e5fa) without running the plugin process under a new pgid (see: moby/moby#47073) we instead check if we're attached to a TTY, and if so skip signalling the plugin process since it will already be signalled. Signed-off-by: Laura Brehm <[email protected]>
3973885
to
5f6c55a
Compare
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## master #4792 +/- ##
===========================================
+ Coverage 0 59.63% +59.63%
===========================================
Files 0 287 +287
Lines 0 24777 +24777
===========================================
+ Hits 0 14775 +14775
- Misses 0 9114 +9114
- Partials 0 888 +888 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Tested with:
$ ./build/docker-linux-amd64 version
Client:
Version: 24.0.0-rc.2-628-g5f6c55a724.m
API version: 1.44
Go version: go1.21.6
Git commit: 5f6c55a724
Built: Mon Jan 15 13:41:28 2024
OS/Arch: linux/amd64
Context: default
Server: Docker Desktop
Engine:
Version: 25.0.0-rc.1
API version: 1.44 (minimum version 1.24)
Go version: go1.21.5
Git commit: 9cebefa
Built: Thu Jan 4 16:30:11 2024
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.6.26
GitCommit: 3dd1e886e55dd695541fdcd67420c2888645a495
runc:
Version: 1.1.10
GitCommit: v1.1.10-0-g18a0cb0
docker-init:
Version: 0.19.0
GitCommit: de40ad0
BUILDX_EXPERIMENTAL=1 ./build/docker-linux-amd64 build .
#0 building with "default" instance using docker driver
#1 [internal] connecting to local controller
#1 DONE 0.0s
#2 [internal] load build definition from Dockerfile
#2 transferring dockerfile: 4.90kB 0.0s done
#2 DONE 0.0s
#3 resolve image config for docker.io/docker/dockerfile:1
#3 DONE 0.5s
#4 docker-image://docker.io/docker/dockerfile:1@sha256:ac85f380a63b13dfcefa89046420e1781752bab202122f8f50032edf31be0021
#4 CACHED
...
I was testing with Compose and noticed that we'd lost the desired "notify plugin to exit via socket if detached" behavior and mostly sure I nailed it down to cli/cli-plugins/socket/socket.go Lines 36 to 46 in dbf992f
specifically, the With the func accept(listener *net.UnixListener, conn **net.UnixConn) {
defer listener.Close()
go func() {
for {
// ignore error here, if we failed to accept a connection,
// conn is nil and we fallback to previous behavior
newConn, err := listener.AcceptUnix()
*conn = newConn
if err != nil {
println(err.Error())
}
// perform any platform-specific actions on accept (e.g. unlink non-abstract sockets)
onAccept(*conn, listener)
}
}()
}
Looks like during the refactor we moved that |
cli-plugins/socket/socket.go
Outdated
} | ||
|
||
accept(listener, conn) | ||
|
||
return EnvKey + "=" + listener.Addr().String(), nil | ||
closeListener := func() { | ||
listener.Close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps take the opportunity to make linters and IDEs "happy";
listener.Close() | |
_ = listener.Close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh! My linters don't complain about this, I wonder if it's Goland (maybe we can add something to .golangci.yml
so that CI/people not using Goland would pick it up too). Btw, does it also complain about this in a defer
?
Also tested with Compose v2.24.0 – both "attached" to a TTY and not-attached seem to be working correctly ✅ |
dcab77e
to
bec75eb
Compare
amended with comments, can you TAL @thaJeztah? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch on the defer
😅
I'm not entirely convinced this is the right approach: now, each plugin will have to implement the 'graceful termination' behavior; which is a thing we can do/build into the cli-plugins
package, but still, I'm not sure having two copies of that logic that run conditionally really makes sense.
It feels doubly dubious when the logic built into the parent CLI will almost never trigger in a way that is useful (given that it won't trigger when a TTY is present).
I do feel like the pgid change is a beneficial one, unless there is a definite reason that plugins must be able to read from the controlling TTY (they will still be able to read from a non-TTY stdin like a file).
That's a valid thought (and one I had while discussing this). The impetus to make this change, as is, now, was:
The other way we could explore this would be to build functionality into the CLI plugins API so that plugins can use that instead of reading from TTY – however, that's not trivial, should be well-explored, and not something we rush out to get a fix in. Since that's not an option for right now, and reverting the pgid changes without another patch (such as this one) leads to an even worse situation for right now, I think the current approach (or something similar) is the only solution for right now – even if we then take some time and figure out that functionality later so we can ultimatively move over for that. Moreover, re: implementing an API for this in the CLI, we do need to think that all of these new features will be gated behind an "if running through the CLI" (for Buildx at least) and the plugin will still have to read directly from the TTY when running standalone, so those new features will have to be thoroughly thought out/not something we can do out of the gate. |
Signed-off-by: Laura Brehm <[email protected]>
bec75eb
to
508346e
Compare
@neersighted we should decide on this to unblock v25 (@thaJeztah's probably waiting on it to make another RC 😅) – we either leave it as-is and own that we'll break buildx, or merge this change now and work on getting the pgid changes in later (together with an API for plugins to read from TTY). |
To me I think it depends on how critical reading from the TTY is to the buildx folks; I'm okay with a fast revert if we think it's necessary to buildx (for now). However, if this is not a critical behavior and buildx could be changed, I'd suggest that it's better to adopt the pgid change now to prevent future issues of this nature (and I'd suggest that the regression with older buildx is fine given it's under |
I think we all agree that this is not ideal, but the options we currently have limit our choices;
Our of options above, I'm leaning towards 3 (LOL, and after writing... my list of options is ... well.. basically fully the same as the list @laurazard posted in #4792 (comment) above) @docker/runtime-team any thoughts? |
LGTM. Considering the options available, I'd also lean towards n.3, using the conditional code from this PR. Shall we also open an issue to track this "suboptimal" solution so we don't forget about looking into it for too long? Or do we track these things somewhere else? |
I'm really not sure how nerdctl factors in, as it has a completely separate implementation that wraps In any case, it seems like option 3 is the consensus (though I'd still really like to hear from the buildx folks WRT why the I/O is needed; cc @crazy-max); while with the compromise here my first thought is we can remove the 3-signal-to-exit feature from the host CLI (as it's no longer doing much good), if we plan to resolve the signaling/IO issues in the near future such that the functionality can primarily live in the host CLI, it likely makes sense to take this as is. |
@neersighted There is some pipe handling when detach mode is enabled to handle debugging but in the case of this issue it will use the local one: "connecting to local controller". This might be related to https://github.com/docker/buildx/blob/78adfc80a90ede124ae30457dc0f3c2f688f5121/controller/processes/processes.go#L101 I will take a look. @ktock @jedevc if you have an idea in any case 🙏 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like an okay band-aid to me
Let me post a quick update on discussions we had;
I'll merge this PR to unblock the v25.0.0 release, but we'll look at follow-ups 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
The changes in docker@dcbf005 fixed the "cancellable context" detection, and made it so that Compose would conditionally set up signal handling when the context was already not cancellable/when the plugin was running through the CLI, as we'd introduced a mechanism into the CLI to signal plugins to exit through a socket instead of handling signals themselves. This had some (not noticed at the time) issues when running through the CLI as, due to sharing a process group id with the parent CLI process, when a user CTRL-Cs the CLI will notify the plugin via the socket but the plugin process itself will also be signalled if attached to the TTY. This impacted some Compose commands that don't set up signal handling - so not `compose up`, but other commands would immediately quit instead of getting some "graceful" cancelled output. We initially attempted to address this "double notification" issue in the CLI by executing plugins under a new pgid so that they wouldn't be signalled, but that posed an issue with Buildx reading from the TTY, (see: moby/moby#47073) so we reverted the process group id changes and ended at a temporary solution in docker/cli#4792 where the CLI will only notify plugins via the socket when they are not already going to be signalled (when attached to a TTY). Due to this, plugins should always set up some signal handling, which this commit implements. Signed-off-by: Laura Brehm <[email protected]>
The changes in docker@dcbf005 fixed the "cancellable context" detection, and made it so that Compose would conditionally set up signal handling when the context was already not cancellable/when the plugin was running through the CLI, as we'd introduced a mechanism into the CLI to signal plugins to exit through a socket instead of handling signals themselves. This had some (not noticed at the time) issues when running through the CLI as, due to sharing a process group id with the parent CLI process, when a user CTRL-Cs the CLI will notify the plugin via the socket but the plugin process itself will also be signalled if attached to the TTY. This impacted some Compose commands that don't set up signal handling - so not `compose up`, but other commands would immediately quit instead of getting some "graceful" cancelled output. We initially attempted to address this "double notification" issue in the CLI by executing plugins under a new pgid so that they wouldn't be signalled, but that posed an issue with Buildx reading from the TTY, (see: moby/moby#47073) so we reverted the process group id changes and ended at a temporary solution in docker/cli#4792 where the CLI will only notify plugins via the socket when they are not already going to be signalled (when attached to a TTY). Due to this, plugins should always set up some signal handling, which this commit implements. Signed-off-by: Laura Brehm <[email protected]>
@thaJeztah What is this issue and how is that a regression? I assume if compose "terminates immediately", then it can only be if it is lacking support for handling signals. |
@@ -240,16 +241,17 @@ func tryPluginRun(dockerCli command.Cli, cmd *cobra.Command, subcommand string, | |||
// we send a SIGKILL to the plugin process and exit | |||
go func() { | |||
retries := 0 | |||
for s := range signals { | |||
for range signals { | |||
if dockerCli.Out().IsTerminal() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if dockerCli.Out()
is ideal here, eg. for a command like build -o type=tar . > t.tar
. Although afaics, the actual behavior change from this line is minimal.
No, that wasn't the issue. The issue is well-described in the linked commit in the PR description – ef5e5fa For convenience:
Since Compose up works like: 1st CTRL-C - start "graceful" teardown / 2nd CTRL-C - forceful teardown/exit, the double notification (via the plugin socket, and the process itself getting signalled) resulted in every time the user hits CTRL-C, Compose receives two "cancel signals" and skips the graceful teardown directly to the forceful exit. That same "double notification" issue was what the process group id changes tried to address (fix the double notification by running the plugin process under a new process group id, so it doesn't also get signalled) and this PR solves it with a different approach (by not notifying via the socket when the plugin is attached to a TTY and hence already going to receive a signal). |
By "double notification" issue, do you mean this line https://github.com/docker/cli/pull/4792/files#diff-3dcd6325b70a5f5db393a3ecc08060b69007fcef371fb4dabf3164da95646fb6L252 that has now been reverted by this PR and does not exist in v24. There is no "double notification" issue in v24. I think what makes sense is to leave everything as was in v24 (plugins are in same processgroup, cli does not duplicate its signals). Additionally, this unix socket can be used to send custom termination request. Handling that request is optional and it doesn't cause any conflict for the plugin as when a plugin decides to support the socket, it can make sure that if it does its own signal counting logic, the new socket mechanism does not cause it to go out of sync. I believe this is mostly the current state after this PR, ignoring some |
Ah, no. The previous comments were discussing whether to "simply revert the pgid patch" (which would have reintroduced that issue with Compose) or to go with this change, which reverts that but also adds the |
The changes in docker@dcbf005 fixed the "cancellable context" detection, and made it so that Compose would conditionally set up signal handling when the context was already not cancellable/when the plugin was running through the CLI, as we'd introduced a mechanism into the CLI to signal plugins to exit through a socket instead of handling signals themselves. This had some (not noticed at the time) issues when running through the CLI as, due to sharing a process group id with the parent CLI process, when a user CTRL-Cs the CLI will notify the plugin via the socket but the plugin process itself will also be signalled if attached to the TTY. This impacted some Compose commands that don't set up signal handling - so not `compose up`, but other commands would immediately quit instead of getting some "graceful" cancelled output. We initially attempted to address this "double notification" issue in the CLI by executing plugins under a new pgid so that they wouldn't be signalled, but that posed an issue with Buildx reading from the TTY, (see: moby/moby#47073) so we reverted the process group id changes and ended at a temporary solution in docker/cli#4792 where the CLI will only notify plugins via the socket when they are not already going to be signalled (when attached to a TTY). Due to this, plugins should always set up some signal handling, which this commit implements. Signed-off-by: Laura Brehm <[email protected]>
BUILDX_EXPERIMENTAL=1
hangs moby/moby#47073- What I did
Reverted ef5e5fa and solved the "double notification" issue by only using the socket to notify the plugin process when not attached to a tty/not already going to receive a signal.
- How I did it
- How to verify it
- Description for the changelog
- A picture of a cute animal (not mandatory but encouraged)