Skip to content

Commit

Permalink
reattach: don't kill process on failed reconnection
Browse files Browse the repository at this point in the history
During reattachment, we look to see if the process corresponding to the stored
PID is running. If so, we try to connect to that process. If that fails, we kill
the process under the presumption it's not working, and return
ErrProcessNotFound.

But during reattachment we don't know that the PID we have is still valid. Which
means that the process we're trying to attach to may have exited and a different
process has spawned with the same PID. This results in some unrelated process
getting silently killed.

This impacts Nomad when running the `rawexec` or `exec` task drivers, because
the Nomad agent spawns an "executor" process via go-plugin to control the
workloads, and these executors are left running when Nomad exits. If the
executors die in the meantime (or the host is rebooted), then we can potentially
kill a random process on the host.

Because there's no way for go-plugin to know whether the process is a go-plugin
server without connecting, this kill is never really safe. Remove it.

Ref: hashicorp/nomad#23969
Ref: https://hashicorp.atlassian.net/browse/NET-11233
  • Loading branch information
tgross committed Oct 21, 2024
1 parent 7e5ceaa commit c1eccbd
Showing 1 changed file with 0 additions and 1 deletion.
1 change: 0 additions & 1 deletion internal/cmdrunner/cmd_reattach.go
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,6 @@ func ReattachFunc(pid int, addr net.Addr) runner.ReattachFunc {
// doesn't actually return an error if it can't find the process.
conn, err := net.Dial(addr.Network(), addr.String())
if err != nil {
p.Kill()
return nil, ErrProcessNotFound
}
conn.Close()
Expand Down

0 comments on commit c1eccbd

Please sign in to comment.