Close open connections when changing primary #2526
Please sign your commits following these rules:
$ git clone -b "close_replica_conns" [email protected]:wsong/swarm.git somewhere
$ cd somewhere
$ git commit --amend -s --no-edit
$ git push -f
Amending updates the existing PR. You DO NOT need to open a new one.
Branch updated from b0609bc to ea0d8ee.
Ping @dongluochen @nishanttotla I'm not sure this is the right way to accomplish this; at first glance, there could be a danger of connections being closed while data is still being written to them. The reason I sent this PR out in the first place is that I noticed that
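One way to close connections on a primary change without the risk raised above (closing a connection mid-write) is to close only idle pooled connections. The sketch below assumes the replica forwards requests through a single shared `http.Transport`; the `replicaProxy` type and its `setPrimary`, `currentPrimary`, and `get` methods are hypothetical and not Swarm's actual code. `http.Transport.CloseIdleConnections` closes keep-alive connections that are sitting idle and does not interrupt connections currently in use.

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
)

// replicaProxy is a hypothetical stand-in for a replica that forwards
// API requests to the current primary over one shared transport.
type replicaProxy struct {
	mu        sync.RWMutex
	primary   string // base URL of the current primary
	transport *http.Transport
	client    *http.Client
}

func newReplicaProxy(primary string) *replicaProxy {
	t := &http.Transport{}
	return &replicaProxy{primary: primary, transport: t, client: &http.Client{Transport: t}}
}

// setPrimary points the proxy at a new primary and releases idle keep-alive
// connections left over from earlier requests. CloseIdleConnections leaves
// connections that are in use alone, so an in-flight request is not cut off.
func (r *replicaProxy) setPrimary(primary string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if primary == r.primary {
		return // nothing changed, keep the pool as it is
	}
	r.primary = primary
	r.transport.CloseIdleConnections()
}

func (r *replicaProxy) currentPrimary() string {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return r.primary
}

// get forwards a GET for path to whichever primary is current.
// The caller is responsible for closing the response body.
func (r *replicaProxy) get(path string) (*http.Response, error) {
	return r.client.Get(r.currentPrimary() + path)
}

func main() {
	p := newReplicaProxy("http://primary-a:2375")
	p.setPrimary("http://primary-b:2375")
	fmt.Println(p.currentPrimary()) // prints "http://primary-b:2375"
}
```

Closing only idle connections trades completeness for safety: a connection that is busy at the moment of the switch finishes its request first instead of being torn down.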
Signed-off-by: Wayne Song <[email protected]>
Branch updated from ea0d8ee to 069b7da.
Tests appear to be failing with error messages like these showing up in the logs:
I wonder if I'm accidentally killing connections to the KV store somehow?
The problem isn't clear here. We may need to clarify the problem first.
My understanding was that the https://github.com/docker/swarm/blob/master/api/replica.go#L10
You are right that
We turned on debug logging and it was clear that the
That doesn't sound right. Can you show me the commands you used to start the (primary and) replicas? Here is what I see.
This was within UCP, and it's difficult to replicate on the command line because (assuming our theory is correct) the issue only occurs if you're talking to a replica and make a request that gets forwarded to the primary, then hold on to that TCP connection and later attempt to reuse it with an
I managed to repro this with a small script:

package main

import (
	"fmt"

	"github.com/docker/engine-api/client"
	"golang.org/x/net/context"
)

func main() {
	// NewEnvClient picks up DOCKER_HOST (here: the replica's address).
	c, err := client.NewEnvClient()
	if err != nil {
		panic(err)
	}
	// "asdf" need not exist; this request only has to be forwarded to the primary.
	c.ContainerInspect(context.TODO(), "asdf")
	info, err := c.Info(context.TODO())
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", info)
}

To test this, I set up a two-node cluster with two managers, node A and node B. Node B was a replica, and I set DOCKER_HOST to node B's Swarm manager address.
The problem may need a clearer trace, or an analysis of a network trace. If keep-alive is the problem, I think disconnecting after hijack is the solution, but I'd try not to add another channel.
Signed-off-by: Wayne Song <[email protected]>