Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DialPipe problem with multiple calls / waiting for busy pipe #67

Closed
StefanScherer opened this issue Sep 17, 2017 · 8 comments
Closed

DialPipe problem with multiple calls / waiting for busy pipe #67

StefanScherer opened this issue Sep 17, 2017 · 8 comments

Comments

@StefanScherer
Copy link

Currently there is some work in progress (portainer/portainer#1186) to add mapped named pipe support into Portainer so it will be easier to run it in a Windows container ( docker run -u ContainerAdministrator -it -v //./pipe/docker_engine://./pipe/docker_engine -p 9000:9000 stefanscherer/portainer:insider -H npipe:////./pipe/docker_engine )

We're using winio.DialPipe() to connect to the Docker engine from inside a microsoft/nanoserver-insider container.

At the moment we see problems in Portainer as it sometimes makes multiple requests requested by the web frontend. Here's a WireShark screenshot with the problem.

bildschirmfoto 2017-09-17 um 07 42 23

The two /api/endpoints/1/docker/xx GET requests use separate connections and run in parallel. One of the calls aborts with "open //./pipe/docker_engine: The system cannot find the file specified."

I have seen the check for cERROR_PIPE_BUSY and first thought that createFile() returns cERROR_FILE_NOT_FOUND and aborts immediatelly.

We are using the default timeout right now

    return winio.DialPipe(namedPipePath, nil)

but I also tried giving a 5 second timeout

    timeout := 5000 * time.Millisecond
    return winio.DialPipe(namedPipePath, &timeout)

Digging deeper it seems like the waitNamedPipe() just does not wait long enough or cannot start wait for the pipe.

Adding some Println in DialPipe I now see this running Portainer in the second DialPipe call:

DialPipe createFile err All pipe instances are busy.
DialPipe waitNamedPipe ms 4998
DialPipe waitNamedPipe returns err The system cannot find the file specified.
DialPipe returns err The system cannot find the file specified.

The problem is that waitNamedPipe returns immediatelly and does not wait the given 5000 milliseconds.

How can we make DialPipe more reliable?

@StefanScherer
Copy link
Author

It seems the problem only appears inside a container.

I stripped a test down down to this (with some debug Println in go-winio):

package main

import (
	"fmt"
	"time"

	"github.com/StefanScherer/go-winio"
)

func main() {
	namedPipePath := "\\\\.\\pipe\\docker_engine"
	timeout := 5000 * time.Millisecond
	conn1, err1 := winio.DialPipe(namedPipePath, &timeout)
	fmt.Println("err1", err1)
	conn2, err2 := winio.DialPipe(namedPipePath, &timeout)
	fmt.Println("err2", err2)
	if conn1 != nil {
		conn1.Close()
	}
	if conn2 != nil {
		conn2.Close()
	}
}

And running the container I very often see the problem opening the pipe the second time:

$ docker run -u ContainerAdministrator -v //./pipe/docker_engine://./pipe/docker_engine stefanscherer/testpipe:insider
DialPipe createFile err <nil>
err1 <nil>
DialPipe createFile err All pipe instances are busy.
DialPipe waitNamedPipe ms 5000
DialPipe waitNamedPipe returns err The system cannot find the file specified.
DialPipe returns err The system cannot find the file specified.
err2 open \\.\pipe\docker_engine: The system cannot find the file specified.

Only a few times I got

$ docker run -u ContainerAdministrator -v //./pipe/docker_engine://./pipe/docker_engine stefanscherer/testpipe:insider
DialPipe createFile err <nil>
err1 <nil>
DialPipe createFile err <nil>
err2 <nil>

Running the testpipe.exe on the Windows Insider Docker host seems to work every time.

StefanScherer added a commit to StefanScherer/dockerfiles-windows that referenced this issue Sep 17, 2017
StefanScherer added a commit to StefanScherer/dockerfiles-windows that referenced this issue Sep 17, 2017
@friism
Copy link

friism commented Sep 17, 2017

cc @jstarks

@deviantony
Copy link

Any update on this?

@StefanScherer @friism do we know when the named pipe mount feature will be released?

@StefanScherer
Copy link
Author

@deviantony I will try to get more info at DockerCon today about this.

@StefanScherer
Copy link
Author

Maybe @jstarks finds a minute after DockerCon to give us advise if it's fine to merge portainer PR now if go-winio doesn't need an update itself.

@deviantony
Copy link

Any update on this?

@AndreLouisCaron
Copy link

I just stumbled upon this issue in hashicorp/terraform-provider-docker#58.

I hit this on Windows 10 using Docker for Windows. It does not only happen when running inside a container.

I wrote a stress test for the Win32 named pipes API and managed to reproduce the FILE_NOT_FOUND error in the calls to both CreateFile() and WaitNamedPipe(). See this gist for the source code. It's Python rather than Go, but you should be able to recognize the error handling pattern.

Long story short: there is a race condition in the Win32 named pipe API where the CreateFile() and WaitNamedPipe() calls on the client can be invoked while the server has no outstanding calls on ConnectNamedPipe() (e.g. while it's dispatching a connection). The BSD-style sockets API solved this using a listen backlog, but there doesn't seem to be an equivalent in the Win32 named pipes API.

I'm pretty sure the client needs to handle the FILE_NOT_FOUND error in both calls with a short (ideally randomized) pause and a retry to stay resilient to that race condition. That's what I ended up doing in my stress test and it became very stable afterward.

@AndreLouisCaron
Copy link

I submitted a pull request with a tentative fix for this issue. The test suite in hashicorp/terraform-provider-docker#58 now passes reliably when run with a build containing this fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants