Timeout to docker host causes containers to stop #2338

Closed
KamilKopaczyk opened this issue Nov 6, 2015 · 21 comments

Comments

@KamilKopaczyk

Before 1.5.0, a Python stack trace would pop up from time to time:

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/local/Cellar/docker-compose/1.4.2/libexec/lib/python2.7/site-packages/compose/cli/multiplexer.py", line 41, in _enqueue_output
    for item in generator:
  File "/usr/local/Cellar/docker-compose/1.4.2/libexec/lib/python2.7/site-packages/compose/cli/log_printer.py", line 59, in _make_log_generator
    for line in line_generator:
  File "/usr/local/Cellar/docker-compose/1.4.2/libexec/lib/python2.7/site-packages/compose/cli/utils.py", line 100, in split_buffer
    for data in reader:
  File "/usr/local/Cellar/docker-compose/1.4.2/libexec/vendor/lib/python2.7/site-packages/docker/clientbase.py", line 238, in _stream_raw_result
    for out in response.iter_content(chunk_size=1, decode_unicode=True):
  File "/usr/local/Cellar/docker-compose/1.4.2/libexec/vendor/lib/python2.7/site-packages/requests/utils.py", line 332, in stream_decode_response_unicode
    for item in iterator:
  File "/usr/local/Cellar/docker-compose/1.4.2/libexec/vendor/lib/python2.7/site-packages/requests/models.py", line 680, in generate
    raise ConnectionError(e)
ConnectionError: HTTPSConnectionPool(host='192.168.99.104', port=2376): Read timed out.

In 1.5.0 this stack trace no longer shows up, but all containers are stopped instead, even though the target docker machine is still running.

Reverted to 1.4.2 and it works fine.

@dnephin

dnephin commented Nov 6, 2015

Can you provide an example of the command and output?

Where is the engine running? VirtualBox, or a cloud instance somewhere?

@KamilKopaczyk
Author

Hi,

I'm using OS X (El Capitan) with a virtual machine set up in VirtualBox.
It only happens after running

docker-compose up

The core of the problem is described in #812, but what happens next depends on the Compose version:

With 1.4.2, the exception and stack trace are printed and that's all; the containers are still up and running.
With 1.5.0, the containers are simply stopped. The output is identical to what you'd see after stopping all containers with

docker-compose stop

@nbap

nbap commented Nov 19, 2015

Hi, I'm also facing this issue.
Before 1.5.1 I used to receive the following error: ConnectionError: HTTPSConnectionPool(host='192.168.99.104', port=2376): Read timed out. But the containers were still running and responsive.

Now that I've updated to 1.5.1, the containers gracefully stop after exactly one minute on every single try and then print the following error: ERROR: Couldn't connect to Docker daemon - you might need to run docker-machine start default.

I'm using OSX 10.11.1
Docker 1.9
Compose 1.5.1
Virtualbox 5.0.8 r103449

@dnephin

dnephin commented Nov 19, 2015

I think a workaround for now is to run with docker-compose up -d. That way connection failures won't stop any containers. You can still tail logs with docker-compose logs.

@nbap

nbap commented Nov 19, 2015

@dnephin It works! Thank you.

@veloxy

veloxy commented Nov 24, 2015

Same issue here on a newly installed Mac.

OSX 10.10.4 (14E46)
Virtualbox 5.0.10 r104061

Docker compose

docker-compose version: 1.5.1
docker-py version: 1.5.0
CPython version: 2.7.6
OpenSSL version: OpenSSL 0.9.8zf 19 Mar 2015 

Docker

Client:
 Version:      1.9.1
 API version:  1.21
 Go version:   go1.5.1
 Git commit:   a34a1d5
 Built:        Sat Nov 21 00:48:57 UTC 2015
 OS/Arch:      darwin/amd64

Server:
 Version:      1.9.1
 API version:  1.21
 Go version:   go1.4.3
 Git commit:   a34a1d5
 Built:        Fri Nov 20 17:56:04 UTC 2015
 OS/Arch:      linux/amd64

Running with -d works as @dnephin mentioned.

If you need any more info, I'll be glad to provide it.

@ChrisCinelli

This does not happen only with Python.

I have a container with Solr. When it is enabled (uncommented) in docker-compose.yml, after 2 or 3 minutes I get the "Gracefully...[list of containers]" message with ERROR: Couldn't connect to Docker daemon - you might need to run docker-machine start default. However, docker-machine is still running, because I have another docker-compose running in another terminal.

I am using Docker Compose 1.5.0 and Docker 1.9.0.

Running with -d works as @dnephin mentioned, but the side effect is that the logs are gone.

Two questions:

  1. What is causing the timeout to the docker host?
  2. Can we have an option to prevent the whole composition from shutting down when it happens?

@nbap

nbap commented Dec 1, 2015

@ChrisCinelli After running with -d you can simply run:

docker-compose logs; while [ $? -ne 0 ]; do docker-compose logs; done;

and you will have the logs just like you're used to.
This line prints the logs for the orchestrated container(s) and keeps them alive if the connection to the Docker engine is lost.

@ChrisCinelli

Thanks @nbap. I also just saw @dnephin's comment about docker-compose logs.
This will do for now, but I hope the problem will be fixed sooner rather than later.

I am also curious to know what causes the timeout. This happens only with 2 containers built by people on the same team, so there seems to be a pattern.

@ux

ux commented Dec 3, 2015

In my case, the Read timed out error happens only if the pseudo-TTY option is enabled and no output is produced during the COMPOSE_HTTP_TIMEOUT period:

tty: true

If it's not enabled, everything works as expected.

Docker compose version information:

docker-compose version 1.6.0dev, build 707281a
docker-py version: 1.5.0
CPython version: 2.7.9
OpenSSL version: OpenSSL 1.0.1e 11 Feb 2013

Digging deeper, I noticed that the problem seems to be in the docker-py client. I cannot be sure about that, since I'm not very familiar with docker-compose internals.
For a response stream, docker-py disables the timeout on the underlying socket - see https://github.com/docker/docker-py/blob/1.5.0/docker/client.py#L247-L250 - but this doesn't happen if a pseudo-TTY is enabled - see https://github.com/docker/docker-py/blob/1.5.0/docker/client.py#L291-L293 and https://github.com/docker/docker-py/blob/1.5.0/docker/client.py#L273-L277.

Disabling the timeout when a pseudo-TTY is enabled solves the above-mentioned problem - https://gist.github.com/ux/ac4fd45392aedb380903. I don't know if it's the correct way to solve this, but it works for me.
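For illustration, a minimal sketch of that idea (not docker-py's actual code; the function name and arguments are made up for the example):

def stream_output(sock, tty_enabled):
    # docker-py 1.5.0 already disables the read timeout for non-TTY
    # (multiplexed) streams; the gist extends the same treatment to
    # pseudo-TTY (raw) streams so they can sit idle without timing out.
    if tty_enabled:
        sock.settimeout(None)
    while True:
        chunk = sock.recv(4096)
        if not chunk:  # stream closed by the daemon
            break
        yield chunk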

@dnephin

dnephin commented Dec 3, 2015

I don't think we want to remove the timeout. Timeouts are good. We should, however, add a retry so that if we hit the timeout we re-establish the connection.
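A rough sketch of what such a retry could look like (hypothetical names; open_log_stream stands in for however Compose attaches to the log stream and is not a real API):

import requests

def tail_logs_with_retry(open_log_stream, max_retries=5):
    retries = 0
    while retries <= max_retries:
        try:
            for line in open_log_stream():
                retries = 0  # got data, so the connection is healthy again
                print(line)
            return  # stream ended normally
        except (requests.exceptions.ReadTimeout,
                requests.exceptions.ConnectionError):
            retries += 1  # timed out: loop around and reconnect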

@ux

ux commented Dec 3, 2015

Or it would be good to somehow differentiate between no output and a real connection timeout. On the other hand, the timeout is already disabled for multiplexed data.

@dave-tucker

My case is the same as @ux's. I also agree we should distinguish between no output and a real connection timeout. The best option, IMO, would be to remove the timeout if a pseudo-TTY is enabled and to use some form of keepalive to make sure that the connection is just inactive as opposed to timed out.

@dnephin

dnephin commented Dec 3, 2015

and to use some form of keepalive to make sure that the connection is just inactive as opposed to timed out.

We might be able to set keep-alive on the TCP socket; I'm not that familiar with it. If we do that, we'll still want to keep the timeout, though.

@ux

ux commented Dec 4, 2015

Well, I still think that disabling the timeout is not a bad idea. According to the requests timeouts documentation (http://docs.python-requests.org/en/latest/user/advanced/#timeouts), there are connect and read timeouts. docker-py sets the same value for both connect and read by invoking the _set_request_timeout method - https://github.com/docker/docker-py/blob/master/docker/client.py#L107 - so in either case the connect timeout is already set. With streaming enabled and interactive mode (pseudo-TTY enabled), a read timeout doesn't really make sense, since an interactive session is not expected to receive data all the time. At this step - https://github.com/docker/docker-py/blob/master/docker/client.py#L249 - the connection is already established, so that line only disables the read timeout. Also, please take a look at the following note from https://docs.python.org/2/library/socket.html#socket.socket.settimeout:

Note that the connect() operation is subject to the timeout setting, and in general it is recommended to call settimeout() before calling connect() or pass a timeout parameter to create_connection(). The system network stack may return a connection timeout error of its own regardless of any Python socket timeout setting.

Further, it's a good idea to enable TCP keep-alive: every so often a probe packet with no data is sent to make sure the connection is still alive and hasn't silently disappeared. That is the right way to find out whether something is wrong with a connection; the absence of output is not a reason to conclude that the connection is broken.
More about TCP keep-alive: https://delog.wordpress.com/2013/08/16/handling-tcp-keepalive/, http://tldp.org/HOWTO/html_single/TCP-Keepalive-HOWTO/#whatis.

So, just to summarize: leave the connect timeout as is, disable the read timeout, and enable TCP keep-alive.
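In bare-socket terms that summary looks roughly like this (illustrative only, not docker-py code; the host, port, and timeout value are just taken from the messages above):

import socket

# The connect timeout still applies while the connection is being established.
sock = socket.create_connection(("192.168.99.104", 2376), timeout=60)

# Once connected, drop the read timeout so a silent stream is not an error...
sock.settimeout(None)

# ...and let TCP keep-alive probes detect a connection that has actually died.
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

# On Linux the probe schedule can be tuned too (not available on OS X):
# sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)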

@nhooey

nhooey commented Dec 7, 2015

@ux's point seems reasonable.

Will this bug's resolution get scheduled in the next milestone?

@dnephin

dnephin commented Dec 7, 2015

Since there is an easy workaround (using -d) I don't think it's high priority for us, but a pull request which adds the keepalive would be appreciated.

@dcharbonnier

There is a workaround to keep containers alive with -d, but this:
docker-compose logs; while [ $? -ne 0 ]; do docker-compose logs; done;
is not a valid workaround - you duplicate the logs, and that's a major issue for me.

@nhooey

nhooey commented Dec 8, 2015

I agree with @dcharbonnier, it's not a good workaround.

It's also not easy to find this workaround when a user first encounters the terminating containers and the "Couldn't connect to Docker daemon" error in the latest Docker Compose, so many users will end up confused and wasting time.

But a pull request would be great.

@aanand

aanand commented Dec 8, 2015

Just a note - as of 1.5.2 (specifically, be5b7b6), we don't stop the containers when we encounter a timeout (or an error of any kind) - only when we get SIGINT or SIGTERM.

It's still an issue that we detach, but at least we're not stopping the containers any more.

tonyd256 added a commit to November-Project/tracker-api that referenced this issue Dec 29, 2015
This option lets us view stdout print statements but causes timeouts
with docker-compose. It can be enabled momentarily when debugging until
a fix is found.

docker/compose#2338
@dnephin

dnephin commented Mar 14, 2016

I'm going to close this issue as a duplicate of #3106, since it has a more concise problem description and only describes the subset of the problem that we're still facing. The "containers get shut down" side of the problem was fixed a few releases ago.

Please follow along in that issue if you're interested.
