Timeout to docker host causes containers to stop #2338

Closed
KamilKopaczyk opened this issue Nov 6, 2015 · 21 comments

Comments

@KamilKopaczyk

Before 1.5.0, a Python stack trace would pop up from time to time:

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/local/Cellar/docker-compose/1.4.2/libexec/lib/python2.7/site-packages/compose/cli/multiplexer.py", line 41, in _enqueue_output
    for item in generator:
  File "/usr/local/Cellar/docker-compose/1.4.2/libexec/lib/python2.7/site-packages/compose/cli/log_printer.py", line 59, in _make_log_generator
    for line in line_generator:
  File "/usr/local/Cellar/docker-compose/1.4.2/libexec/lib/python2.7/site-packages/compose/cli/utils.py", line 100, in split_buffer
    for data in reader:
  File "/usr/local/Cellar/docker-compose/1.4.2/libexec/vendor/lib/python2.7/site-packages/docker/clientbase.py", line 238, in _stream_raw_result
    for out in response.iter_content(chunk_size=1, decode_unicode=True):
  File "/usr/local/Cellar/docker-compose/1.4.2/libexec/vendor/lib/python2.7/site-packages/requests/utils.py", line 332, in stream_decode_response_unicode
    for item in iterator:
  File "/usr/local/Cellar/docker-compose/1.4.2/libexec/vendor/lib/python2.7/site-packages/requests/models.py", line 680, in generate
    raise ConnectionError(e)
ConnectionError: HTTPSConnectionPool(host='192.168.99.104', port=2376): Read timed out.

In 1.5.0 this stack trace no longer shows up, but all containers are stopped instead, even though the target docker machine is still running.

Reverted to 1.4.2 and it works fine.

@dnephin

dnephin commented Nov 6, 2015

Can you provide an example of the command and output?

Where is the engine running? VirtualBox, or a cloud instance somewhere?

@KamilKopaczyk
Author

Hi,

I'm using OS X (El Capitan) with a virtual machine set up in VirtualBox.
It only happens after running

docker-compose up

The core of the problem is described in #812, but what happens next depends on the Compose version:

With 1.4.2, the exception and stack trace are printed and that's all; the containers are still up and running.
With 1.5.0, the containers are simply stopped. The output is identical to what you'd see after stopping all containers with

docker-compose stop

@nbap

nbap commented Nov 19, 2015

Hi, I'm also facing this issue.
Before 1.5.1 I used to receive the following error: ConnectionError: HTTPSConnectionPool(host='192.168.99.104', port=2376): Read timed out. But the containers were still running and responsive.

Now that I've updated to 1.5.1, the containers gracefully stop after exactly one minute on every single try and then print the following error: ERROR: Couldn't connect to Docker daemon - you might need to run docker-machine start default.

I'm using OSX 10.11.1
Docker 1.9
Compose 1.5.1
Virtualbox 5.0.8 r103449

@dnephin

dnephin commented Nov 19, 2015

I think a workaround for now is to run with docker-compose up -d. That way connection failures won't stop any containers. You can still tail logs with docker-compose logs.

@nbap

nbap commented Nov 19, 2015

@dnephin It works! Thank you.

@veloxy

veloxy commented Nov 24, 2015

Same issue here on a newly installed Mac.

OSX 10.10.4 (14E46)
Virtualbox 5.0.10 r104061

Docker compose

docker-compose version: 1.5.1
docker-py version: 1.5.0
CPython version: 2.7.6
OpenSSL version: OpenSSL 0.9.8zf 19 Mar 2015 

Docker

Client:
 Version:      1.9.1
 API version:  1.21
 Go version:   go1.5.1
 Git commit:   a34a1d5
 Built:        Sat Nov 21 00:48:57 UTC 2015
 OS/Arch:      darwin/amd64

Server:
 Version:      1.9.1
 API version:  1.21
 Go version:   go1.4.3
 Git commit:   a34a1d5
 Built:        Fri Nov 20 17:56:04 UTC 2015
 OS/Arch:      linux/amd64

Running with -d works as @dnephin mentioned.

If you need any more info, I'll be glad to provide it.

@ChrisCinelli

This does not happen only with Python.

I have a container with Solr. When it is enabled (uncommented) in docker-compose.yml, after 2 or 3 minutes I get the "Gracefully...[list of containers]" message with ERROR: Couldn't connect to Docker daemon - you might need to run docker-machine start default. However, docker-machine is still running, because I have another docker-compose running in another terminal.

I am using Docker Compose 1.5.0 and Docker 1.9.0.

Running with -d works as @dnephin mentioned, but the side effect is that the logs are gone.

Two questions:

  1. What is causing the timeout to the docker host?
  2. Can we have an option to prevent the whole composition from shutting down when it happens?

@nbap

nbap commented Dec 1, 2015

@ChrisCinelli After running with -d you can simply run:

docker-compose logs; while [ $? -ne 0 ]; do docker-compose logs; done;

and you will have the logs just like you're used to.
This line prints the logs for the orchestrated container(s) and keeps them alive if the connection to the Docker engine is lost.

@ChrisCinelli

Thanks @nbap. I also just saw @dnephin's comment about docker-compose logs.
This will do for now, but I hope the problem will be fixed sooner rather than later.

I am also curious to know what causes the timeout. This happens only with 2 containers built by people on the same team, so there seems to be a pattern.

@ux

ux commented Dec 3, 2015

In my case, the Read timed out error happens only if the pseudo-TTY option is enabled and no output is produced during the COMPOSE_HTTP_TIMEOUT period:

tty: true

If it's not enabled, everything works as expected.

Docker compose version information:

docker-compose version 1.6.0dev, build 707281a
docker-py version: 1.5.0
CPython version: 2.7.9
OpenSSL version: OpenSSL 1.0.1e 11 Feb 2013

Digging deeper, I noticed that the problem seems to be in the docker-py client. I cannot be sure about that, since I'm not very familiar with docker-compose internals.
For a response stream, docker-py disables the timeout on the underlying socket - see https://github.com/docker/docker-py/blob/1.5.0/docker/client.py#L247-L250 - but this doesn't happen if a pseudo-TTY is enabled - see https://github.com/docker/docker-py/blob/1.5.0/docker/client.py#L291-L293 and https://github.com/docker/docker-py/blob/1.5.0/docker/client.py#L273-L277.

Disabling the timeout when a pseudo-TTY is enabled solves the above-mentioned problem - https://gist.github.com/ux/ac4fd45392aedb380903. I don't know if it's the correct way to solve this, but it works for me.
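For illustration, a minimal sketch of that idea (not docker-py's actual code; the function name and arguments are made up for the example):

def stream_output(sock, tty_enabled):
    # docker-py 1.5.0 already disables the read timeout for non-TTY
    # (multiplexed) streams; the gist extends the same treatment to
    # pseudo-TTY (raw) streams so they can sit idle without timing out.
    if tty_enabled:
        sock.settimeout(None)
    while True:
        chunk = sock.recv(4096)
        if not chunk:  # stream closed by the daemon
            break
        yield chunk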

@dnephin

dnephin commented Dec 3, 2015

I don't think we want to remove the timeout. Timeouts are good. We should, however, add a retry so that if we hit the timeout we re-establish the connection.
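A rough sketch of what such a retry could look like (hypothetical names; open_log_stream stands in for however Compose attaches to the log stream and is not a real API):

import requests

def tail_logs_with_retry(open_log_stream, max_retries=5):
    retries = 0
    while retries <= max_retries:
        try:
            for line in open_log_stream():
                retries = 0  # got data, so the connection is healthy again
                print(line)
            return  # stream ended normally
        except (requests.exceptions.ReadTimeout,
                requests.exceptions.ConnectionError):
            retries += 1  # timed out: loop around and reconnect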

@ux

ux commented Dec 3, 2015

Or it would be good to somehow differentiate between no output and a real connection timeout. On the other hand, the timeout is already disabled for multiplexed data.

@dave-tucker

My case is the same as @ux's. I also agree we should distinguish between no output and a real connection timeout. The best option, IMO, would be to remove the timeout if a pseudo-TTY is enabled and to use some form of keepalive to make sure that the connection is just inactive as opposed to timed out.

@dnephin

dnephin commented Dec 3, 2015

and to use some form of keepalive to make sure that the connection is just inactive as opposed to timed out.

We might be able to set keep-alive on the TCP socket; I'm not that familiar with it. If we do that, we'll still want to keep the timeout, though.

@ux

ux commented Dec 4, 2015

Well, I still think that disabling the timeout is not a bad idea. According to the requests timeouts documentation (http://docs.python-requests.org/en/latest/user/advanced/#timeouts), there are connect and read timeouts. docker-py sets the same value for both connect and read by invoking the _set_request_timeout method - https://github.com/docker/docker-py/blob/master/docker/client.py#L107 - so in either case the connect timeout is already set. With streaming enabled and interactive mode (pseudo-TTY enabled), a read timeout doesn't really make sense, since an interactive session is not expected to receive data all the time. At this step - https://github.com/docker/docker-py/blob/master/docker/client.py#L249 - the connection is already established, so that line only disables the read timeout. Also, please take a look at the following note from https://docs.python.org/2/library/socket.html#socket.socket.settimeout:

Note that the connect() operation is subject to the timeout setting, and in general it is recommended to call settimeout() before calling connect() or pass a timeout parameter to create_connection(). The system network stack may return a connection timeout error of its own regardless of any Python socket timeout setting.

Further, it's a good idea to enable TCP keep-alive: every so often a probe packet with no data is sent to make sure the connection is still alive and hasn't silently disappeared. That is the right way to find out whether something is wrong with a connection; the absence of output is not a reason to conclude that the connection is broken.
More about TCP keep-alive: https://delog.wordpress.com/2013/08/16/handling-tcp-keepalive/, http://tldp.org/HOWTO/html_single/TCP-Keepalive-HOWTO/#whatis.

So, just to summarize: leave the connect timeout as is, disable the read timeout, and enable TCP keep-alive.
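In bare-socket terms that summary looks roughly like this (illustrative only, not docker-py code; the host, port, and timeout value are just taken from the messages above):

import socket

# The connect timeout still applies while the connection is being established.
sock = socket.create_connection(("192.168.99.104", 2376), timeout=60)

# Once connected, drop the read timeout so a silent stream is not an error...
sock.settimeout(None)

# ...and let TCP keep-alive probes detect a connection that has actually died.
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

# On Linux the probe schedule can be tuned too (not available on OS X):
# sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)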

@nhooey

nhooey commented Dec 7, 2015

@ux's point seems reasonable.

Will this bug's resolution get scheduled in the next milestone?

@dnephin

dnephin commented Dec 7, 2015

Since there is an easy workaround (using -d) I don't think it's high priority for us, but a pull request which adds the keepalive would be appreciated.

@dcharbonnier

There is a workaround to keep containers alive with -d, but this:
docker-compose logs; while [ $? -ne 0 ]; do docker-compose logs; done;
is not a valid workaround - you duplicate the logs, and that's a major issue for me.

@nhooey

nhooey commented Dec 8, 2015

I agree with @dcharbonnier, it's not a good workaround.

It's also not easy to find this workaround when a user first encounters the terminating containers and the "Couldn't connect to Docker daemon" error in the latest Docker Compose, so many users will end up confused and wasting time.

But a pull request would be great.

@aanand

aanand commented Dec 8, 2015

Just a note - as of 1.5.2 (specifically, be5b7b6), we don't stop the containers when we encounter a timeout (or an error of any kind) - only when we get SIGINT or SIGTERM.

It's still an issue that we detach, but at least we're not stopping the containers any more.

tonyd256 added a commit to November-Project/tracker-api that referenced this issue Dec 29, 2015
This option lets us view stdout print statements but causes timeouts
with docker-compose. It can be enabled momentarily when debugging until
a fix is found.

docker/compose#2338
@dnephin

dnephin commented Mar 14, 2016

I'm going to close this issue as a duplicate of #3106, since it has a more concise problem description and only describes the subset of the problem that we're still facing. The "containers get shut down" side of the problem was fixed a few releases ago.

Please follow along in that issue if you're interested.
