3.2.0 Error while reading from socket: ('Connection closed by server.',) #1140
Hi @LucyWengCSS, thanks for the report. I'd be interested to know which selector implementation redis-py chose in your environment. The selectors are responsible for determining the health of a connection, and redis-py attempts to choose the most performant selector implementation available in your environment.

It looks like you're running this in a web context. Are you running gunicorn or uwsgi? Do you know which worker type you're using? If you're using eventlet, there's a known issue (#1136) that seems to be a problem with the eventlet implementation.

If you're not sure which worker type you're using, or you want to dive deeper, we'd need to figure out which selector type redis-py has chosen for your environment. Running the following one-liner within your web process should tell us:

# assumes you have a redis client instantiated as `r`
>>> r.connection_pool.get_connection('_')._selector
<redis.selector.PollSelector at 0x10f2ca1d0>

The class name above (in my case, PollSelector) is what's important. Could you let me know which selector is used on your system? Please make sure to run the one-liner in the same context as your webserver.
We are using eventlet in our celery workers and this is happening there as well. I'll try switching to gevent to see if it's related.
Hi Andy, thanks for working on the issue. May I ask whether there are any tests or information we can provide for the issue at present? Thanks again.
I encountered the following two issues, celery/kombu#1018 and celery/kombu#1019, which brought me here. I have similar exceptions (connection timeout and broken pipe) after switching from kombu 4.3.0 and redis 2.10.6 to kombu 4.4.0 and redis 3.2.0, with the environment and the other libraries remaining unchanged.

On the new redis-py 3.2.0 version, here is what I get:
<redis.selector.PollSelector object at 0x7f18fce04d30>

On the previous version 2.10.6 I get: AttributeError: 'Connection' object has no attribute '_selector'
@thedrow What would really help is creating a way to easily reproduce this issue :)

BTW, this issue seems to be resolved for us by switching to gevent workers on celery.
I am experiencing that too, using gevent + direct use of redis-py 3.2.1. This is a stripped-down version of the logic I'm trying to run:

The channel rarely gets triggered, and days might pass before the callback should be called. It worked fine with redis-py 2, but now, exactly every hour, I get the exception:
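For context, here is a minimal sketch of this kind of gevent + pub/sub pattern; it is not the commenter's actual code, and the "events" channel name and polling interval are assumptions:

```python
# Hypothetical illustration of the pattern described above, not the original code.
import gevent.monkey
gevent.monkey.patch_all()  # make redis-py's blocking sockets cooperative under gevent

import gevent
import redis

def handle_event(message):
    # Called only on the rare occasions a message actually arrives.
    print("received:", message["data"])

r = redis.Redis(host="localhost", port=6379)
p = r.pubsub(ignore_subscribe_messages=True)
p.subscribe(events=handle_event)  # map the "events" channel to the handler

while True:
    p.get_message()  # dispatches to handle_event if a message is waiting
    gevent.sleep(1)  # yield to other greenlets between polls
```

With a channel this quiet, the underlying connection can sit idle for long stretches, which is relevant to the idle-timeout discussion later in the thread.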
We are also seeing "Error 110 while writing to socket. Connection timed out." when trying to dispatch our Celery tasks. We are not using eventlet. We downgraded to redis-py 2.10.6 / Kombu 4.3.0 / Celery 4.2 and our problems went away...

I'm having the same issue with Python on Windows 10 (Visual Studio Code), trying to connect to a Docker container. The code: import redis The error:

Hi @3ddi and @harrybiddle, I still have the issue. How about you? Any updates on your side?

Hey @alexandre-paroissien, I'm sorry, I gave up and downgraded to redis-py 2.10.6 / Kombu 4.3.0 / a forked Celery 4.2 with Python 3.7 support...!

Hi @alexandre-paroissien, I caught the exception and reconnected. Not very elegant, but it works for me until a proper fix is released.
OK, I confirm I still encounter this issue with the most recent versions of the libraries. I tested in a test app with no traffic apart from me: I launched a simple task manually; the first time it worked, the second time it gave the following output (and ended up working):
@alexandre-paroissien Hey, this is great. Do you happen to have the code for your test app published somewhere? If not, could you do so, along with whatever other requirements you have installed (like eventlet/gevent/etc.)?

@alexandre-paroissien I created a simple Celery app to hopefully track down what's going on. You can view it here: https://github.com/andymccurdy/celery-test

I'm installing it within a virtualenv with only the listed dependencies. Thus far I haven't seen any "Connection to Redis lost" type messages in the Celery logs, even after adjusting my Redis server's timeout setting. Can you help figure out what's different in your test environment?
I'm not using eventlet or gevent. Ubuntu 18 (Heroku-18). I'll try to reproduce the issue in a test app this weekend.

Same problem here, redis 3.2.1.

After 20 hours, this error was raised:
Both of your errors look like the TCP connection between the client and server was disconnected. This can happen for a variety of reasons outside the control of redis-py or the Redis server. Enabling TCP keepalive may help. You could also catch the error within your Python code and reconnect to the server.

If you're only seeing these errors after upgrading to redis-py 3.1 or later: there was a bug in redis-py 2.x and 3.0.x that attempted to auto-reconnect when a ConnectionError was encountered. This caused these network errors to be hidden from users and could occasionally lead to data loss (missed pubsub messages, etc.).
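A minimal sketch of both suggestions, assuming a plain redis.Redis client; the host, key name, and retry/backoff values are placeholders:

```python
import time

import redis

# Enable TCP keepalive so the OS probes idle connections instead of letting
# intermediate network gear silently drop them.
r = redis.Redis(host="localhost", port=6379, socket_keepalive=True)

def get_with_retry(key, retries=3):
    """Catch the ConnectionError and retry; redis-py establishes a fresh
    connection on the next command after a failure. GET is idempotent,
    so retrying it is safe."""
    for attempt in range(retries):
        try:
            return r.get(key)
        except redis.exceptions.ConnectionError:
            if attempt == retries - 1:
                raise
            time.sleep(0.5 * (attempt + 1))  # brief backoff before retrying
```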
All: I've put together a patch that uses nonblocking sockets to test the health of connections. This patch completely removes the usage of selectors. I'm hoping this works better with gevent, eventlet and other async environments. I'd appreciate any help in testing this patch in different environments. The patch is in the "nonblocking" branch here: https://github.com/andymccurdy/redis-py/tree/nonblocking
I still have the issue. celery: 4.2.1
We have the same issue with:
redis-py 3.0.1 works without errors. Stacktrace:
These errors are intermittent; we can't reproduce them in a test case. Additional data from the stack trace (Sentry):
Fix issue redis#1140 - reconnect on ConnectionError while executing command
Adding another voice to the mix here: we upgraded to redis-py 3.2.1 yesterday and ran into this issue, with lots of ConnectionErrors showing up in our logs. We need the ZPOP functionality added in 3.x, so we downgraded to 3.0.1 and are no longer seeing the issue. I think the change mentioned above in 3.1 is what broke this. FWIW, we aren't using pubsub at all; we were experiencing the error on normal redis commands. We are running in AWS Lambda against ElastiCache redis (through GhostTunnel) using SSL.
Everyone: Prior to 3.1.0, redis-py would retry any command that raised a ConnectionError. Automatically reconnecting and re-issuing a command can potentially lead to data loss or data duplication: in the event of a ConnectionError, the client can't know whether the server actually processed the command before the connection dropped. I don't have a good solution to resolve this but I'm open to ideas and suggestions.
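To illustrate the hazard with a hypothetical blind-retry wrapper (this is not redis-py's implementation; the counter key is made up):

```python
import redis

r = redis.Redis(host="localhost", port=6379)

def incr_with_blind_retry(key):
    try:
        return r.incr(key)
    except redis.exceptions.ConnectionError:
        # The server may have already applied the INCR and only the reply was
        # lost when the connection dropped; retrying can double-count.
        return r.incr(key)
```

Retrying an idempotent read is harmless, which is why catching the error in application code, where the command's semantics are known, is safer than retrying unconditionally inside the client library.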
Why are there so many ConnectionErrors? That seems more of a root cause than whether we enable retry-on-error or not.

I suspect it's several things:
Hi all, I tried the celery-test project (from @andymccurdy's GitHub repo) connected directly to a Redis server on localhost. Everything is OK, without any error, even after some idle time. So I configured a test environment with three Docker containers:

All three containers are connected through a Docker network. Every 30 seconds the celery worker shows this error (while consuming messages or when idle):

I'm trying to address this issue from the redis/haproxy side, e.g. by adding tcp keepalive in haproxy.
Hi all, |
@marcomezzaro This is great info, thanks. It furthers my suspicion that these errors are the result of network services dropping connections, such as when they are idle for some period. Do you happen to have a docker-compose.yml file for the celery-test/haproxy/redis-server setup? If you do, could you post it? I'd like to experiment a bit more. |
Hi, run docker-compose up and just wait 30 seconds; you will see the error stacktrace. If you change the broker_url/backend in "celeryconfig.py" from haproxy to redis, you will see no errors. Let me know if you have any ideas.
@marcomezzaro Thanks! This is very helpful. I can finally reproduce the issue. I'm working on a fix for this here: https://github.com/andymccurdy/redis-py/tree/ping-health-checks

The good news is that I believe I have this fixed for workloads that don't need pubsub. Extending this concept to pubsub requires a little more code, but I think I'm close and should have something tomorrow.

The bad news is that celery's implementation bypasses a lot of the pubsub flow. They've created their own socket poller that looks for activity on the socket rather than asking the redis-py API if a message is available. This means that even once the pubsub health check works, celery won't be regularly invoking it. Once our implementation is in place perhaps we can get a patch into celery to take advantage of the health check.
@thedrow Could you elaborate on "...it's a blocking client."? The default ConnectionPool is non-blocking. |
I just finished the code and tests for redis-py health checks. My intent is to merge this over the weekend or early next week. A new redis-py release will be made at that time. In the meantime you can find the branch here: https://github.com/andymccurdy/redis-py/tree/ping-health-checks

This patch introduces a new option: health_check_interval. I recommend setting this option to a value less than the idle connection timeout of the target system. For example, if you know that idle TCP connections are killed after 30 seconds in your environment, set health_check_interval to something less than 30.

This option also works on any PubSub connection that is created from a client with health checks enabled. Some advanced PubSub use cases don't regularly call get_message() or listen(), and those will need to invoke the health check themselves.

For Celery users, this change won't automatically fix ConnectionErrors encountered by Celery. Celery uses PubSub in a non-standard way which cannot take advantage of the automatic health checks at this time. Once this code is released, we should be able to create a PR for Celery to regularly invoke the health check.

If anyone has time to help test this in their own systems I would greatly appreciate it. CC @thedrow
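A short usage sketch of the new option, assuming the branch above (or a release that includes it) and an environment that kills idle TCP connections after 30 seconds:

```python
import redis

# Connections idle for more than 25 seconds are PINGed before being reused,
# so a dead connection is detected and replaced instead of failing mid-command.
r = redis.Redis(host="localhost", port=6379, health_check_interval=25)
r.set("greeting", "hello")

# PubSub connections created from this client inherit the health check,
# as long as messages are polled regularly.
p = r.pubsub()
p.subscribe("events")
for _ in range(60):
    message = p.get_message(timeout=1.0)
    if message:
        print(message)
```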
Version 3.3.0 has been released and is available on PyPI. The health_check_interval option described above is included in this release.
Any reason this issue is still open? The issue on Celery where I think this will be discussed is: celery/kombu#1019. (I'm not savvy enough to do the fix, but I can at least help connect dots.) |
I've kept this open in case anyone wanted to report back success/failure trying out the 3.3.x health checks. |
It looks like the selectors have been removed; I tried r.connection_pool.get_connection('_')._selector and was confused why it wasn't working in the newer version.
I'm getting the following with version 3.5.3:
Downgrading to
I'm trying to avoid downgrading to 2.10.6, as I lose functionality like using this as a context manager.
@shimk52 If you look at the traceback for both exceptions when running 3.5.3, you'll see that the client timed out attempting to connect to the server. This seems more like an issue with your client machine's connectivity to the server, or with the server itself.

@andymccurdy thank you for your reply! Say the problem is with my client, which does nothing but get and set to redis after, of course, initiating a connection.
@shimk52 Have you tried the
@andymccurdy Thank you for helping. |
Great, closing this as it has gone through several iterations of various issues. If anyone is still having issues that have to do with any part of this thread, please open a new issue. Thanks! |
@marcomezzaro I have a setup of the form: [celery worker, redis-py-client] --> [haproxy] --> [redis-master].
Any suggestions will be appreciated. cc : @andymccurdy |
Hello everyone, I am getting the error below while performing an insert operation against Azure Cache for Redis.

File "/home/fmlstream/lsh/lshmodelpipeline/pipelines/lsh_pipeline.py", line 438, in _save_lsh
    lsh_name.insert(name_dict, batch)
File "/home/fmlstream/lsh/lshmodelpipeline/lsh/lsh_insert.py", line 27, in insert
    logging.warn('{}: {}'.format(str(e), key))
File "/home/fmlstream/lsh/lshmodelpipeline/datasketch/lsh.py", line 317, in __exit__
    self.close()
File "/home/fmlstream/lsh/lshmodelpipeline/datasketch/lsh.py", line 320, in close
    self.lsh.keys.empty_buffer()
File "/home/fmlstream/lsh/lshmodelpipeline/datasketch/storage.py", line 1010, in empty_buffer
    self._buffer.execute()
File "/home/fmlstream/lshmodelvenv/lib/python3.6/site-packages/redis/client.py", line 3437, in execute
    self.shard_hint)
File "/home/fmlstream/lshmodelvenv/lib/python3.6/site-packages/rediscluster/connection.py", line 196, in get_connection
    raise RedisClusterException("Only 'pubsub' commands can be used by get_connection()")
rediscluster.exceptions.RedisClusterException: Only 'pubsub' commands can be used by get_connection()

Any help will be much appreciated. @andymccurdy thanks,
Noting that, using redis-py 3.5.3 with health_check_interval=30 and with the haproxy timeout set to 60m, I still see the issue (see the following traceback) every hour in the log files:

2022-01-12 16:16:03,305 [42] ERROR gnocchi.cli.metricd: Error while listening for new measures notification, retrying
When using py-redis for connecting to Redis via HAProxy, the connection is closed by HAProxy even when it is alive. Unfortunately this is a known issue on the py-redis side (see [1]). This patch increases connection timeouts to not pollute (for example) Gnocchi [2] logs with reconnect tracebacks every 2 minutes. [1] redis/redis-py#1140 [2] https://bugzilla.redhat.com/show_bug.cgi?id=1924373 Change-Id: Ie7ee7c90107cfe5bff08f5c778a6273ae9ffcc76
Version:
Python: 3.6.7
Redis: 3.2.7 (Azure Redis)
Redis-py: 3.2.0
Django: 2.1.1
Description:
Hi Experts,
Our service hit a similar issue to #1127 ("3.1.0 causing intermittent connection closed by server error"). After reviewing the whole discussion of issue #1127, we upgraded redis-py to version 3.2.0 and the issue has been mitigated, but it is still happening. Because the Azure Redis server closes connections that have been idle for more than 10 minutes, and the default redis-py behavior is not to close connections but to recycle them when possible, could you please suggest how to avoid the exception "Redis ConnectionError: Error while reading from socket: ('Connection closed by server.',)" in our product?
Configuration settings:
{'default': {'BACKEND': 'django_redis.cache.RedisCache',
'LOCATION': 'redis://xx.x.x.xxx:6379/0',
'TIMEOUT': 60,
'OPTIONS': {'DB': 0,
'SOCKET_TIMEOUT': 120,
'SOCKET_CONNECT_TIMEOUT': 30,
'COMPRESSOR': 'django_redis.compressors.zlib.ZlibCompressor',
'IGNORE_EXCEPTIONS': True,
'REDIS_CLIENT_KWARGS': {'socket_keepalive': True},
'PASSWORD': 'xxxxxxxxxxxxxxx='}},
'cachalot': {'BACKEND': 'django_redis.cache.RedisCache',
'LOCATION': 'redis://xx.x.x.xxx:6379/1',
'TIMEOUT': 60,
'OPTIONS': {'DB': 1,
'SOCKET_TIMEOUT': 120,
'SOCKET_CONNECT_TIMEOUT': 30,
'COMPRESSOR': 'django_redis.compressors.zlib.ZlibCompressor',
'IGNORE_EXCEPTIONS': True,
'REDIS_CLIENT_KWARGS': {'socket_keepalive': True},
'PASSWORD': 'xxxxxxxxxxxxxxxxxxxxxx='}},
}
Error Message:
Redis ConnectionError: Error while reading from socket: ('Connection closed by server.',)
Thanks a lot.
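For reference, a hedged sketch of how the health_check_interval option discussed earlier in the thread could be passed through this configuration once redis-py 3.3.0 or later is installed; the 300-second value is an assumption chosen to stay well under Azure's 10-minute idle cutoff:

```python
# Sketch only: django-redis forwards REDIS_CLIENT_KWARGS to the underlying
# redis.Redis client, so the health check can be enabled next to socket_keepalive.
CACHES = {
    'default': {
        'BACKEND': 'django_redis.cache.RedisCache',
        'LOCATION': 'redis://xx.x.x.xxx:6379/0',
        'OPTIONS': {
            'REDIS_CLIENT_KWARGS': {
                'socket_keepalive': True,
                'health_check_interval': 300,  # ping idle connections every 5 minutes
            },
        },
    },
}
```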