3.2.0 Error while reading from socket: ('Connection closed by server.',) #1140
Hi @LucyWengCSS, thanks for the report. I'd be interested to know which selector implementation redis-py chose in your environment. The selectors are responsible for determining the health of a connection, and redis-py attempts to choose the most performant selector implementation available in your environment.

It looks like you're running this in a web context. Are you running gunicorn or uwsgi? Do you know which worker type you're using? If you're using eventlet, there's a known issue (#1136) that seems to be a problem with the eventlet implementation.

If you're not sure which worker type you're using, or you want to dive deeper, we'd need to figure out which selector type redis-py has chosen for your environment. Running the following one-liner within your web process should tell us:

# assumes you have a redis client instantiated as `r`
>>> r.connection_pool.get_connection('_')._selector
<redis.selector.PollSelector at 0x10f2ca1d0>

The class name above (in my case, PollSelector) is what's important. Could you let me know which selector is used on your system? Please make sure to run the one-liner in the same context as your webserver.
We are using eventlet in our celery workers and this is happening there as well. I'll try switching to gevent to see if it's related.
Hi Andy, thanks for working on the issue. May I ask whether there are any tests or information we can provide for the issue at present? Thanks again.
I encountered the following two issues, celery/kombu#1018 and celery/kombu#1019, which brought me here. I have similar exceptions (connection timeout and broken pipe) after switching from kombu 4.3.0 and redis 2.10.6 to kombu 4.4.0 and redis 3.2.0, with the environment and the other libraries remaining unchanged.

On the new redis-py 3.2.0 version, here is what I get:
<redis.selector.PollSelector object at 0x7f18fce04d30>

On the previous version 2.10.6 I get: AttributeError: 'Connection' object has no attribute '_selector'
@thedrow What would really help is creating a way to easily reproduce this issue :)

BTW, this issue seems to be resolved for us by switching to gevent workers on celery.
I am experiencing that too, using gevent + direct use of redis-py 3.2.1. This is a stripped-down version of the logic I'm trying to run:

The channel rarely gets triggered, and days might pass before the callback should be called. It worked fine with redis-py 2, but now, exactly every hour, I get the exception:
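For context, here is a minimal sketch of this kind of gevent + pub/sub pattern; it is not the commenter's actual code, and the "events" channel name and polling interval are assumptions:

```python
# Hypothetical illustration of the pattern described above, not the original code.
import gevent.monkey
gevent.monkey.patch_all()  # make redis-py's blocking sockets cooperative under gevent

import gevent
import redis

def handle_event(message):
    # Called only on the rare occasions a message actually arrives.
    print("received:", message["data"])

r = redis.Redis(host="localhost", port=6379)
p = r.pubsub(ignore_subscribe_messages=True)
p.subscribe(events=handle_event)  # map the "events" channel to the handler

while True:
    p.get_message()  # dispatches to handle_event if a message is waiting
    gevent.sleep(1)  # yield to other greenlets between polls
```

With a channel this quiet, the underlying connection can sit idle for long stretches, which is relevant to the idle-timeout discussion later in the thread.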
We are also seeing "Error 110 while writing to socket. Connection timed out." when trying to dispatch our Celery tasks. We are not using eventlet. We downgraded to redis-py 2.10.6 / Kombu 4.3.0 / Celery 4.2 and our problems went away...

I'm having the same issue with Python on Windows 10 (Visual Studio Code), trying to connect to a Docker container. The code: import redis The error:

Hi @3ddi and @harrybiddle, I still have the issue. How about you? Any updates on your side?

Hey @alexandre-paroissien, I'm sorry, I gave up and downgraded to redis-py 2.10.6 / Kombu 4.3.0 / a forked Celery 4.2 with Python 3.7 support...!

Hi @alexandre-paroissien, I caught the exception and reconnected. Not very elegant, but it works for me until a proper fix is released.
OK, I confirm I still encounter this issue with the most recent versions of the libraries. I tested in a test app with no traffic apart from me: I launched a simple task manually; the first time it worked, the second time it gave the following output (and ended up working):
@alexandre-paroissien Hey, this is great. Do you happen to have the code for your test app published somewhere? If not, could you do so, along with whatever other requirements you have installed (like eventlet/gevent/etc.)?

@alexandre-paroissien I created a simple Celery app to hopefully track down what's going on. You can view it here: https://github.com/andymccurdy/celery-test

I'm installing it within a virtualenv with only the listed dependencies. Thus far I haven't seen any "Connection to Redis lost" type messages in the Celery logs, even after adjusting my Redis server's timeout setting. Can you help figure out what's different in your test environment?
I'm not using eventlet or gevent. Ubuntu 18 (Heroku-18). I'll try to reproduce the issue in a test app this weekend.

Same problem here, redis 3.2.1.

After 20 hours, this error was raised:
Both of your errors look like the TCP connection between the client and server was disconnected. This can happen for a variety of reasons outside the control of redis-py or the Redis server. Enabling TCP keepalive may help. You could also catch the error within your Python code and reconnect to the server.

If you're only seeing these errors after upgrading to redis-py 3.1 or later: there was a bug in redis-py 2.x and 3.0.x that attempted to auto-reconnect when a ConnectionError was encountered. This caused these network errors to be hidden from users and could occasionally lead to data loss (missed pubsub messages, etc.).
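A minimal sketch of both suggestions, assuming a plain redis.Redis client; the host, key name, and retry/backoff values are placeholders:

```python
import time

import redis

# Enable TCP keepalive so the OS probes idle connections instead of letting
# intermediate network gear silently drop them.
r = redis.Redis(host="localhost", port=6379, socket_keepalive=True)

def get_with_retry(key, retries=3):
    """Catch the ConnectionError and retry; redis-py establishes a fresh
    connection on the next command after a failure. GET is idempotent,
    so retrying it is safe."""
    for attempt in range(retries):
        try:
            return r.get(key)
        except redis.exceptions.ConnectionError:
            if attempt == retries - 1:
                raise
            time.sleep(0.5 * (attempt + 1))  # brief backoff before retrying
```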
All: I've put together a patch that uses nonblocking sockets to test the health of connections. This patch completely removes the usage of selectors. I'm hoping this works better with gevent, eventlet and other async environments. I'd appreciate any help in testing this patch in different environments. The patch is in the "nonblocking" branch here: https://github.com/andymccurdy/redis-py/tree/nonblocking
I still have the issue. celery: 4.2.1
We have the same issue with:
redis-py 3.0.1 works without errors. Stacktrace:
These errors are intermittent; we can't reproduce them in a test case. Additional data from the stack trace (Sentry):
Fix issue redis#1140 - reconnect on ConnectionError while executing command
Adding another voice to the mix here: we upgraded to redis-py 3.2.1 yesterday and ran into this issue, with lots of ConnectionErrors showing up in our logs. We need the ZPOP functionality added in 3.x, so we downgraded to 3.0.1 and are no longer seeing the issue. I think the change mentioned above in 3.1 is what broke this. FWIW, we aren't using pubsub at all; we were experiencing the error on normal redis commands. We are running in AWS Lambda against ElastiCache redis (through GhostTunnel) using SSL.
Everyone: Prior to 3.1.0, redis-py would retry any command that raised a ConnectionError. Automatically reconnecting and re-issuing a command can potentially lead to data loss or data duplication: in the event of a ConnectionError, the client can't know whether the server actually processed the command before the connection dropped. I don't have a good solution to resolve this but I'm open to ideas and suggestions.
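To illustrate the hazard with a hypothetical blind-retry wrapper (this is not redis-py's implementation; the counter key is made up):

```python
import redis

r = redis.Redis(host="localhost", port=6379)

def incr_with_blind_retry(key):
    try:
        return r.incr(key)
    except redis.exceptions.ConnectionError:
        # The server may have already applied the INCR and only the reply was
        # lost when the connection dropped; retrying can double-count.
        return r.incr(key)
```

Retrying an idempotent read is harmless, which is why catching the error in application code, where the command's semantics are known, is safer than retrying unconditionally inside the client library.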
Why are there so many ConnectionErrors? That seems more of a root cause than whether we enable retry-on-error or not.

I suspect it's several things:
Hi all, I tried the celery-test project (from @andymccurdy's GitHub repo) connected directly to a Redis server on localhost. Everything is OK, without any error, even after some idle time. So I configured a test environment with three Docker containers:

All three containers are connected through a Docker network. Every 30 seconds the celery worker shows this error (while consuming messages or when idle):

I'm trying to address this issue from the redis/haproxy side, e.g. by adding tcp keepalive in haproxy.
Hi all, |
@marcomezzaro This is great info, thanks. It furthers my suspicion that these errors are the result of network services dropping connections, such as when they are idle for some period. Do you happen to have a docker-compose.yml file for the celery-test/haproxy/redis-server setup? If you do, could you post it? I'd like to experiment a bit more. |
Hi, run docker-compose up and just wait 30 seconds; you will see the error stacktrace. If you change the broker_url/backend in "celeryconfig.py" from haproxy to redis, you will see no errors. Let me know if you have any ideas.
@marcomezzaro Thanks! This is very helpful. I can finally reproduce the issue. I'm working on a fix for this here: https://github.com/andymccurdy/redis-py/tree/ping-health-checks

The good news is that I believe I have this fixed for workloads that don't need pubsub. Extending this concept to pubsub requires a little more code, but I think I'm close and should have something tomorrow.

The bad news is that celery's implementation bypasses a lot of the pubsub flow. They've created their own socket poller that looks for activity on the socket rather than asking the redis-py API if a message is available. This means that even once the pubsub health check works, celery won't be regularly invoking it. Once our implementation is in place perhaps we can get a patch into celery to take advantage of the health check.
@thedrow Could you elaborate on "...it's a blocking client."? The default ConnectionPool is non-blocking. |
I just finished the code and tests for redis-py health checks. My intent is to merge this over the weekend or early next week. A new redis-py release will be made at that time. In the meantime you can find the branch here: https://github.com/andymccurdy/redis-py/tree/ping-health-checks

This patch introduces a new option: health_check_interval. I recommend setting this option to a value less than the idle connection timeout of the target system. For example, if you know that idle TCP connections are killed after 30 seconds in your environment, set health_check_interval to something less than 30.

This option also works on any PubSub connection that is created from a client with health checks enabled. Some advanced PubSub use cases don't regularly call get_message() or listen(), and those will need to invoke the health check themselves.

For Celery users, this change won't automatically fix ConnectionErrors encountered by Celery. Celery uses PubSub in a non-standard way which cannot take advantage of the automatic health checks at this time. Once this code is released, we should be able to create a PR for Celery to regularly invoke the health check.

If anyone has time to help test this in their own systems I would greatly appreciate it. CC @thedrow
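A short usage sketch of the new option, assuming the branch above (or a release that includes it) and an environment that kills idle TCP connections after 30 seconds:

```python
import redis

# Connections idle for more than 25 seconds are PINGed before being reused,
# so a dead connection is detected and replaced instead of failing mid-command.
r = redis.Redis(host="localhost", port=6379, health_check_interval=25)
r.set("greeting", "hello")

# PubSub connections created from this client inherit the health check,
# as long as messages are polled regularly.
p = r.pubsub()
p.subscribe("events")
for _ in range(60):
    message = p.get_message(timeout=1.0)
    if message:
        print(message)
```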
Version 3.3.0 has been released and is available on PyPI. The health_check_interval option described above is included in this release.
Any reason this issue is still open? The issue on Celery where I think this will be discussed is: celery/kombu#1019. (I'm not savvy enough to do the fix, but I can at least help connect dots.) |
I've kept this open in case anyone wanted to report back success/failure trying out the 3.3.x health checks. |
It looks like the selectors have been removed; I tried r.connection_pool.get_connection('_')._selector and was confused why it wasn't working in the newer version.
I'm getting the following with version 3.5.3:
Downgrading to
I'm trying to avoid downgrading to 2.10.6, as I lose functionality like using this as a context manager.
@shimk52 If you look at the traceback for both exceptions when running 3.5.3, you'll see that the client timed out attempting to connect to the server. This seems more like an issue with your client machine's connectivity to the server, or with the server itself.

@andymccurdy thank you for your reply! Say the problem is with my client, which does nothing but get and set to redis after, of course, initiating a connection.
@shimk52 Have you tried the
@andymccurdy Thank you for helping. |
Great, closing this as it has gone through several iterations of various issues. If anyone is still having issues that have to do with any part of this thread, please open a new issue. Thanks! |
@marcomezzaro I have a setup of the form: [celery worker, redis-py-client] --> [haproxy] --> [redis-master].
Any suggestions will be appreciated. cc : @andymccurdy |
Hello everyone, I am getting the error below while performing an insert operation against Azure Cache for Redis.

File "/home/fmlstream/lsh/lshmodelpipeline/pipelines/lsh_pipeline.py", line 438, in _save_lsh
    lsh_name.insert(name_dict, batch)
File "/home/fmlstream/lsh/lshmodelpipeline/lsh/lsh_insert.py", line 27, in insert
    logging.warn('{}: {}'.format(str(e), key))
File "/home/fmlstream/lsh/lshmodelpipeline/datasketch/lsh.py", line 317, in __exit__
    self.close()
File "/home/fmlstream/lsh/lshmodelpipeline/datasketch/lsh.py", line 320, in close
    self.lsh.keys.empty_buffer()
File "/home/fmlstream/lsh/lshmodelpipeline/datasketch/storage.py", line 1010, in empty_buffer
    self._buffer.execute()
File "/home/fmlstream/lshmodelvenv/lib/python3.6/site-packages/redis/client.py", line 3437, in execute
    self.shard_hint)
File "/home/fmlstream/lshmodelvenv/lib/python3.6/site-packages/rediscluster/connection.py", line 196, in get_connection
    raise RedisClusterException("Only 'pubsub' commands can be used by get_connection()")
rediscluster.exceptions.RedisClusterException: Only 'pubsub' commands can be used by get_connection()

Any help will be much appreciated. @andymccurdy thanks,
Noting that, using redis-py 3.5.3 with health_check_interval=30 and with the haproxy timeout set to 60m, I still see the issue (see the following traceback) every hour in the log files:

2022-01-12 16:16:03,305 [42] ERROR gnocchi.cli.metricd: Error while listening for new measures notification, retrying
When using py-redis for connecting to Redis via HAProxy, the connection is closed by HAProxy even when it is alive. Unfortunately this is a known issue on the py-redis side (see [1]). This patch increases connection timeouts to not pollute (for example) Gnocchi [2] logs with reconnect tracebacks every 2 minutes. [1] redis/redis-py#1140 [2] https://bugzilla.redhat.com/show_bug.cgi?id=1924373 Change-Id: Ie7ee7c90107cfe5bff08f5c778a6273ae9ffcc76
Version:
Python: 3.6.7
Redis: 3.2.7 (Azure Redis)
Redis-py: 3.2.0
Django: 2.1.1
Description:
Hi Experts,
Our service hit a similar issue to #1127 ("3.1.0 causing intermittent connection closed by server error"). After reviewing the whole discussion of issue #1127, we upgraded redis-py to version 3.2.0 and the issue has been mitigated, but it is still happening. Because the Azure Redis server closes connections that have been idle for more than 10 minutes, and the default redis-py behavior is not to close connections but to recycle them when possible, could you please suggest how to avoid the exception "Redis ConnectionError: Error while reading from socket: ('Connection closed by server.',)" in our product?
Configuration settings:
{'default': {'BACKEND': 'django_redis.cache.RedisCache',
'LOCATION': 'redis://xx.x.x.xxx:6379/0',
'TIMEOUT': 60,
'OPTIONS': {'DB': 0,
'SOCKET_TIMEOUT': 120,
'SOCKET_CONNECT_TIMEOUT': 30,
'COMPRESSOR': 'django_redis.compressors.zlib.ZlibCompressor',
'IGNORE_EXCEPTIONS': True,
'REDIS_CLIENT_KWARGS': {'socket_keepalive': True},
'PASSWORD': 'xxxxxxxxxxxxxxx='}},
'cachalot': {'BACKEND': 'django_redis.cache.RedisCache',
'LOCATION': 'redis://xx.x.x.xxx:6379/1',
'TIMEOUT': 60,
'OPTIONS': {'DB': 1,
'SOCKET_TIMEOUT': 120,
'SOCKET_CONNECT_TIMEOUT': 30,
'COMPRESSOR': 'django_redis.compressors.zlib.ZlibCompressor',
'IGNORE_EXCEPTIONS': True,
'REDIS_CLIENT_KWARGS': {'socket_keepalive': True},
'PASSWORD': 'xxxxxxxxxxxxxxxxxxxxxx='}},
}
Error Message:
Redis ConnectionError: Error while reading from socket: ('Connection closed by server.',)
Thanks a lot.
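For reference, a hedged sketch of how the health_check_interval option discussed earlier in the thread could be passed through this configuration once redis-py 3.3.0 or later is installed; the 300-second value is an assumption chosen to stay well under Azure's 10-minute idle cutoff:

```python
# Sketch only: django-redis forwards REDIS_CLIENT_KWARGS to the underlying
# redis.Redis client, so the health check can be enabled next to socket_keepalive.
CACHES = {
    'default': {
        'BACKEND': 'django_redis.cache.RedisCache',
        'LOCATION': 'redis://xx.x.x.xxx:6379/0',
        'OPTIONS': {
            'REDIS_CLIENT_KWARGS': {
                'socket_keepalive': True,
                'health_check_interval': 300,  # ping idle connections every 5 minutes
            },
        },
    },
}
```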