
Sometimes worker can not connect to Redis #329

Open
amercader opened this issue Jul 26, 2019 · 0 comments

@amercader (Member)

Reported on the frictionlessdata Gitter channel. The symptom is that jobs created via the API (e.g. on try.goodtables.io or openspending.org/packager/provide-data) never finish, and the client keeps polling for an update until it fails with the following error:

Error: Reached Maximum Attempts
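For context, this is roughly the polling pattern on the client side that ends in that error: a fixed number of status requests, after which the client gives up. The endpoint, limits and names below are illustrative only, not the actual goodtables.io client code.

# Hedged sketch of client-side polling that produces
# "Error: Reached Maximum Attempts" when a job never finishes.
import time
import requests

MAX_ATTEMPTS = 50        # illustrative limit
POLL_INTERVAL = 2        # seconds between status checks

def wait_for_job(status_url):
    for _ in range(MAX_ATTEMPTS):
        job = requests.get(status_url, timeout=10).json()
        if job.get('status') in ('success', 'failure', 'error'):
            return job
        time.sleep(POLL_INTERVAL)
    # If the worker never picks the job up (e.g. it lost its Redis
    # connection), the job stays pending and we end up here.
    raise RuntimeError('Error: Reached Maximum Attempts')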

Looking at the pods and the logs, the following from the worker container points to the problem:

kubectl -n production logs goodtables-io-production-xxxxx-yyyy -c goodtables-worker -f
[2019-07-26 09:49:37,169: ERROR/MainProcess] Process 'ForkPoolWorker-1' pid:19 exited with 'signal 9 (SIGKILL)'
[2019-07-26 09:49:37,184: ERROR/MainProcess] Task handler raised error: WorkerLostError('Worker exited prematurely: signal 9 (SIGKILL).',)
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/site-packages/billiard/pool.py", line 1223, in mark_as_worker_lost
    human_status(exitcode)),
billiard.exceptions.WorkerLostError: Worker exited prematurely: signal 9 (SIGKILL).
[2019-07-26 09:50:34,453: ERROR/ForkPoolWorker-6] There was an error setting the GitHub status: https://api.github.com/repos/bzg/data-codes-sources-fr/statuses/42f134bb64848b2df3e0318822805fc24b45afd1 {"message":"Not Found","documentation_url":"https://developer.github.com/v3/repos/statuses/#create-a-status"} 406 success
[2019-07-26 09:52:53,348: ERROR/MainProcess] Process 'ForkPoolWorker-6' pid:566 exited with 'signal 9 (SIGKILL)'
[2019-07-26 09:52:53,362: ERROR/MainProcess] Task handler raised error: WorkerLostError('Worker exited prematurely: signal 9 (SIGKILL).',)
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/site-packages/billiard/pool.py", line 1223, in mark_as_worker_lost
    human_status(exitcode)),
billiard.exceptions.WorkerLostError: Worker exited prematurely: signal 9 (SIGKILL).
[2019-07-26 10:17:58,438: WARNING/MainProcess] consumer: Connection to broker lost. Trying to re-establish the connection...
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/site-packages/redis/connection.py", line 129, in _read_from_socket
    raise socket.error(SERVER_CLOSED_CONNECTION_ERROR)
OSError: Connection closed by server.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/site-packages/redis/client.py", line 2165, in _execute
    return command(*args)
  File "/usr/local/lib/python3.5/site-packages/redis/connection.py", line 577, in read_response
    response = self._parser.read_response()
  File "/usr/local/lib/python3.5/site-packages/redis/connection.py", line 238, in read_response
    response = self._buffer.readline()
  File "/usr/local/lib/python3.5/site-packages/redis/connection.py", line 168, in readline
    self._read_from_socket()
  File "/usr/local/lib/python3.5/site-packages/redis/connection.py", line 143, in _read_from_socket
    (e.args,))
redis.exceptions.ConnectionError: Error while reading from socket: ('Connection closed by server.',)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/site-packages/redis/connection.py", line 439, in connect
    sock = self._connect()
  File "/usr/local/lib/python3.5/site-packages/redis/connection.py", line 494, in _connect
    raise err
  File "/usr/local/lib/python3.5/site-packages/redis/connection.py", line 482, in _connect
    sock.connect(socket_address)
TimeoutError: [Errno 110] Operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/site-packages/celery/worker/consumer/consumer.py", line 318, in start
    blueprint.start(self)
  File "/usr/local/lib/python3.5/site-packages/celery/bootsteps.py", line 119, in start
    step.start(parent)
  File "/usr/local/lib/python3.5/site-packages/celery/worker/consumer/consumer.py", line 584, in start
    c.loop(*c.loop_args())
  File "/usr/local/lib/python3.5/site-packages/celery/worker/loops.py", line 88, in asynloop
    next(loop)
  File "/usr/local/lib/python3.5/site-packages/kombu/async/hub.py", line 354, in create_loop
    cb(*cbargs)
  File "/usr/local/lib/python3.5/site-packages/kombu/transport/redis.py", line 1040, in on_readable
    self.cycle.on_readable(fileno)
  File "/usr/local/lib/python3.5/site-packages/kombu/transport/redis.py", line 337, in on_readable
    chan.handlers[type]()
  File "/usr/local/lib/python3.5/site-packages/kombu/transport/redis.py", line 667, in _receive
    ret.append(self._receive_one(c))
  File "/usr/local/lib/python3.5/site-packages/kombu/transport/redis.py", line 678, in _receive_one
    response = c.parse_response()
  File "/usr/local/lib/python3.5/site-packages/redis/client.py", line 2183, in parse_response
    return self._execute(connection, connection.read_response)
  File "/usr/local/lib/python3.5/site-packages/redis/client.py", line 2172, in _execute
    connection.connect()
  File "/usr/local/lib/python3.5/site-packages/redis/connection.py", line 442, in connect
    raise ConnectionError(self._error_message(e))
redis.exceptions.ConnectionError: Error 110 connecting to goodtables-io-production-redis-master:6379. Operation timed out.
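For reference, a minimal sketch of Celery settings that could make the worker more tolerant of both failure modes visible in this log: the SIGKILLed pool processes (usually an out-of-memory kill in Kubernetes) and the dropped Redis broker connection. The app name, module layout and values are assumptions, not taken from the goodtables.io codebase; only the Redis hostname comes from the log above.

# Hedged sketch, not the actual goodtables.io configuration.
from celery import Celery

# Broker host taken from the log; the /0 database is an assumption.
app = Celery('goodtablesio',
             broker='redis://goodtables-io-production-redis-master:6379/0')

app.conf.update(
    # Keep retrying the broker connection instead of giving up.
    broker_connection_retry=True,
    broker_connection_max_retries=None,  # retry forever
    # Redis transport options: fail fast on dead sockets and keep them alive.
    broker_transport_options={
        'socket_timeout': 30,
        'socket_connect_timeout': 10,
        'socket_keepalive': True,
        'retry_on_timeout': True,
    },
    # Recycle pool processes before they grow large enough to be OOM-killed.
    worker_max_tasks_per_child=100,
    worker_max_memory_per_child=512000,  # KiB, roughly 500 MB
)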

Prior to this there are some errors about files not being found; I'm not sure if they are related:

[2019-07-26 09:41:28,823: ERROR/ForkPoolWorker-4] Signal handler <function task_postrun at 0x7f6dfbabbf28> raised: FileNotFoundError(2, 'No such file or directory')
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/site-packages/celery/utils/dispatch/signal.py", line 167, in send
    response = receiver(signal=self, sender=sender, **named)
  File "/srv/app/goodtablesio/signals.py", line 40, in task_postrun
    os.remove(path)
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmptduxzzma/invalid.csv'
[2019-07-26 09:41:28,835: ERROR/ForkPoolWorker-4] ["Signal handler <function task_postrun at 0x7f6dfbabbf28> raised: FileNotFoundError(2, 'No such file or directory')", '  File "celery/utils/dispatch/signal.py", line 167, in send', '  File "goodtablesio/signals.py", line 40, in task_postrun']
[2019-07-26 09:41:28,835: WARNING/ForkPoolWorker-4] b'["Signal handler <function task_postrun at 0x7f6dfbabbf28> raised: FileNotFoundError(2, \'No such file or directory\')", \'  File "celery/utils/dispatch/signal.py", line 167, in send\', \'  File "goodtablesio/signals.py", line 40, in task_postrun\']'
[2019-07-26 09:41:28,919: ERROR/ForkPoolWorker-1] Sentry responded with an API error: RateLimited(Creation of this event was denied due to rate limiting)
[2019-07-26 09:41:28,919: WARNING/ForkPoolWorker-1] b'Sentry responded with an API error: RateLimited(Creation of this event was denied due to rate limiting)'
[2019-07-26 09:41:28,920: ERROR/ForkPoolWorker-1] ["Signal handler <function task_postrun at 0x7f6dfbabbf28> raised: FileNotFoundError(2, 'No such file or directory')", '  File "celery/utils/dispatch/signal.py", line 167, in send', '  File "goodtablesio/signals.py", line 40, in task_postrun']
[2019-07-26 09:41:28,920: WARNING/ForkPoolWorker-1] b'["Signal handler <function task_postrun at 0x7f6dfbabbf28> raised: FileNotFoundError(2, \'No such file or directory\')", \'  File "celery/utils/dispatch/signal.py", line 167, in send\', \'  File "goodtablesio/signals.py", line 40, in task_postrun\']'
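The FileNotFoundError above comes from the cleanup in the task_postrun handler (goodtablesio/signals.py, line 40, os.remove(path)). A minimal sketch of a more tolerant handler follows; the path helper is hypothetical, since the real path resolution isn't visible in these logs.

# Hedged sketch of a cleanup handler that tolerates an already-deleted file.
import os
from celery.signals import task_postrun

# Hypothetical helper: the real handler derives the temp-file location
# from the task, which isn't shown in the logs above.
def _tmp_path_for(task_id):
    return os.path.join('/tmp', task_id, 'invalid.csv')

@task_postrun.connect
def cleanup_tmp_file(sender=None, task_id=None, **kwargs):
    if not task_id:
        return
    path = _tmp_path_for(task_id)
    try:
        os.remove(path)
    except FileNotFoundError:
        # Another pool process or an earlier run may already have removed
        # the file; a missing file is not an error during cleanup.
        pass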

Update: The issue seems to be fixed. I restarted both pods (the app + worker one and the Redis one) using the command in #317.

@roll roll changed the title Worker can not connect to Redis Sometimes worker can not connect to Redis Jul 28, 2019
@roll roll added the bug label Jul 28, 2019