✨ `rabbit_client.subscribe` no longer retries forever if `message_handler` always raises unexpected error ⚠️ #5282

GitHK · 2024-01-30T10:00:55Z

What do these changes do?

Currently if the message_handler raises an exception, the message is queued again to be retried.
While this behaviour is useful, in some situations is results being decremental.

Issues with the current design:

the message is retired immediately
if message_handler always raises the message will be retried forever and can overwhelm the system
does not allow for handlers to fail and stop after X failed attempts

Changes

Added dead letter exchange box rabbit_client.subscribe internal queue, which helps to deal with errors.

✨ added unexpected_error_retry_delay_s which delays the message by a certain amount before queuing it again in the main queue
✨ added unexpected_error_max_attempts which allows to limit how many times the message_handler is allowed to raise an error before giving up
✅ if message_handler returns False forever, the message will always be queued immediately to the main queue and will be retried as soon as possible (did not change previous behaviour)

Related issue/s

How to test

Dev Checklist

No ENV changes or I properly updated ENV (read the instruction)

DevOps Checklist ⚠️

⚠️ NOTE: queue signatures in rabbitmq have changed. services which rely on rabbit_cliebt.subscribe will no longer be able to start. We need to remove the queues (volumes) from rabbitmq in oder for the services to start

Why not have an automatic way to fix this?

read reasons here ✨ added notes with actions to take if rabbitmq queues are no longer created #5295

…n-error-queue

codecov · 2024-01-30T10:04:26Z

Codecov Report

Attention: 1 lines in your changes are missing coverage. Please review.

Comparison is base (7821b90) 77.9% compared to head (489b408) 87.1%.

Additional details and impacted files

@@            Coverage Diff            @@
##           master   #5282      +/-   ##
=========================================
+ Coverage    77.9%   87.1%    +9.2%     
=========================================
  Files        1029    1316     +287     
  Lines       40683   53839   +13156     
  Branches      765    1173     +408     
=========================================
+ Hits        31700   46932   +15232     
+ Misses       8833    6657    -2176     
- Partials      150     250     +100

Flag	Coverage Δ
integrationtests	`63.8% <ø> (?)`
unittests	`85.1% <97.2%> (+7.2%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files	Coverage Δ
...ls-library/src/models_library/rabbitmq_messages.py	`89.7% <ø> (ø)`
...y/src/servicelib/aiohttp/monitor_slow_callbacks.py	`100.0% <100.0%> (ø)`
...service-library/src/servicelib/rabbitmq/_client.py	`100.0% <100.0%> (ø)`
.../service-library/src/servicelib/rabbitmq/_utils.py	`97.0% <75.0%> (ø)`

... and 455 files with indirect coverage changes

sanderegg

I have some doubts still on the necessity of this change.

so if a message is never delivered, we need to look for the exception in graylog right? what about the dead end letter box pattern thingy? did you check that?
what is the use-case where the handler would constantly fail?
what is a good number for the failures? and you are saying there is no delay, how will this work?

packages/service-library/src/servicelib/rabbitmq/_client.py

packages/service-library/tests/rabbitmq/test_rabbitmq.py

packages/service-library/src/servicelib/rabbitmq/_client.py

GitHK · 2024-01-30T13:10:23Z

I have some doubts still on the necessity of this change.

so if a message is never delivered, we need to look for the exception in graylog right? what about the dead end letter box pattern thingy? did you check that?

~~I did not manage to get it working right, but I think I did not do it right. Might still end up requiring it, since it looks like to the goto way to implement a delayed message delivery.~~

EDIT:
I managed to have the dead letter box pattern working. And with this one I can implement the delay approach. Will try to bring something in using this one now.

what is the use-case where the handler would constantly fail?

My use case is the following: I emit messages for some small task, that I know can fail sometimes, due to the state of the system. I want it to actually retry 0 times and give up.

what is a good number for the failures? and you are saying there is no delay, how will this work?

Regarding the no delay: I've seen 1000+ fails in one second on my machine. I think it's a bad idea and could lead to some strange situations or performance going down.

Regarding the retry policy, I would put something similar to what tenacity does ~~with the exponential backoff and some max duration.~~
EDIT: exponential backoff is a bit harder to achieve, for now we have a fixed delay and a maximum number of retries, after which an error is logged when the handler gives up on the message.

@pcrespov @sanderegg I have also updated the description of the PR and this is the new design.

After the new change. A message will always be retried a set number of times and it is given up if the error keeps raising unexpected errors.

NOTE: if the handler returns (False) the message will be send to the original queue, not the one handling the errors. So previous behaviour is unaltered.

A warning message is printed out when the handler fails and the message will be queue once more after a fixed delay.

WARNING  servicelib.rabbitmq._client:_client.py:59 Retry [0/0] for handler '<function test__subscribe_to_failing_message_handler.<locals>._faulty_message_handler at 0x7fdb4b0a9000>', which raised an unexpected error caused by message_id='5b757bf24e734dcdb74940c970646f11'

An error message will be printed out if the message has reached its maximum amount of queued times and the it's given up on trying to handle it.

ERROR    servicelib.rabbitmq._client:_client.py:74 Handler '<function test__subscribe_to_failing_message_handler.<locals>._faulty_message_handler at 0x7fdb4b0a9000>' is giving up on message 'IncomingMessage:{'app_id': None,
 'body_size': 181,
 'cluster_id': '',
 'consumer_tag': 'pytest_consumer_neagu-wkst_3113883_pytest_fake_exchange_qaUauvFYAmzYNIRRwotb',
 'content_encoding': None,
 'content_type': None,
 'correlation_id': None,
 'delivery_mode': <DeliveryMode.NOT_PERSISTENT: 1>,
 'delivery_tag': 1,
 'exchange': 'pytest_fake_exchange_qaUauvFYAmzYNIRRwotb',
 'expiration': None,
 'headers': {},
 'message_id': '5b757bf24e734dcdb74940c970646f11',
 'priority': 0,
 'redelivered': False,
 'reply_to': None,
 'routing_key': '',
 'timestamp': None,
 'type': 'None',
 'user_id': None}' with body 'b'American whole magazine truth stop whose. On traditional measure example sense peace. Would mouth relate own chair.\nTogether range line beyond. First policy daughter need kind miss.''
Traceback (most recent call last):
  File "/home/silenthk/work/pr-osparc-retry-on-error-queue/packages/service-library/src/servicelib/rabbitmq/_client.py", line 54, in _on_message
    if not await message_handler(message.body):
  File "/home/silenthk/work/pr-osparc-retry-on-error-queue/packages/service-library/tests/rabbitmq/test_rabbitmq.py", line 515, in _faulty_message_handler
    raise RuntimeError(msg)
RuntimeError: Always fail. Received message b'American whole magazine truth stop whose. On traditional measure example sense peace. Would mouth relate own chair.\nTogether range line beyond. First policy daughter need kind miss.'

…n-error-queue

sanderegg

thanks

sonarqubecloud · 2024-02-02T10:39:04Z

Quality Gate passed

Kudos, no new issues were introduced!

0 New issues
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

Andrei Neagu added 2 commits January 30, 2024 09:54

typo

c3bf9f8

refactor queues

a838810

GitHK self-assigned this Jan 30, 2024

GitHK added the a:services-library issues on packages/service-libs label Jan 30, 2024

GitHK added this to the This is Sparta! milestone Jan 30, 2024

Merge remote-tracking branch 'upstream/master' into pr-osparc-retry-o…

41c56ad

…n-error-queue

GitHK changed the title ~~✨ add extra options to rabbit queues~~ ✨ add extra options to rabbit_client.subscribe Jan 30, 2024

GitHK marked this pull request as ready for review January 30, 2024 10:16

GitHK requested review from pcrespov, sanderegg and matusdrobuliak66 as code owners January 30, 2024 10:16

sanderegg reviewed Jan 30, 2024

View reviewed changes

packages/service-library/src/servicelib/rabbitmq/_client.py Outdated Show resolved Hide resolved

packages/service-library/tests/rabbitmq/test_rabbitmq.py Outdated Show resolved Hide resolved

packages/service-library/src/servicelib/rabbitmq/_client.py Outdated Show resolved Hide resolved

pcrespov reviewed Jan 30, 2024

View reviewed changes

packages/service-library/src/servicelib/rabbitmq/_client.py Outdated Show resolved Hide resolved

packages/service-library/src/servicelib/rabbitmq/_client.py Outdated Show resolved Hide resolved

Andrei Neagu added 8 commits January 30, 2024 14:31

add tests for topics as well

fece3e1

refactor to fail if it cannot recuperate from error state

d546a1d

Merge remote-tracking branch 'upstream/master' into pr-osparc-retry-o…

a92b881

…n-error-queue

mypy

a3abda6

docstring and renaming

3988bc2

added extra test and refactor

7d9c4c7

refactor tests

4cdf899

added more use cases

b1600fe

GitHK requested review from pcrespov and sanderegg January 31, 2024 12:54

GitHK changed the title ~~✨ add extra options to rabbit_client.subscribe~~ ✨ rabbit_client.subscribe no longer fails forever if message_handler is faulty Jan 31, 2024

GitHK changed the title ~~✨ rabbit_client.subscribe no longer fails forever if message_handler is faulty~~ ✨ rabbit_client.subscribe no longer retries forever if message_handler is faulty Jan 31, 2024

GitHK changed the title ~~✨ rabbit_client.subscribe no longer retries forever if message_handler is faulty~~ ✨ rabbit_client.subscribe no longer retries forever if message_handler always raises unexpected error Jan 31, 2024

Andrei Neagu added 2 commits January 31, 2024 14:02

allow more time for retries

edc9987

tests no longer hang in ci

d9022a3

Andrei Neagu added 3 commits February 1, 2024 13:43

making tests timeout to debug them

5045801

fix timeout

ce07ad8

Merge remote-tracking branch 'upstream/master' into pr-osparc-retry-o…

69d25b5

…n-error-queue

sanderegg approved these changes Feb 1, 2024

View reviewed changes

Andrei Neagu added 19 commits February 1, 2024 14:18

lower timeout

0f95657

refactor

c16abb7

refactor test

3e51668

refactor

0cb350f

remove not required

840e8b8

renaming tests

b73552b

refactor moving down

43b1721

added prints for debug

24b81cd

put tests back in the middle

899bf8b

refactr

4b4e00e

refactor to add more traces

e13d6eb

refactor

eab7ebb

refactor

0ac41a3

refactor

c1d97cb

trying to rearrange

85f6292

refactor

04cc2b9

refactor

ec10ddf

restore

aee5d45

removed pytest-timeout

489b408

GitHK enabled auto-merge (squash) February 2, 2024 10:38

GitHK merged commit 3906de5 into ITISFoundation:master Feb 2, 2024
55 checks passed

GitHK deleted the pr-osparc-retry-on-error-queue branch February 2, 2024 11:20

GitHK changed the title ~~✨ rabbit_client.subscribe no longer retries forever if message_handler always raises unexpected error~~ ✨ rabbit_client.subscribe no longer retries forever if message_handler always raises unexpected error ⚠️ Feb 2, 2024

matusdrobuliak66 mentioned this pull request Feb 7, 2024

🚀 Pre-release master -> staging_ThisIsSparta6 #5313

Closed

16 tasks

matusdrobuliak66 mentioned this pull request Feb 14, 2024

🚀 Release v1.65.0 #5226

Closed

39 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

✨ `rabbit_client.subscribe` no longer retries forever if `message_handler` always raises unexpected error ⚠️ #5282

✨ `rabbit_client.subscribe` no longer retries forever if `message_handler` always raises unexpected error ⚠️ #5282

GitHK commented Jan 30, 2024 •

edited

Loading

codecov bot commented Jan 30, 2024 •

edited

Loading

sanderegg left a comment

GitHK commented Jan 30, 2024 •

edited

Loading

sanderegg left a comment

sonarqubecloud bot commented Feb 2, 2024

✨ rabbit_client.subscribe no longer retries forever if message_handler always raises unexpected error ⚠️ #5282

✨ rabbit_client.subscribe no longer retries forever if message_handler always raises unexpected error ⚠️ #5282

Conversation

GitHK commented Jan 30, 2024 • edited Loading

What do these changes do?

Changes

Related issue/s

How to test

Dev Checklist

DevOps Checklist ⚠️

Why not have an automatic way to fix this?

codecov bot commented Jan 30, 2024 • edited Loading

Codecov Report

sanderegg left a comment

Choose a reason for hiding this comment

GitHK commented Jan 30, 2024 • edited Loading

sanderegg left a comment

Choose a reason for hiding this comment

sonarqubecloud bot commented Feb 2, 2024

Quality Gate passed

✨ `rabbit_client.subscribe` no longer retries forever if `message_handler` always raises unexpected error ⚠️ #5282

✨ `rabbit_client.subscribe` no longer retries forever if `message_handler` always raises unexpected error ⚠️ #5282

GitHK commented Jan 30, 2024 •

edited

Loading

codecov bot commented Jan 30, 2024 •

edited

Loading

GitHK commented Jan 30, 2024 •

edited

Loading