Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rabbit_client.subscribe no longer retries forever if message_handler always raises unexpected error ⚠️ #5282

Merged
merged 58 commits into from
Feb 2, 2024

Conversation

GitHK
Copy link
Contributor

@GitHK GitHK commented Jan 30, 2024

What do these changes do?

Currently if the message_handler raises an exception, the message is queued again to be retried.
While this behaviour is useful, in some situations is results being decremental.

Issues with the current design:

  • the message is retired immediately
  • if message_handler always raises the message will be retried forever and can overwhelm the system
  • does not allow for handlers to fail and stop after X failed attempts

Changes

Added dead letter exchange box rabbit_client.subscribe internal queue, which helps to deal with errors.

  • ✨ added unexpected_error_retry_delay_s which delays the message by a certain amount before queuing it again in the main queue
  • ✨ added unexpected_error_max_attempts which allows to limit how many times the message_handler is allowed to raise an error before giving up
  • ✅ if message_handler returns False forever, the message will always be queued immediately to the main queue and will be retried as soon as possible (did not change previous behaviour)

IMG_7518

Related issue/s

How to test

Dev Checklist

DevOps Checklist ⚠️

  • ⚠️ NOTE: queue signatures in rabbitmq have changed. services which rely on rabbit_cliebt.subscribe will no longer be able to start. We need to remove the queues (volumes) from rabbitmq in oder for the services to start

Why not have an automatic way to fix this?

@GitHK GitHK self-assigned this Jan 30, 2024
@GitHK GitHK added the a:services-library issues on packages/service-libs label Jan 30, 2024
@GitHK GitHK added this to the This is Sparta! milestone Jan 30, 2024
@GitHK GitHK changed the title ✨ add extra options to rabbit queues ✨ add extra options to rabbit_client.subscribe Jan 30, 2024
Copy link

codecov bot commented Jan 30, 2024

Codecov Report

Attention: 1 lines in your changes are missing coverage. Please review.

Comparison is base (7821b90) 77.9% compared to head (489b408) 87.1%.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff            @@
##           master   #5282      +/-   ##
=========================================
+ Coverage    77.9%   87.1%    +9.2%     
=========================================
  Files        1029    1316     +287     
  Lines       40683   53839   +13156     
  Branches      765    1173     +408     
=========================================
+ Hits        31700   46932   +15232     
+ Misses       8833    6657    -2176     
- Partials      150     250     +100     
Flag Coverage Δ
integrationtests 63.8% <ø> (?)
unittests 85.1% <97.2%> (+7.2%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
...ls-library/src/models_library/rabbitmq_messages.py 89.7% <ø> (ø)
...y/src/servicelib/aiohttp/monitor_slow_callbacks.py 100.0% <100.0%> (ø)
...service-library/src/servicelib/rabbitmq/_client.py 100.0% <100.0%> (ø)
.../service-library/src/servicelib/rabbitmq/_utils.py 97.0% <75.0%> (ø)

... and 455 files with indirect coverage changes

@GitHK GitHK marked this pull request as ready for review January 30, 2024 10:16
Copy link
Member

@sanderegg sanderegg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some doubts still on the necessity of this change.

  1. so if a message is never delivered, we need to look for the exception in graylog right? what about the dead end letter box pattern thingy? did you check that?
  2. what is the use-case where the handler would constantly fail?
  3. what is a good number for the failures? and you are saying there is no delay, how will this work?

@GitHK
Copy link
Contributor Author

GitHK commented Jan 30, 2024

I have some doubts still on the necessity of this change.

  1. so if a message is never delivered, we need to look for the exception in graylog right? what about the dead end letter box pattern thingy? did you check that?

I did not manage to get it working right, but I think I did not do it right. Might still end up requiring it, since it looks like to the goto way to implement a delayed message delivery.

EDIT:
I managed to have the dead letter box pattern working. And with this one I can implement the delay approach. Will try to bring something in using this one now.

  1. what is the use-case where the handler would constantly fail?

My use case is the following: I emit messages for some small task, that I know can fail sometimes, due to the state of the system. I want it to actually retry 0 times and give up.

  1. what is a good number for the failures? and you are saying there is no delay, how will this work?

Regarding the no delay: I've seen 1000+ fails in one second on my machine. I think it's a bad idea and could lead to some strange situations or performance going down.

Regarding the retry policy, I would put something similar to what tenacity does with the exponential backoff and some max duration.
EDIT: exponential backoff is a bit harder to achieve, for now we have a fixed delay and a maximum number of retries, after which an error is logged when the handler gives up on the message.


@pcrespov @sanderegg I have also updated the description of the PR and this is the new design.

After the new change. A message will always be retried a set number of times and it is given up if the error keeps raising unexpected errors.

NOTE: if the handler returns (False) the message will be send to the original queue, not the one handling the errors. So previous behaviour is unaltered.

A warning message is printed out when the handler fails and the message will be queue once more after a fixed delay.

WARNING  servicelib.rabbitmq._client:_client.py:59 Retry [0/0] for handler '<function test__subscribe_to_failing_message_handler.<locals>._faulty_message_handler at 0x7fdb4b0a9000>', which raised an unexpected error caused by message_id='5b757bf24e734dcdb74940c970646f11'

An error message will be printed out if the message has reached its maximum amount of queued times and the it's given up on trying to handle it.

ERROR    servicelib.rabbitmq._client:_client.py:74 Handler '<function test__subscribe_to_failing_message_handler.<locals>._faulty_message_handler at 0x7fdb4b0a9000>' is giving up on message 'IncomingMessage:{'app_id': None,
 'body_size': 181,
 'cluster_id': '',
 'consumer_tag': 'pytest_consumer_neagu-wkst_3113883_pytest_fake_exchange_qaUauvFYAmzYNIRRwotb',
 'content_encoding': None,
 'content_type': None,
 'correlation_id': None,
 'delivery_mode': <DeliveryMode.NOT_PERSISTENT: 1>,
 'delivery_tag': 1,
 'exchange': 'pytest_fake_exchange_qaUauvFYAmzYNIRRwotb',
 'expiration': None,
 'headers': {},
 'message_id': '5b757bf24e734dcdb74940c970646f11',
 'priority': 0,
 'redelivered': False,
 'reply_to': None,
 'routing_key': '',
 'timestamp': None,
 'type': 'None',
 'user_id': None}' with body 'b'American whole magazine truth stop whose. On traditional measure example sense peace. Would mouth relate own chair.\nTogether range line beyond. First policy daughter need kind miss.''
Traceback (most recent call last):
  File "/home/silenthk/work/pr-osparc-retry-on-error-queue/packages/service-library/src/servicelib/rabbitmq/_client.py", line 54, in _on_message
    if not await message_handler(message.body):
  File "/home/silenthk/work/pr-osparc-retry-on-error-queue/packages/service-library/tests/rabbitmq/test_rabbitmq.py", line 515, in _faulty_message_handler
    raise RuntimeError(msg)
RuntimeError: Always fail. Received message b'American whole magazine truth stop whose. On traditional measure example sense peace. Would mouth relate own chair.\nTogether range line beyond. First policy daughter need kind miss.'

@GitHK GitHK requested review from pcrespov and sanderegg January 31, 2024 12:54
@GitHK GitHK changed the title ✨ add extra options to rabbit_client.subscribe rabbit_client.subscribe no longer fails forever if message_handler is faulty Jan 31, 2024
@GitHK GitHK changed the title rabbit_client.subscribe no longer fails forever if message_handler is faulty rabbit_client.subscribe no longer retries forever if message_handler is faulty Jan 31, 2024
@GitHK GitHK changed the title rabbit_client.subscribe no longer retries forever if message_handler is faulty rabbit_client.subscribe no longer retries forever if message_handler always raises unexpected error Jan 31, 2024
Copy link
Member

@sanderegg sanderegg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks

@GitHK GitHK enabled auto-merge (squash) February 2, 2024 10:38
Copy link

sonarqubecloud bot commented Feb 2, 2024

Quality Gate Passed Quality Gate passed

Kudos, no new issues were introduced!

0 New issues
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

@GitHK GitHK merged commit 3906de5 into ITISFoundation:master Feb 2, 2024
55 checks passed
@GitHK GitHK deleted the pr-osparc-retry-on-error-queue branch February 2, 2024 11:20
@GitHK GitHK changed the title rabbit_client.subscribe no longer retries forever if message_handler always raises unexpected error rabbit_client.subscribe no longer retries forever if message_handler always raises unexpected error ⚠️ Feb 2, 2024
@matusdrobuliak66 matusdrobuliak66 mentioned this pull request Feb 14, 2024
39 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
a:services-library issues on packages/service-libs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants