Inconsistent error handling during listener recovery #35

Closed
romank0 opened this issue Apr 20, 2023 · 3 comments
@romank0
Contributor

romank0 commented Apr 20, 2023

While the listener process is running, if it encounters an unprocessable notification (that is, one whose processing raises an exception), that notification is skipped and processing continues with the next one.

But the same error during recovery causes the listener process to exit immediately. Any further stored notifications that could potentially be processed are never attempted, and there is no way to recover automatically (in the sense of processing the existing "good" notifications and listening for new ones) other than deleting the unprocessable notification.
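The two behaviours can be sketched as follows. The function names here are hypothetical, chosen to illustrate the inconsistency; they are not django-pgpubsub's actual code:

```python
def listen_loop(get_next_notification, process):
    # Normal listening: an exception from one notification is
    # logged and skipped, and the loop moves on to the next one.
    while True:
        notification = get_next_notification()
        if notification is None:
            break
        try:
            process(notification)
        except Exception as exc:
            print(f"Skipping unprocessable notification: {exc!r}")


def recover(stored_notifications, process):
    # Recovery: the first exception propagates, the process exits,
    # and any remaining stored notifications are never attempted.
    for notification in stored_notifications:
        process(notification)
```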

Minimal Fix

I believe the behaviour should be consistent: if we skip errors while listening, we should skip them during recovery as well. That is what the minimal fix might be.

I think "recovery" is a misnomer here, as this is just the processing of notifications that were received while the listener was down. That can happen even during normal operation in a cloud environment, where processes may go down without any internal issue.
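Under the minimal fix, the recovery path would apply the same skip-on-error policy as the listening path. A sketch with a hypothetical function name, not the library's actual implementation:

```python
def recover_with_skip(stored_notifications, process):
    # Hypothetical recovery loop using the same skip-on-error policy
    # the listening loop already uses: a failing notification is
    # logged and skipped, and the remaining ones still get processed.
    skipped = []
    for notification in stored_notifications:
        try:
            process(notification)
        except Exception as exc:
            print(f"Skipping unprocessable notification: {exc!r}")
            skipped.append(notification)
    return skipped
```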

Configurable error handling strategy

Something that may make sense, and that I would love to see, is a configuration option defining a strategy for handling errors during processing.

While skipping a faulty notification is a good default, in other scenarios it might be beneficial to stop processing completely when an error happens. For example, if a system that notification processing depends on has a temporary issue, I would want processing not to skip notifications but instead to continuously retry the oldest unprocessed one.
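One possible shape for such an option, sketched with hypothetical names (the library defines no such interface today): an enum chosen per channel that tells the processing loop whether to skip, stop, or keep retrying on error.

```python
import enum
import time


class ErrorStrategy(enum.Enum):
    SKIP = "skip"    # current listening behaviour: drop and continue
    STOP = "stop"    # current recovery behaviour: exit on first error
    RETRY = "retry"  # keep retrying the same notification before giving up


def process_all(notifications, process, strategy, max_retries=3, delay=0.0):
    # Hypothetical processing loop parameterised by an error strategy.
    for notification in notifications:
        attempts = 0
        while True:
            try:
                process(notification)
                break
            except Exception:
                if strategy is ErrorStrategy.SKIP:
                    break              # drop this notification, move on
                if strategy is ErrorStrategy.STOP:
                    raise              # propagate and stop processing
                attempts += 1          # RETRY: try the same one again
                if attempts >= max_retries:
                    raise
                time.sleep(delay)
```

Where the strategy lives is an open question; as discussed below in the thread, the channel dataclasses are one natural place to declare it.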

@PaulGilmartin
Owner

Thanks for raising this; this inconsistency is something I'd never noticed before. I can try to make the minimal fix as soon as I have time.

For the long term changes in this direction, error handling/retry strategies sound like the way to go. This is also a feature I wanted to add when first creating the library, but never got round to defining the interface. I think at least the strategy would be defined on the channel dataclasses, but outside that I haven't thought about it. This and logging are definitely at the top of the priority list in terms of new features.

@PaulGilmartin
Owner

Here's a PR for the minimal fix: #42

If you'd like to take a look before I merge to check if it's doing the right thing, that would be nice. Otherwise I'll just merge into master in a day or so.

@PaulGilmartin
Owner

Fixed in https://pypi.org/project/django-pgpubsub/1.0.6/
An error/retry strategy can be implemented in the next larger feature release.
