
Automatic webhook retries feature #1492

Closed
liangyuanruo opened this issue Mar 30, 2021 · 3 comments · Fixed by #2093
Labels
P2 planned for next 1-2 months reliability tech design

Comments

@liangyuanruo
Contributor

liangyuanruo commented Mar 30, 2021

This issue is a catch-all for the automatic webhook retries feature.

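The ability to automatically retry failed webhook deliveries is a useful feature that can greatly reduce the amount of recovery work needed by downstream webhook consumers. This can generally be implemented using a message queue with a visibility timeout, such as AWS SQS: a message that has been received but not deleted becomes visible again once the timeout expires, so a failed delivery is picked up and retried without extra bookkeeping.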
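
A minimal sketch of the visibility-timeout retry loop described above, assuming SQS-like semantics. `SimulatedQueue`, `poll_once` and `flaky_webhook` are hypothetical names for an in-memory model, not the AWS SDK; in real SQS the equivalents are the queue's `VisibilityTimeout` attribute and a redrive policy whose `maxReceiveCount` decides when a message moves to the dead letter queue. The timeout, receive limit and polling interval below are arbitrary illustration values.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(eq=False)  # eq=False: compare messages by identity, not by field values
class Message:
    body: dict
    receive_count: int = 0
    visible_at: float = 0.0  # time at which the message can next be received


class SimulatedQueue:
    """Hypothetical in-memory model of a queue with a visibility timeout and a DLQ.

    Mimics SQS-style semantics for illustration only; this is not the AWS SDK API.
    """

    def __init__(self, visibility_timeout: float, max_receive_count: int) -> None:
        self.visibility_timeout = visibility_timeout
        self.max_receive_count = max_receive_count
        self.messages: List[Message] = []
        self.dead_letter: List[Message] = []

    def send(self, body: dict, now: float = 0.0) -> None:
        self.messages.append(Message(body=body, visible_at=now))

    def receive(self, now: float) -> Optional[Message]:
        for msg in list(self.messages):
            if msg.visible_at > now:
                continue  # still hidden after an earlier receive
            if msg.receive_count >= self.max_receive_count:
                # Retries exhausted: move to the DLQ instead of redelivering.
                self.messages.remove(msg)
                self.dead_letter.append(msg)
                continue
            msg.receive_count += 1
            # Hide the message; it reappears unless deleted before the timeout.
            msg.visible_at = now + self.visibility_timeout
            return msg
        return None

    def delete(self, msg: Message) -> None:
        self.messages.remove(msg)


def poll_once(queue: SimulatedQueue, deliver: Callable[[dict], bool], now: float) -> None:
    """Receive one message and attempt delivery; delete it only on success."""
    msg = queue.receive(now)
    if msg is not None and deliver(msg.body):
        queue.delete(msg)
    # On failure nothing is done here: the visibility timeout triggers the retry.


# Demo: a hypothetical endpoint that fails twice, then succeeds on the third attempt.
attempts: List[str] = []


def flaky_webhook(body: dict) -> bool:
    attempts.append(body["submissionId"])
    return len(attempts) >= 3


queue = SimulatedQueue(visibility_timeout=30.0, max_receive_count=5)
queue.send({"submissionId": "abc"})
for t in range(0, 200, 10):
    poll_once(queue, flaky_webhook, float(t))
```

The key design choice is that the consumer only ever deletes on success; retries and the eventual hand-off to the DLQ fall out of the queue's own timeout and receive-count bookkeeping rather than custom retry code.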

Design document

The design document should address the following:

  1. Determine the queue message format, keeping an eye on future extensibility, such as retries with attachments
  2. Identify and address risks to the system, such as the potential for cascading failure
  3. Clearly communicate to users the expectations and conditions under which retries will take place, to be spelled out in guide.form.gov.sg
  4. Describe the configuration of both the message queue and the dead letter queue (DLQ)
  5. Define key metrics for monitoring and alerting, such as the number of messages in the DLQ
  6. Define recovery procedures for managing dropped messages
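
For item 1, one way to keep the message format extensible is to version it explicitly and have consumers ignore fields they do not recognise. The sketch below assumes the message carries identifiers that a worker uses to look up the submission, rather than the webhook payload itself; `WebhookRetryMessage`, `SCHEMA_VERSION` and every field name (including `attachment_keys` as a placeholder for retries with attachments) are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field, fields
from typing import List

# Hypothetical schema version; bump it for breaking changes to the message shape.
SCHEMA_VERSION = 1


@dataclass
class WebhookRetryMessage:
    version: int
    form_id: str
    submission_id: str
    attempt: int = 0
    # Hypothetical placeholder for future extensions, e.g. retries with attachments.
    attachment_keys: List[str] = field(default_factory=list)

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "WebhookRetryMessage":
        data = json.loads(raw)
        if data.get("version") != SCHEMA_VERSION:
            # Reject versions this consumer does not understand.
            raise ValueError(f"unsupported message version: {data.get('version')!r}")
        known = {f.name for f in fields(cls)}
        # Drop unknown keys so additive changes from newer producers don't break parsing.
        return cls(**{k: v for k, v in data.items() if k in known})
```

Under this scheme, adding an optional field is backward compatible, while a change that older consumers cannot safely handle bumps `SCHEMA_VERSION` and is rejected outright.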

@r00dgirl
Contributor

r00dgirl commented May 3, 2021

Is this closed? @mantariksh

@mantariksh
Contributor

@syan-syan Yup, but there isn't an issue open for the implementation yet, so we can either expand this issue to cover the implementation or open a new one.

@karrui
Contributor

karrui commented May 31, 2021

Mentioned the relevant PRs in this issue for future reference.


4 participants