Mingle Card: 2441
Description
We have a few options: we could pick a specialized priority job queue library (such as Kue or monq), we could build a job queue abstraction over a message queue library/platform (such as Amazon SQS, RabbitMQ, ZeroMQ, etc.), or we could refine our own queue to handle multiple consumers (the queue used for processing sync POSTs).
Acceptance Criteria
The queue mechanism we choose to work with needs:
Consumers - to listen on the queue and grab messages as they come in. We'll want the consumers to run as multiple individual processes in order to process many sync messages concurrently.
Atomic - we don't want multiple consumers pulling down and processing the same message multiple times. Once a consumer acknowledges a message, no other consumer should be able to process it.
Speed - items need to be inserted into the queue quickly (so as not to hold up the sync). Equally, they need to be pulled off the queue quickly. The actual processing time may vary, but the queue needs to handle large numbers of reads/writes.
Error handling - if a consumer dies mid-process, the sync message it was holding should be returned to the queue for another consumer to pick up.
Analysis
Specialized library
Kue:
Kue is a priority job queue backed by redis (rather than our existing server-side data storage platform, mongodb). A usage sketch follows the library comparison below.
Pros:
Rich API for handling errors, including retries with delayed or exponential back-off support.
Various logging options baked in, allowing us to expose information at any point in the job's lifetime.
Graceful shutdown - sync workers can be told to stop processing once the active job completes, or, in the case of timeout or error, to mark the active job as failed.
Rich web UI to visualize the queue - can be mounted inside our express app (and secured with our existing OAuth tokens). See http://www.screenr.com/oyNs for a demo.
Allows each dyno to control how many jobs will be processed concurrently on a single worker (for instance, based on memory utilization, etc.).
Cons:
Backed by redis rather than mongodb, so we'd have to add and operate a second data store alongside our existing one.
monq:
Monq is a MongoDB-backed job queue.
Pros:
Backed by mongodb (our existing server-side data storage).
Allows for mongodb TTL (time-to-live) indexes - *should* allow us to expire queued jobs periodically (and potentially expire previous jobs for a given auth token so that a user only has one sync per session).
Basic support for error handling with retries and delayed retries.
Cons:
No UI.
Lacking documentation (except for test coverage, which is pretty good).
No explicit logging hooks (although it shouldn't be too hard to add our own logging).
No out-of-the-box support for handling a process terminating during the processing of a job.
Small user base.
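To make the comparison concrete, here is a minimal sketch of what a Kue-based sync worker might look like, based on Kue's documented API in recent versions. The queue name 'sync', the concurrency of 5, the retry/back-off values, and the processSync handler are illustrative assumptions, not settled decisions.

```js
const kue = require('kue');
const queue = kue.createQueue(); // connects to redis (localhost by default)

// Producer: enqueue a sync job with retries and exponential back-off.
queue
  .create('sync', { token: 'abc123', payload: { /* sync packet */ } })
  .attempts(3)
  .backoff({ type: 'exponential' })
  .save();

// Consumer: process up to 5 sync jobs concurrently in this worker process.
queue.process('sync', 5, function (job, done) {
  processSync(job.data, done); // hypothetical sync handler; calls done(err) when finished
});

// Graceful shutdown: give active jobs 5s to finish when heroku sends SIGTERM.
process.once('SIGTERM', function () {
  queue.shutdown(5000, function () {
    process.exit(0);
  });
});

// The web UI is an express app and can be mounted inside ours, e.g.:
// app.use('/queue', kue.app);
```

The .attempts/.backoff calls correspond to the retry pro above, the second argument to queue.process covers per-dyno concurrency, and queue.shutdown covers graceful shutdown.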
Refine our own queue implementation
To cover the acceptance criteria, we'd need to make a few changes to our queue to correctly handle multiple consumers.
Atomic operations:
We can handle the atomic requirement by using mongodb's findAndModify command in conjunction with adding a new "inProg" boolean to the message envelope. Since findAndModify obtains a write lock on the affected database (blocking other operations until it has completed), we can safely track in-progress jobs and prevent multiple consumers from picking up the same job.
To support findAndModify, we'd have to remove our current usage of tailable cursors and replace it with polling, as these features cannot currently be used together (see https://jira.mongodb.org/browse/SERVER-11753).
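A minimal sketch of the claim step, assuming the queue lives in a syncQueue collection, each envelope carries the inProg flag plus a startedAt timestamp, and a 4.x-era node driver is in use (its findOneAndUpdate helper issues the findAndModify command under the hood). Collection and field names are illustrative, not final.

```js
// Atomically claim the oldest unclaimed message. Because the filter and the
// update are applied as a single findAndModify operation, two consumers can
// never receive the same document.
async function claimNextMessage(db) {
  const result = await db.collection('syncQueue').findOneAndUpdate(
    { inProg: false },                                  // only unclaimed messages
    { $set: { inProg: true, startedAt: new Date() } },  // mark in progress, record when
    { sort: { _id: 1 }, returnDocument: 'after' }       // oldest first, return the updated doc
  );
  return result.value; // null when the queue is currently empty (newer drivers may return the doc directly)
}
```

Consumers would call this in a polling loop (replacing the tailable cursor), backing off briefly whenever it returns null.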
Error Handling
If/when the heroku process is unexpectedly terminated while a message is being processed, we can attempt to clean up (listening for SIGTERM, then setting inProg back to false - or marking the packet with error: shutdown); however, this isn't foolproof. In the case of unexpected termination, a job may be left in a stale state (inProg = true) with no corresponding worker.
As part of the findAndModify call, we can set a timestamp so that we know when the message was pulled down. Based on how long we expect consumers to take per message, we can then query for documents that are marked as in progress but whose timestamp is older than that threshold.
Using this technique, we can update those documents to reset the "inProg" flag, effectively returning them to the queue for another consumer to work on, or we can mark them with an error.
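A sketch of both halves, under the same assumptions as above (syncQueue collection, inProg/startedAt fields); the STALE_AFTER_MS threshold and the releaseCurrentMessage helper are hypothetical placeholders.

```js
// Best-effort cleanup when heroku sends SIGTERM before terminating the dyno.
process.once('SIGTERM', async () => {
  try {
    await releaseCurrentMessage(); // hypothetical: resets inProg on the message this worker holds
  } finally {
    process.exit(0);
  }
});

// Reaper for the not-so-graceful case: return stale in-progress messages to the
// queue (or alternatively mark them as errored) so another consumer can pick them up.
const STALE_AFTER_MS = 5 * 60 * 1000; // assumed upper bound on healthy processing time

async function requeueStaleMessages(db) {
  const cutoff = new Date(Date.now() - STALE_AFTER_MS);
  return db.collection('syncQueue').updateMany(
    { inProg: true, startedAt: { $lt: cutoff } },       // claimed long ago, never finished
    { $set: { inProg: false }, $unset: { startedAt: '' } }
  );
}
```

requeueStaleMessages would need to run periodically (for example, on worker start-up or from a scheduled job).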