# Event Subsystem Architecture Review
This review of Events/Notifications and the respective delayed jobs was conducted in July 2017 by @hennevogel, @mdeniz and @evanrolfe.

Goals:
- Jobs of the same type do not run concurrently
- Failed procedures notify Errbit and can be retried
- Jobs distinguish between procedures which have not yet started, have completed, or have failed
After analyzing the `Event` architecture we have come up with three possible options for an overhaul.
## Option 1

No more `Event` classes, no `events` table in the database. We store the data we need for processing a job inside the `DelayedJob` table (payload). Whenever an event happens (like a package failing to build), jobs are created accordingly.
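A minimal sketch of this option, with hypothetical class names (`BuildFailureJob`, `EventMailer`): the event payload travels as job arguments, which DelayedJob serializes into its own table, so no `Event` row is needed.

```ruby
# Option 1 sketch (class names hypothetical): no Event record, the
# payload lives only in the delayed_jobs table.
class BuildFailureJob < ApplicationJob
  queue_as :mails

  def perform(payload)
    # Everything needed to act on the event travels with the job itself.
    EventMailer.build_failure(payload).deliver_now
  end
end

# Wherever the event happens, e.g. when the backend reports a failed build:
BuildFailureJob.perform_later(project: 'home:foo', package: 'bar', reason: 'failed')
```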
## Option 2

Same as option 1, but the event-related data is still stored in the `Event` model. Data gets duplicated into the `DelayedJob` payload.
## Option 3

Basically keeping everything as it is. Events get stored in the `Event` model. Batches of events get processed by many multi-purpose and single-purpose jobs. Event data gets duplicated into the `DelayedJob` payload. The only change would be to run each type of job in an independent queue to avoid concurrency.
Terms used above and in the summary:
- Multi-purpose jobs: one job that does every task (send mail, notify backend) related to the event that happened
- Single-purpose jobs: one job per task (send mail, notify backend) related to the event that happened; both shapes are sketched below
- Store data in the `events` table: we need to keep the `Event` instances around as long as there are jobs to be processed (state & cleanup)
- Store data in the DJ payload: job-related data gets duplicated into the DJ payload
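To illustrate the two job shapes, a hypothetical sketch (the service classes are assumptions, not existing code):

```ruby
# Multi-purpose: one job performs every task for the event.
class EventJob < ApplicationJob
  def perform(event_id)
    event = Event::Base.find(event_id)
    SendMailService.new(event).call       # task 1: send mail
    NotifyBackendService.new(event).call  # task 2: notify backend
  end
end

# Single-purpose: one job per task, enqueued side by side for the same event.
class SendEventMailJob < ApplicationJob
  def perform(event_id)
    SendMailService.new(Event::Base.find(event_id)).call
  end
end
```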
## Summary per option

Option | 1 | 2 | 3 |
---|---|---|---|
Events processed per job | One | One | Many |
Tasks per job | One | One | One/Many |
Jobs per event | Many | Many | None/Many (CreateJob) |
Table to store data in between | DJ | DJ | Event |
Concurrency | Yes | Yes | No |
Failure handling | DJ | DJ | Events |
Individual queues | No | No | Yes |
Copies of event data | Duplicated | Duplicated | Normalized |
Event representation | No | Yes | Yes |
Cleanup of events | No | Yes | Yes |
## Current jobs

An overview of the things we have noticed about the different jobs:
- None of these jobs can track failures.
- It's assumed that every job will succeed.
- ActiveJob and DelayedJob use different default queues.
- Jobs shouldn't expose methods besides `perform`.
### Backend notification job

Requirements:
- Events need to be processed continuously.
Target:
- Posts the event payload to the backend, for events that define a `raw_type` attribute.
- Only needed for the hermes and rabbitmq backend notification plugins.
Job Creation:
- Clock.rb creates and queues a delayed job every 30 seconds.
- [PROBLEM] This is using DelayedJob directly, not ActiveJob.
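For reference, a minimal sketch of how such a clock.rb schedule could look with the clockwork gem; `NotifyBackendJob` is a placeholder name, not the real class:

```ruby
require 'clockwork'

module Clockwork
  # Enqueue the job every 30 seconds; going through ActiveJob's perform_later
  # here (rather than Delayed::Job.enqueue) would address the [PROBLEM] above.
  every(30.seconds, 'notify backend') do
    NotifyBackendJob.perform_later
  end
end
```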
Processing control:
- Uses the boolean attribute `events.queued` to keep track of whether or not an event has been processed.
- [PROBLEM] `queued` is set to true before the payload is posted.
- [PROBLEM] Does not handle failures.
- [PROBLEM] The `notify_backend` method is only defined on the Event::Base class.
Concurrency control:
- Nothing prevents this job from running simultaneously with itself, which is a problem because events can then be processed more than once and be sent to the backend multiple times. A safer ordering is sketched below.
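A hypothetical safer ordering, assuming the `queued` flag and `notify_backend` named above: claim each event under a row lock and flip `queued` only after the payload was posted successfully.

```ruby
Event::Base.where(queued: false).find_each do |event|
  event.with_lock do             # row-level lock; reloads the record
    next if event.queued?        # another worker got here first
    event.notify_backend         # post the payload to the backend ...
    event.update!(queued: true)  # ... then mark the event as processed
  end
end
```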
### ProjectLogRotate

Target:
- Saves ProjectLogEntry records to the database to create the RSS feed for the last commits in projects/packages.
- Entries should be created ASAP.
- Project log entries only need to exist in the database for 10 days.
Job Creation:
- Clock.rb creates and enqueues a delayed job every 10 minutes.
Processing control:
- Uses the `project_logged` column.
- [PROBLEM] Continuously retries events which raise an error when creating the ProjectLogEntry, or when anything else goes wrong (e.g. the project was already deleted).
- [PROBLEM] If we reach 10,000 unprocessable events, that prevents the valid events from being processed for 10 days.
- [PROBLEM] Events which don't descend from Event::Project or Event::Package hang around for 10 days before they get marked as logged, even though they are never used by ProjectLogRotate.
Concurrency control:
- Cannot run simultaneously with another instance of itself.
- We prevent this by running all instances of this job in a single queue with a single worker, as sketched below.
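A sketch of that single-queue pattern (the queue name is an assumption): with exactly one worker consuming the queue, two instances of the job can never overlap.

```ruby
class ProjectLogRotateJob < ApplicationJob
  queue_as :project_log_rotate   # dedicated queue for this job type

  def perform
    # ... create and clean up ProjectLogEntry records ...
  end
end

# One dedicated worker for that queue, e.g.:
#   bin/delayed_job --queue=project_log_rotate -n 1 start
```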
### CreateJob

Target:
- CreateJob is the base class; the subclasses called are:
  - UpdateBackendInfos: updates frontend data based on what comes from the backend
  - UpdateReleasedBinaries: updates BinaryRelease data in the frontend based on what comes from the backend
Job Creation:
- DelayedJobs are queued inside the `perform_create_jobs` callback in the Event::Base model.
- Each job queued increments the `undone_jobs` counter.
- [PROBLEM] This is using DelayedJob directly, not ActiveJob.
Processing control:
- uses undone_jobs (integer) column to keep track of how many delayed jobs still need to be completed
- undone_jobs == 0 means that either there were no jobs to be processed, or they have already been processed
- when a job completes it decrements undone_jobs counter by 1
- [PROBLEM] both jobs do not handle exceptions or failures
Concurrency control:
- CreateJob locks the event while updating `undone_jobs` after the job is completed (see the sketch below).
- UpdateReleasedBinaries runs in the 'releasetracking' queue, so it is not concurrent.
- UpdateBackendInfos runs in the 'quick' queue, so it is concurrent.
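A simplified sketch of this bookkeeping; only `perform_create_jobs` and `undone_jobs` come from the review, the other names (`create_jobs`, `job_finished!`) are assumptions.

```ruby
class Event::Base < ApplicationRecord
  after_create :perform_create_jobs

  def perform_create_jobs
    create_jobs.each do |job_class|
      Delayed::Job.enqueue job_class.new(id)  # direct DelayedJob usage ([PROBLEM] above)
      increment!(:undone_jobs)
    end
  end

  # Called by a CreateJob subclass when it finishes.
  def job_finished!
    with_lock { decrement!(:undone_jobs) }    # the lock guards the counter update
  end
end
```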
### Send event emails job

Target:
- Send emails for events to subscribers ASAP.
- Create RSS notifications for events ASAP.
Job Creation:
- Clock.rb creates and enqueues a delayed job every 30 seconds.
Processing control:
- Uses the boolean attribute `events.mails_sent` to keep track of whether or not an event has been processed.
- [PROBLEM] `create_rss_notifications` fails silently.
- [PROBLEM] It cannot distinguish between individual failures in email sending and/or RSS notification creation (see the sketch below).
- If either email sending or RSS creation fails:
  - Errbit is notified
  - [PROBLEM] `mails_sent` is set to true, so the event is never re-processed
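A hypothetical per-task rescue that would let the job tell email failures apart from RSS failures; `send_event_emails` is an assumed helper, `create_rss_notifications` is named in the review.

```ruby
def process(event)
  errors = {}
  begin
    send_event_emails(event)            # assumed helper for the mail task
  rescue StandardError => e
    errors[:mails] = e
  end
  begin
    create_rss_notifications(event)     # method named in the review above
  rescue StandardError => e
    errors[:rss] = e
  end
  errors.each_value { |e| Airbrake.notify(e) }      # Errbit speaks the Airbrake API
  event.update!(mails_sent: true) if errors.empty?  # only flag fully processed events
end
```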
Concurrency control:
- Cannot run simultaneously with another instance of itself.
- We prevent this by running all instances of this job in a single queue with a single worker.
### Notification fetching job

Target:
- Reads /lastnotifications from the backend and creates events ASAP based on that response.
Job Creation:
- Clock.rb runs this every 17 seconds inside a thread (because it needs to run asynchronously).
- [PROBLEM] The use of threads complicates the processing; a Mutex is used to avoid running multiple threads at the same time (see the sketch below).
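A sketch of that thread + Mutex scheme with the clockwork gem; the job class name is an assumption, and note the mutex only protects against overlap inside this one clock process.

```ruby
require 'clockwork'

FETCH_MUTEX = Mutex.new

module Clockwork
  every(17.seconds, 'fetch notifications', thread: true) do
    next unless FETCH_MUTEX.try_lock  # skip this tick if the previous run is still going
    begin
      UpdateNotificationEvents.new.perform
    ensure
      FETCH_MUTEX.unlock
    end
  end
end
```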
Processing control:
- Every run of this job stores the last notification id it looked at into the database (BackendInfo.lastnotification_nr).
- Every run of this job fetches the notifications from BackendInfo.lastnotification_nr onwards.
- Every run of this job blocks access to the backend call??? (Clarify with the backend people what /lastnotifications?block=1 means.)
- Acts based on the `limit_reached` and `next` attributes of the backend's /lastnotifications response.
- `limit_reached` set to 1 means that the backend has more events to notify about (> 1000) than can be served in one request, so we need to request more from the backend. That is done in another iteration of the loop (sketched below).
- `sync=lost` will be set if the notification id the job starts from is lower than the oldest number on record in the backend (probably not needed anymore, as concurrent processes are no longer possible).
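A sketch of the polling loop implied above; `limit_reached`, `next` and `BackendInfo.lastnotification_nr` come from the review, while `backend_get`, `create_events_from` and the Xmlhash parsing are assumptions.

```ruby
last_id = BackendInfo.lastnotification_nr
loop do
  xml  = backend_get("/lastnotifications?start=#{last_id}&block=1")
  data = Xmlhash.parse(xml)
  create_events_from(data)                       # turn notifications into events
  last_id = data['next'].to_i
  BackendInfo.lastnotification_nr = last_id      # remember progress for the next run
  break unless data['limit_reached'].to_i == 1   # backend has > 1000 more? loop again
end
```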
Concurrency control:
- Cannot run simultaneously with another instance of itself.
- [PROBLEM] We prevent this by using a semaphore/Mutex.
## Conclusions

- The relationship between events and subscriptions is a complex service class, and the logic only works one way: you can only find subscriptions for an event, not the other way round.
- Event data is duplicated as the payload of Notification and ProjectLogEntry instances.